Part III: Testing and Timing (page 3) - Fun Destinations in US

Testing more complex code

On the prior pages, we tested (and timed) the reverseSort function to ensure its correctness. This is a rather simple function that just reverse sorts a list. But what about the overall BandB and other functions we created? Testing optimization code, which is effectively what BandB is, is not easy as we don’t usually have predetermined answers, nor do we know what the answer should be for random inputs. Also unit testing is should be setup for fast execution [1], but optimization processes are not quickly executing codes typically.

So what are the properties of BandB that we can test? Well upon conclusion, the function evaluator f of the presented solution should be the optimized value bestValue found. This test is expressed below as BandBMaxSatFinalSolutionTest. Another property is that any ‘shuffle’ of the input should not change the optimized value (the solution itself need not be unique, but the optimal value is). This is expressed as the test BandBMaxSatShuffleTest. Finally, we know that if the search tree estimator function g is better than any actual solution that the whole search tree will be queued and all variable combinations evaluated. This property is captured in the BandBMaxSatExhaustiveTest. And finally the test BandBMaxSatSolverTest compares the solution found with the normal g function and the one that forces all solutions to be tried. With this test passing, we are getting confident that the solver works (as long as the exhaustive search is correct – but we have confirmed that all search nodes are queued and all variable combinations evaluated — well at least that the correct number of those are tried).

Download this code

The code below is using the MaxSat input form to demonstrate testing, but the same arguments apply to all the problem domains. The BandB code was change slightly in that the number of nodes queued and the number of evaluations where added to the tuple output (and the printfn in BandB removed).

[<AutoOpen>]
module TestMaxSat =

    type Generators =
        static member genMaxSat =
            let log2 x = (log x) / (log 2.0)    // define log base 2
            // interpret size (sz) as the number of vars. returns int [] []
            Gen.sized <| fun sz -> 
                if sz >= 1 then
                    let varIndexes = [1..sz] @ [(-sz)..(-1)]    // possible var indices, for this size
                    let nClauses = sz * int (log2 (float sz))    // number of clauses = #vars log2 #vars
                    (Gen.elements varIndexes |> Gen.arrayOfLength 3) |> Gen.arrayOfLength nClauses 
                else
                    (Gen.constant 0 |> Gen.arrayOfLength 0) |> Gen.arrayOfLength 0  // no vars, no clauses

    [<Property>]
    let BandBMaxSatShuffleTest (PositiveInt nv) =
        let r = min nv 20   // no more than 20 variables in an instance
        let limitedInts = Gen.elements [1..r] |> Arb.fromGen

        Prop.forAll limitedInts (fun nVars ->
            (* Generate a list of clauses *)
            let clauses = (Generators.genMaxSat |> Gen.sample nVars 1).Head // take first and only
            
            (* Create a MaxSat instance for BandB *)
            let mutable (vars:MaxSatDiscreteVar []) = [| |]
            for i in 1..nVars do
                vars <- Array.append [| {Name = "v" + string(i); Setting = Unset} |] vars
            
            let f = clausefulleval clauses    // partial func application
            let g = clauseNumFalseFunc clauses
            let maxsatAnyUnset = Array.exists (fun (elem:MaxSatDiscreteVar) -> elem.Setting = ZeroOneVarSetting.Unset) 
            let (bestValue, solution, numqueues, numevals) = BandBmaximize f g maxsat_branch maxsatAnyUnset vars -1

            (* Shuffle the MaxSat instance, should get same soln value *)
            let clauses_s = clauses  // should be same soln
            shuffle clauses_s
            let mutable (vars_s:MaxSatDiscreteVar []) = [| |]
            for i in 1..nVars do
                vars_s <- Array.append [| {Name = "v" + string(i); Setting = Unset} |] vars_s
            
            let f_s = clausefulleval clauses_s    // partial func application
            let g_s = clauseNumFalseFunc clauses_s
            let (bestValue_s, solution_s, numqueues_s, numevals_s) = BandBmaximize f_s g_s maxsat_branch maxsatAnyUnset vars_s -1

            bestValue = bestValue_s // best sol'n value should be same, but solution itself might not be unique
            )

    [<Property>]
    let BandBMaxSatFinalSolutionTest (PositiveInt nv) =
        let limitedInts = Gen.elements [1..20] |> Arb.fromGen

        Prop.forAll limitedInts (fun nVars ->
            (* Generate a list of clauses *)
            let clauses = (Generators.genMaxSat |> Gen.sample nVars 1).Head // take first and only
            
            (* Create a MaxSat instance for BandB *)
            let mutable (vars:MaxSatDiscreteVar []) = [| |]
            for i in 1..nVars do
                vars <- Array.append [| {Name = "v" + string(i); Setting = Unset} |] vars
            
            let f = clausefulleval clauses    // partial func application
            let g = clauseNumFalseFunc clauses
            let maxsatAnyUnset = Array.exists (fun (elem:MaxSatDiscreteVar) -> elem.Setting = ZeroOneVarSetting.Unset) 
            let (bestValue, solution, numqueues, numevals) = BandBmaximize f g maxsat_branch maxsatAnyUnset vars -1

            bestValue = f solution // best value should be `f` of the solution
            )

    [<Property>]
    let BandBMaxSatExhaustiveTest (PositiveInt nv) =
        let r = min nv 10   // no more than 10 variables in an instance
        let limitedInts = Gen.elements [1..r] |> Arb.fromGen

        Prop.forAll limitedInts (fun nVars ->
            (* Generate a list of clauses *)
            let clauses = (Generators.genMaxSat |> Gen.sample nVars 1).Head // take first and only
            
            (* Create a MaxSat instance for BandB *)
            let mutable (vars:MaxSatDiscreteVar []) = [| |]
            for i in 1..nVars do
                vars <- Array.append [| {Name = "v" + string(i); Setting = Unset} |] vars
            
            let f = clausefulleval clauses    // partial func application
            let g (x:MaxSatDiscreteVar array) = clauses.Length + 1  // better than best possible!
            let maxsatAnyUnset = Array.exists (fun (elem:MaxSatDiscreteVar) -> elem.Setting = ZeroOneVarSetting.Unset) 
            let (bestValue, solution, numqueues, numevals) = BandBmaximize f g maxsat_branch maxsatAnyUnset vars -1
            
            // with no `g` function, we should explore/queue all solutions and eval all combos!
            (numqueues = ((pown 2 (nVars+1)) - 1)) |@ (sprintf "NumQueues wrong %d <> should be %d" numqueues ((pown 2 (nVars+1)) - 1))  .&.
            (numevals  = (pown 2 nVars)) |@ (sprintf "NumEvals wrong %d <> should be %d" numevals (pown 2 nVars))               
            )

    [<Property>]
    let BandBMaxSatSolverTest (PositiveInt nv) =
        let r = min nv 10   // no more than 10 variables in an instance
        let limitedInts = Gen.elements [1..r] |> Arb.fromGen

        Prop.forAll limitedInts (fun nVars ->
            (* Generate a list of clauses *)
            let clauses = (Generators.genMaxSat |> Gen.sample nVars 1).Head // take first and only
            
            (* Create a MaxSat instance for BandB *)
            let mutable (vars:MaxSatDiscreteVar []) = [| |]
            for i in 1..nVars do
                vars <- Array.append [| {Name = "v" + string(i); Setting = Unset} |] vars
            
            let f = clausefulleval clauses    // partial func application
            let g = clauseNumFalseFunc clauses
            let gAll (x:MaxSatDiscreteVar array) = clauses.Length + 1  // better than best possible!
            let maxsatAnyUnset = Array.exists (fun (elem:MaxSatDiscreteVar) -> elem.Setting = ZeroOneVarSetting.Unset) 
            let (bestValueSolved, _, _, _) = BandBmaximize f g maxsat_branch maxsatAnyUnset vars -1
            let (bestValueAll, _, _, _) = BandBmaximize f gAll maxsat_branch maxsatAnyUnset vars -1
            
            // with no `g` function, we should explore/queue all solutions and eval all combos!
            (bestValueSolved = bestValueAll) |@ (sprintf "Solution wrong %d <> should be %d" bestValueSolved bestValueAll)
            )

One of the requirements for such testing is the ability to create random inputs, based on a ‘size’ parameter. For MaxSat, the primary input is the clauses, which are arrays of arrays of ints, such that each int is a 1-based index into the variable array in positive or negative form (recall). The function that generates such inputs is on lines 5-14 named genMaxSat; the return value of this function is Gen<int [] []>. Gen is part of the FsCheck package that is geared towards generating examples. The argument to the Gen.sized function is a single argument function (which is named sz here, line 7) that forms the input for size specified by Gen.sized. In our case, we generate a list composed of the positive and negative integers in the range 1 to sz as the list varIndexes. I somewhat arbitrarily decided that for sz variables that I would create sz log2 sz clauses (sz² felt too large and sz too small). The rest of the code generates nClauses randomized 3-variable clauses. The actual test code that invokes this generator doesn’t call it with a size of 0, but to be safe I included that as a special case.

Don’t be thrown by the reverse pipe <| operator. For a function f, all of these are the same f x, x |> f and f <| x

To get an idea of what the generated clauses look like, the following is included in the main code.

1: 
2: 
3: 
4:

        (* Run sample to see what they look like *)
        let samples = ( Generators.genMaxSat |> Gen.sample 4 10 )
        for sample in samples do
            printfn "SAMPLE: %A" sample

This generates 10 samples with size parameter of 4. Each sample should have 8 clauses, as 4 × (log₂ 4) = 8. The output from this is below, we note that the examples generated are not necessarily hard instances, and that the also may have a good deal of repeated literals in a single clause, but we want to test such cases too.

 1: 
 2: 
 3: 
 4: 
 5: 
 6: 
 7: 
 8: 
 9: 
10: 
11: 
12: 
13: 
14: 
15: 
16: 
17: 
18: 
19: 
20:

SAMPLE: [|[|3; 3; 3|]; [|2; 2; 4|]; [|-1; 2; -3|]; [|-3; -1; 3|]; [|4; 2; 2|];
  [|1; 1; 4|]; [|-1; 3; 2|]; [|-2; 1; -2|]|]
SAMPLE: [|[|-1; -3; 4|]; [|3; 3; 3|]; [|2; 2; 4|]; [|-1; 2; -3|]; [|-3; -1; 3|];
  [|4; 2; 2|]; [|1; 1; 4|]; [|-1; 3; 2|]|]
SAMPLE: [|[|-3; 1; -3|]; [|-1; -3; 4|]; [|3; 3; 3|]; [|2; 2; 4|]; [|-1; 2; -3|];
  [|-3; -1; 3|]; [|4; 2; 2|]; [|1; 1; 4|]|]
SAMPLE: [|[|-4; 2; 2|]; [|-3; 1; -3|]; [|-1; -3; 4|]; [|3; 3; 3|]; [|2; 2; 4|];
  [|-1; 2; -3|]; [|-3; -1; 3|]; [|4; 2; 2|]|]
SAMPLE: [|[|-3; 1; 3|]; [|-4; 2; 2|]; [|-3; 1; -3|]; [|-1; -3; 4|]; [|3; 3; 3|];
  [|2; 2; 4|]; [|-1; 2; -3|]; [|-3; -1; 3|]|]
SAMPLE: [|[|1; -1; -2|]; [|-3; 1; 3|]; [|-4; 2; 2|]; [|-3; 1; -3|]; [|-1; -3; 4|];
  [|3; 3; 3|]; [|2; 2; 4|]; [|-1; 2; -3|]|]
SAMPLE: [|[|1; -4; -3|]; [|1; -1; -2|]; [|-3; 1; 3|]; [|-4; 2; 2|]; [|-3; 1; -3|];
  [|-1; -3; 4|]; [|3; 3; 3|]; [|2; 2; 4|]|]
SAMPLE: [|[|2; 3; -3|]; [|1; -4; -3|]; [|1; -1; -2|]; [|-3; 1; 3|]; [|-4; 2; 2|];
  [|-3; 1; -3|]; [|-1; -3; 4|]; [|3; 3; 3|]|]
SAMPLE: [|[|-2; -4; 1|]; [|2; 3; -3|]; [|1; -4; -3|]; [|1; -1; -2|]; [|-3; 1; 3|];
  [|-4; 2; 2|]; [|-3; 1; -3|]; [|-1; -3; 4|]|]
SAMPLE: [|[|2; 1; 1|]; [|-2; -4; 1|]; [|2; 3; -3|]; [|1; -4; -3|]; [|1; -1; -2|];
  [|-3; 1; 3|]; [|-4; 2; 2|]; [|-3; 1; -3|]|]

We have a choice in how we define the tests, we could define generators for all the arguments needed for BandB. But since the functions that need to be created (f and g) need to have consistent arguments this would be difficult. Instead I construct the needed inputs just based on a single parameter that I interpret as a size parameter. Let’s first look at the test BandBMaxSatShuffleTest on lines 17-47.

The property to be tested against is the PositiveInt nv which is my size parameter. Now a positive integer may range up to the maximum size of an int which is far, far too many variables for our MaxSat b-and-b solver. So we need to limit the actual size, using an approach from here we first generate elements from 1 to r where r is the minimum of nv and 20 and then use the Prop.forAll tester against the generated ints. The function argument nVars is then the actual size parameter for a test. We generate a clause instance using Gen.sample (which creates a array of clauses of length 1, we only want one so we take the Head). We then create the variable array and form the functions f and g for the BandB solver. The same process is then used on a shuffled version of the clauses, and finally the test determines if the solution values found are the same. Note that the solution itself solution and solution_s may be different as the solution need not be unique, but the optimal values must be.

The test BandBMaxSatExhaustiveTest on lines 71-92 follows a similar process, but adds a bit of output for failed tests. For the exhaustive test, the g function is set to a constant that is better than any possible solution, specifically to the value of 1 plus the number of clauses. This causes the branch and bound algorithm to explore the whole search tree and to evaluate every possible combination of variables. For this test, I limited the number of variables to 10 as even 20 would be very time consuming. The boolean parts in the final result are decorated/labelled with strings and logically and-ed with the FsCheck defined .&. operator. If this test failed (it doesn’t) then extra output would be added that explains the failure a bit.

Any suggestions for doing a test that determines if all combinations of the exhaustive test are completed?

The BandBMaxSatSolverTest (lines 94-117) is the one that is meant to be the one that really determines if the solver works. This compares the solver output optimized value with the output of the ‘exhaustive’ solver. We know, from test BandBMaxSatExhaustiveTest, that the exhaustive version is exploring the correct number of search nodes and evaluating the correct number of variable combinations (all of them), but not that these are necessarily covering all the variations — e.g. what if it is erroneously repeating a combination and missing another one? Since determining this seems inherently exponential in size and compute time, that is we would have to somehow instrument the code to keep track of an exponential number of combinations and then make sure that all cases are covered, this seems like a difficult test to conduct (in terms of time). Given the structure of the code and passing of the tests, I am confident in the code though. This sort of goes back to the idea that you can’t test everything [1]. In the figure below the ‘MaxSat’ tests start at the blue line, all of the tests from the prior code is included in this one too.

Note that for knapsack instances we could instead compare optimizer results with the exhaustive solver for knapsack we created instead of BandB with the fixed g function.

We pass all the testss

Since BandB is tested, we would next also test the particular functions for each problem instance as well. For example, how do we know that clausefulleval is really determining the correct number of clauses satisfied? How about TSPcost — is that working correctly for TSP? Rather than work on more unit testing, let’s move to another topic.

Integration Tests

There are many additional topics of interest in testing, and many approaches as well [1]. In Agile software development (and here) a testing component is usually integrated into the process, sometimes as a test-driven design (and here). From here, another topic is regression testing

Whenever developers change or modify their software, even a small tweak can have unexpected consequences. Regression testing is testing existing software applications to make sure that a change or addition hasn’t broken any existing functionality. Its purpose is to catch bugs that may have been accidentally introduced into a new build or release candidate, and to ensure that previously eradicated bugs continue to stay dead.

In a developer-focussed testing approach as we’ve done here, the next stage up from unit testing (be it standard or property-based) is integration testing (and then to system test). That is testing software components as they interact or make use of each other. Other than the interaction of our functions f, g and branch for each problem type, and the reverseSort used in BandB there isn’t much integration.

We have already tested, in part, MaxSat operating with BandB which is a mild form of integration testing but would still need to complete unit testing of each of these functions and then test everything. But we’ve demonstrated some appropriate testing procedures and this would be more of the same, so let’s move on to part 4.

Reference

[1] Glenford J. Myers, Corey Sandler, Tom Badgett, The Art of Software Testing, 3rd Ed, 2011, Wiley