Intro to MILP
Mixed Integer Linear Programming (MILP) is a powerful optimization framework used to solve problems that can be modeled with linear relationships and decision variables, some of which are restricted to be integers (often binary). For us, it'll be useful in automating the search for optimal differential and linear characteristics in block ciphers.
At its core, a MILP problem consists of:
- A linear objective function to minimize or maximize
- A set of linear constraints over variables
- A specification that some variables must be integers, often restricted to $\{0,1\}$
A formal formulation of a MILP problem looks like so: $$ \begin{align*} \text{Minimize:} &\quad c_1x_1 + c_2x_2 + \cdots + c_nx_n \\ \text{Subject to:} &\quad A \cdot \bar{x} \leq \bar{b} \\ &\quad x_i \in \mathbb{Z} \quad \text{(for some variables)} \\ &\quad x_j \in \mathbb{R} \quad \text{(for others)} \end{align*} $$
Knapsacks
To show the power of MILP, and to provide a worked example, we'll use it to solve a small knapsack problem.
Suppose we are packing a small bag. There are 5 items to choose from, each with a weight and a value. The bag can carry at most 10 kg, and we'd like to pack as much value as we can into those 10 kg.
| Item # | Weight (kg) | Value ($) |
|---|---|---|
| 0 | 2 | 3 |
| 1 | 3 | 4 |
| 2 | 4 | 6 |
| 3 | 5 | 7 |
| 4 | 9 | 12 |
Classroom solutions to this include brute-force search, dynamic programming, and greedy heuristics. We're, however, interested in modeling this problem using MILP. The first step is usually figuring out what quantity we want to optimise, and then building constraints around that goal. In our case, we want to maximise the value carried while respecting the weight limit.
Mathematically, the knapsack problem[^1] is formulated as follows: $$ \begin{align*} \text{maximise } &\sum^n_{i=1} v_i x_i \\ \text{such that } &\sum^n_{i=1} w_i x_i \leq W \end{align*} $$
where the $x_i$ represent our (binary) choice of items, the $w_i$ the items' weights, the $v_i$ their values, and $W$ the carrying capacity of our knapsack.
Modeling
PuLP is a Python library for modeling and solving linear and integer programming problems. It offers:
- A high-level API for building MILPs
- Integration with multiple solvers (CBC, Gurobi, CPLEX, etc.)
- Easy export to `.lp` files for inspection or external solving
It's easy to install (a simple `pip install pulp` will do), and it's what we'll be using to model our problem. As far as the choice of optimiser is concerned, we'll be using the open-source CBC solver, but you can use whichever you prefer. Note that PuLP ships with a bundled copy of CBC on most platforms; any other solver has to be installed and configured separately.
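If you're unsure which solvers PuLP can actually see on your machine, it can list them for you (a quick check, assuming a reasonably recent PuLP version):

```python
import pulp

# Solvers PuLP knows about and can actually find on this machine
print(pulp.listSolvers(onlyAvailable=True))
```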
Variables
At its most basic level, MILP needs variables. These are the quantities in our problem that the solver is allowed to change. In the case of the knapsack, our variables represent the items, one per item, indicating whether it goes into the bag; naming them tells PuLP how to refer to them.
```python
import pulp

item_0 = pulp.LpVariable("item_0")
```
By default, PuLP treats all of our variables as continuous, meaning they can take on any real value. If we want a variable to be an integer, we need to specify it, like so:
```python
item_0 = pulp.LpVariable("item_0", cat="Integer")
```
We'd further like the items to be limited to either 0 or 1. To do so, we add a lower and an upper bound to our variable.
```python
item_0 = pulp.LpVariable("item_0", cat="Integer", lowBound=0, upBound=1)
```
Alternatively, PuLP has a shorthand for the above:
```python
item_0 = pulp.LpVariable("item_0", cat="Binary")
```
Constraints
Constraints are important. They tell the model what isn't allowed, and what is necessary. The obvious constraint for our model is the weight: the sum of the weights of our chosen items mustn't exceed the knapsack's capacity. We can do this as follows:
```python
W = 10                     # knapsack capacity in kg
weights = [2, 3, 4, 5, 9]  # item weights

# One binary variable per item: 1 if the item is packed, 0 otherwise
items = [pulp.LpVariable(f"item_{i}", cat="Binary") for i in range(5)]

# Build the total-weight expression term by term
weight_sum = 0
for i in range(5):
    w_i = weights[i]
    x_i = items[i]
    weight_sum += w_i * x_i

constraint = weight_sum <= W
```
This can be written more succinctly, either with Python's built-in sum or with pulp.lpSum, which does the same job (and is a little faster on large models); see the one-liner below. Either way, we now have a constraint.
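For instance, with pulp.lpSum the whole loop above collapses into a single expression that builds the same constraint:

```python
# Same weight constraint as above, built in one go
constraint = pulp.lpSum(weights[i] * items[i] for i in range(5)) <= W
```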
The above is just a basic constraint. Some problems impose more complicated ones. For instance, we may be interested in excluding one item if another is selected. Say, items 2 and 4 may be mutually exclusive as options. This can be modeled as follows:
```python
# At most one of items 2 and 4 may be selected
mutex_constraint = items[2] + items[4] <= 1
```
The above only works if both are binary variables. As a brain teaser, think how you would accomplish this were they bounded integers instead.
Model
All of the above is meaningless without a model to interpret it, so the next step is defining one, which means stating our core problem. Recall that we're interested in maximising a quantity (the total value, in our case), so we want a maximisation problem.
```python
model = pulp.LpProblem("Knapsack_Example", pulp.LpMaximize)
```
Constraints are added how you'd expect.
```python
model += constraint
```
Most importantly, we need to add the objective, the function we're maximising.
```python
values = [3, 4, 6, 7, 12]  # item values

# An expression (rather than a constraint) added to the model becomes its objective
value_sum = sum(items[i] * values[i] for i in range(5))
model += value_sum
```
Solving the model prints a lot of information. Usually, the value we're most interested in is the objective value of the model, that being the value of the function we maximised. We are also often interested in the value of our variables, so as to e.g. know which items were selected by the knapsack.
```python
model.solve()

# Print the optimal objective value, then each variable's value
print(pulp.value(model.objective))
for v in model.variables():
    print(v.name, "=", v.varValue)
```
By default, PuLP tries to find and use the CBC optimiser. If you're using a different one, you'll have to specify it and pass it as an argument when calling the solve() function.
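For instance, to keep using the bundled CBC but silence its log, or to hand the model to GLPK instead (assuming GLPK is installed on your machine), you'd pass a solver object to solve():

```python
# Bundled CBC, with its log output suppressed
model.solve(pulp.PULP_CBC_CMD(msg=False))

# Or a different solver entirely, e.g. GLPK
# model.solve(pulp.GLPK_CMD())
```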
Try changing the values, and see how the selected items change. Also try adding new constraints: perhaps instead of binary variables they're bounded integers, or maybe certain items can't be combined with others. Getting into the mindset for MILP is a matter of practice; a full version of the model is collected below to experiment with. Once you feel you've got the hang of it, you should be able to understand how we use it to build trails.
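To tie everything together, here's the model from this section gathered into one runnable script (a minimal sketch of what we built above; the mutual-exclusion constraint is left out so it matches the numbers in the table):

```python
import pulp

# Item data: weights in kg, values in dollars
weights = [2, 3, 4, 5, 9]
values = [3, 4, 6, 7, 12]
W = 10  # carrying capacity

# One binary decision variable per item
items = [pulp.LpVariable(f"item_{i}", cat="Binary") for i in range(len(weights))]

model = pulp.LpProblem("Knapsack_Example", pulp.LpMaximize)

# Objective: total value of the packed items
model += pulp.lpSum(values[i] * items[i] for i in range(len(items)))

# Constraint: total weight must not exceed the capacity
model += pulp.lpSum(weights[i] * items[i] for i in range(len(items))) <= W

model.solve()

print("objective:", pulp.value(model.objective))
for v in model.variables():
    print(v.name, "=", v.varValue)
```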
[^1]: When we say the knapsack problem, what we're really referring to is the 0-1 knapsack problem.