COMP3821

Extended Algorithms and Programming Techniques

25 April 2021

The course is split into four topics

  1. Divide and Conquer
  2. Greedy Algorithms
  3. Dynamic Programming
  4. Linear Programming and Reductions

However, for the sake of organising notes, I've included an extra section to cover knowledge not covered in the course's prerequisite (COMP2521), and split Linear Programming and Reductions into two sections.

Misc Knowledge


Asymptotic Runtime

$O(n)$, Big O

  • Denotes the upper bound of the runtime of an algorithm
  • If $f(n) = O(g(n))$, there exist positive constants $c$ and $n_0$ such that $0 \leq f(n) \leq cg(n), \forall n \geq n_0$
  • $f(n) = O(g(n))$ means that $f(n)$ does not grow substantially faster than $g(n)$ because a multiple of $g(n)$ eventually dominates $f(n)$
  • Most commonly used as we are concerned with the worst runtime

$\Theta(n)$, Big Theta

  • Denotes a tight bound of the runtime of an algorithm
  • $f(n) = \Theta(g(n))$ iff $f(n) = O(g(n))$ and $f(n) = \Omega(g(n))$
  • Less commonly used than $O(n)$, but more common than $\Omega(n)$

$\Omega(n)$, Big Omega

  • Denotes the lower bound of the runtime of an algorithm
  • If $f(n) = \Omega(g(n))$, there exist positive constants $c$ and $n_0$ such that $0 \leq cg(n) \leq f(n), \forall n \geq n_0$
  • $f(n) = \Omega(g(n))$ means that $f(n)$ grows at least as fast as $g(n)$, because $f(n)$ eventually dominates a multiple of $g(n)$
  • Least often used as we usually aren't concerned with the best runtime

Math

Log Identity

If $a, b, c > 0$ then $$a^{\log_{b}c} = c^{\log_{b}a}.$$

Proof (the second line applies the power rule $y\log_b x = \log_b(x^y)$, and the last line follows since $\log_b$ is injective):

$$ \begin{align*} \log_{b}c \cdot \log_{b}a &= \log_{b}a \cdot \log_{b}c \\ \log_{b}(a^{\log_{b}c}) &= \log_{b}(c^{\log_{b}a}) \\ a^{\log_{b}c} &= c^{\log_{b}a} \end{align*} $$

Roots of Unity

Representation

Complex numbers $z = a + ib$ can be represented using

  • $modulus$ $|z| = \sqrt{a^2 + b^2}$
  • $argument$ $\arg(z)$, which is an angle taking values in $(-\pi, \pi]$

and satisfying: $$z = |z|e^{i \arg(z)} = |z|(\cos(\arg(z)) + i \sin(\arg(z))).$$

and

$$z^n = \left(|z|e^{i \arg(z)}\right)^n = |z|^ne^{in\arg(z)} = |z|^n(\cos(n\arg(z)) + i \sin(n\arg(z))).$$

Properties

Roots of unity of order $n$ are complex numbers which satisfy $z^n = 1$.

  • If $z^n = |z|^n(\cos(n \arg(z)) + i\sin(n\arg(z))) = 1$ then $|z| = 1$ and $n\arg(z)$ is a multiple of $2\pi$
  • Thus, $n\arg(z) = 2\pi k$ for some integer $k$, i.e. $\arg(z) = \frac{2\pi k}{n}$
  • We denote $\omega_n = e^{i \frac{2\pi}{n}}$; this $\omega_n$ is called a primitive root of unity of order $n$
  • A root of unity $\omega$ of order $n$ is primitive if all other roots of unity of the same order can be obtained as its powers $\omega^k$.

For $\omega_n = e^{i \frac{2\pi}{n}}$ and for all $k$ such that $0 \leq k \leq n - 1$, $$ (\omega_n^k)^n = ((\omega_n)^k)^n = (\omega_n)^{nk} = ((\omega_n)^n)^k = 1^k = 1.$$

If $k + m \geq n$ then $k + m = n + l$ for $l = (k + m) \bmod n$ and we have $$\omega_n^k \omega_n^m = \omega_n^{k+m} = \omega_n^{n+l} = \omega_n^n \omega_n^l = 1 \cdot \omega_n^l = \omega_n^l \text{ where } 0 \leq l < n.$$

Therefore, the product of any two roots of unity of the same order is just another root of unity of the same order (this is not the case for addition, as the sum of two roots of unity is usually not another root of unity).

The Cancellation Lemma: $\omega_{kn}^{km} = \omega_n^m$, and its proof is below $$\omega_{kn}^{km} = (\omega_{kn})^{km} = \left(e^{i \frac{2\pi}{kn}}\right)^{km} = e^{i \frac{2\pi k m}{k n}} = e^{i \frac{2\pi m}{n}} = \left(e^{i \frac{2\pi}{n}}\right)^m = \omega_n^m.$$
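
These properties are easy to sanity-check numerically. A small Python snippet using cmath (with $n$, $k$ and $m$ chosen arbitrarily for illustration) might look like

import cmath

# A quick numerical sanity check of the properties above, with n, k and m
# chosen arbitrarily
def root(order: int, power: int) -> complex:
    # omega_order^power = e^(i * 2 * pi * power / order)
    return cmath.exp(2j * cmath.pi * power / order)

n, k, m = 8, 5, 6

# The product of two roots of unity of order n is another root of order n
print(cmath.isclose(root(n, k) * root(n, m), root(n, (k + m) % n)))  # True

# The Cancellation Lemma: omega_{kn}^{km} = omega_n^m
print(cmath.isclose(root(k * n, k * m), root(n, m)))                 # True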

Divide and Conquer


Divide and conquer algorithms recursively break a problem down into two or more non-overlapping subproblems, solve each subproblem, and then combine (merge) the sub-solutions into a solution to the original problem.

Common examples of Divide and Conquer algorithms include

  • Binary Search
  • Merge Sort
  • Fast Fourier Transform

Sample Algorithms

Karatsuba's Fast Integer Multiplication

Naive method

Given two 'big integers', i.e. numbers stored as arrays where the value at the $i^{th}$ index is the $i^{th}$ digit of the integer (least significant digit first), a sample solution would look like

# Naive method for big integer multiplication

def mul(A: list[int], B: list[int]) -> list[int]:
    n, m = len(A), len(B)
    C = [0] * (n + m)

    # Apply multiplication
    for i in range(n):
        for j in range(m):
            C[i + j] += A[i] * B[j]

    # Add carry amount to next digit and take mod of 10
    for i in range(n + m - 1):
        C[i + 1] += C[i] // 10
        C[i] %= 10

    return C

and hence has a runtime of $\Theta(nm)$, which is $\Theta(n^{2})$ when both numbers have $n$ digits.

Karatsuba's method

Say we want to find the solution to the multiplication of $$P(x) = a_1x + a_0,$$ $$Q(x) = b_1x + b_0,$$ a naive solution will require 4 multiplications

  1. $A = a_1 \times b_1$
  2. $B = a_1 \times b_0$
  3. $C = a_0 \times b_1$
  4. $D = b_0 \times a_0$

$$P(x)Q(x) = Ax^2 + (B + C)x + D.$$ However, we know that $$(a_1 + a_0)(b_1 + b_0) = a_1b_1 + a_1b_0 + a_0b_1 + a_0b_0 = A + B + C + D,$$ which requires 2 additions (which can be computed easily) and only 1 multiplication, and from which the middle coefficient $B + C$ can be recovered by subtracting $A$ and $D$. Hence, to compute the final product, we only need three multiplications

  1. $E = (A + B + C + D) = (a_1 + a_0) \times (b_1 + b_0)$
  2. $A = a_1 \times b_1$
  3. $D = b_0 \times a_0$

Hence to find the final solution, we can do $$P(x)Q(x) = Ax^2 + (E - A - D)x + D$$
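
As a small illustration, the three-multiplication trick for linear polynomials can be written directly in Python; the representation of a polynomial $a_1x + a_0$ as the pair $(a_1, a_0)$ and the function name are illustrative choices.

# Multiply two linear polynomials using three multiplications instead of four,
# representing a_1*x + a_0 as the pair (a_1, a_0)
def karatsuba_linear(p: tuple[int, int], q: tuple[int, int]) -> tuple[int, int, int]:
    a1, a0 = p
    b1, b0 = q

    # Three multiplications
    A = a1 * b1
    D = a0 * b0
    E = (a1 + a0) * (b1 + b0)

    # Coefficients of the product A*x^2 + (E - A - D)*x + D
    return (A, E - A - D, D)

# (3x + 2)(5x + 4) = 15x^2 + 22x + 8
print(karatsuba_linear((3, 2), (5, 4)))  # (15, 22, 8)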

Karatsuba's method for integers

If we want to multiply $A$ with $B$, we can represent them as

$$ \begin{align*} A &= 10^{\frac{n}{2}}A_1 + A_0 \\ B &= 10^{\frac{n}{2}}B_1 + B_0. \end{align*} $$

With this, we can apply Karatsuba's method, where

$$ \begin{align*} x &= A_1B_1 \\ z &= A_0B_0 \\ y &= (A_1 + A_0)(B_1 + B_0) - x - z \\ AB &= 10^{n}x + 10^{\frac{n}{2}}y + z. \end{align*} $$

At each function call, 3 multiplications are performed (each computed recursively by the same algorithm), the input size is halved in each recursive call, and the additions run in $O(n)$. Therefore Karatsuba's method has the recurrence $$ T(n) = 3T\left(\frac{n}{2}\right) + O(n), $$ which solves to $O(n^{\lg 3}) \approx O(n^{1.58})$.
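
A recursive sketch of this in Python, operating directly on non-negative Python ints, might look like the following (the function name, the base case and the use of string length to count digits are illustrative choices; Python's built-in multiplication is of course already fast).

# A sketch of Karatsuba multiplication for non-negative integers
def karatsuba(a: int, b: int) -> int:
    # Base case: single digit numbers are multiplied directly
    if a < 10 or b < 10:
        return a * b

    # Split both numbers around half of the larger digit count
    half = max(len(str(a)), len(str(b))) // 2
    a1, a0 = divmod(a, 10 ** half)
    b1, b0 = divmod(b, 10 ** half)

    # Three recursive multiplications
    x = karatsuba(a1, b1)                    # high parts
    z = karatsuba(a0, b0)                    # low parts
    y = karatsuba(a1 + a0, b1 + b0) - x - z  # cross terms

    return x * 10 ** (2 * half) + y * 10 ** half + z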

Karatsuba's method for polynomials

We can view this representation of integers as analogous to that of polynomials. Say we are given polynomials $A(x)$, $B(x)$, where $$A(x) = a_nx^n + a_{n-1}x^{n-1} + \ldots + a_0$$ $$B(x) = b_nx^n + b_{n-1}x^{n-1} + \ldots + b_0.$$ We have $C(x) = A(x) \cdot B(x)$ of degree $2n$ $$C(x) = \sum_{j = 0}^{2n}c_jx^j = A(x)B(x) = \sum_{j=0}^{2n} \left( \sum_{i + k = j}a_ib_k \right)x^j$$ however, we want to find the coefficients $c_j = \sum_{i+k=j}a_ib_k$ without performing the $(n+1)^2$ many multiplications necessary to get all products of the form $a_ib_k$.

Fast Fourier Transform

A naive way to multiply two polynomials together would be to multiply each term of the first polynomial with a term of the second polynomial. The runtime of this would be $O(n^2)$, where $n$ is the degree/ number of terms of the polynomials.

Another method would be to convert both polynomials into their point-value form, multiply the corresponding values pointwise, and convert the result back into coefficient form. The fast Fourier transform (FFT) is an algorithm that computes the Discrete Fourier transform (DFT) of a sequence, or its inverse (IDFT), in $O(n \log n)$.

Polynomial Interpolation (Vandermonde Matrix)

From Coefficient to Value Representation

A polynomial $A(x)$ of degree $n$ is uniquely determined by its values at any $n + 1$ distinct input values $x_0, x_1, \ldots, x_n$: $$A(x) \leftrightarrow \langle(x_0, A(x_0)), (x_1, A(x_1)), \ldots, (x_n, A(x_n))\rangle$$

For $A(x) = a_nx^n + a_{n-1}x^{n-1} + \ldots + a_0$, these values can be obtained via a matrix multiplication:

$$ \begin{pmatrix} 1 & x_0 & x_0^2 & \ldots & x_0^n \\ 1 & x_1 & x_1^2 & \ldots & x_1^n \\ \vdots & \ldots & \ldots & \ddots & \vdots \\ 1 & x_n & x_n^2 & \ldots & x_n^n \\ \end{pmatrix} \begin{pmatrix} a_0 \\ a_1 \\ \vdots \\ a_n \\ \end{pmatrix} = \begin{pmatrix} A(x_0) \\ A(x_1) \\ \vdots \\ A(x_n) \\ \end{pmatrix} $$

Such a matrix is called the Vandermonde matrix.

From Value to Coefficient Representation

It can be shown that if the $x_i$ are all distinct, then this matrix is invertible. Thus, if the $x_i$ are all distinct, given any values $A(x_0), A(x_1), \ldots, A(x_n)$, the coefficients $a_0, a_1, \ldots, a_n$ of the polynomial $A(x)$ are uniquely determined:

$$ \begin{pmatrix} 1 & x_0 & x_0^2 & \ldots & x_0^n \\ 1 & x_1 & x_1^2 & \ldots & x_1^n \\ \vdots & \ldots & \ldots & \ddots & \vdots \\ 1 & x_n & x_n^2 & \ldots & x_n^n \\ \end{pmatrix}^{-1} \begin{pmatrix} A(x_0) \\ A(x_1) \\ \vdots \\ A(x_n) \\ \end{pmatrix} = \begin{pmatrix} a_0 \\ a_1 \\ \vdots \\ a_n \\ \end{pmatrix} $$

Complexity of Swapping Representation

If we fix the inputs $x_0, x_1, \ldots, x_n$, then converting between a representation of a polynomial $A(x)$ via its coefficients and a representation via its values at these points is done via the two matrix multiplications above, with matrices made of constants. Each such conversion is a linear transformation; done directly as a matrix-vector product it costs $O(n^2)$ operations, and the point of the FFT is that for a clever choice of the points $x_i$ (the roots of unity) it can be done in $O(n \log n)$.
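
As a small numerical illustration (using NumPy, with arbitrarily chosen points and coefficients), we can convert the coefficients of $A(x) = 1 + 2x + 3x^2$ to values at three points and back again.

import numpy as np

# Convert A(x) = 1 + 2x + 3x^2 between coefficient and value representations
coeffs = np.array([1, 2, 3])
xs = np.array([0, 1, 2])

# Vandermonde matrix with rows (1, x_i, x_i^2)
V = np.vander(xs, 3, increasing=True)

values = V @ coeffs                      # coefficients -> values
recovered = np.linalg.solve(V, values)   # values -> coefficients
print(values, recovered)                 # [ 1  6 17] [1. 2. 3.]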

Discrete Fourier Transform

For $\mathbf{a} = \langle a_0, a_1, \ldots a_{n-1} \rangle$ a sequence of $n$ real or complex numbers, we can form the corresponding polynomial $P_A(x) = \sum_{j=0}^{n-1}a_jx^j$, and evaluate it at all complex roots of unity of order n, $$\forall \ 0 \leq k \leq n - 1, \quad P_A(\omega_n^k) = A_k = \sum_{j=0}^{n-1}a_j\omega_n^{jk}.$$

The DFT of a sequence $\mathbf{a}$ is a sequence $\mathbf{A}$ of the same length.

Inverse Discrete Fourier Transform

The IDFT of a sequence $\mathbf{A} = \langle A_0, A_1, \ldots, A_{n-1} \rangle$ is the sequence of values $\mathbf{a} = \langle a_0, a_1, \ldots, a_{n-1} \rangle = \left\langle \frac{P_{\mathbf{A}}(1)}{n}, \frac{P_{\mathbf{A}}(\omega_n^{-1})}{n}, \frac{P_{\mathbf{A}}(\omega_n^{-2})}{n}, \ldots, \frac{P_{\mathbf{A}}(\omega_n^{-(n-1)})}{n} \right\rangle$, where $P_{\mathbf{A}}(x) = \sum_{k=0}^{n-1}A_kx^k$ is the polynomial whose coefficients are the values $A_k$.

We can show that IDFT(DFT($\mathbf{a}$)) = $\mathbf{a}$ and DFT(IDFT($\mathbf{A}$)) = $\mathbf{A}$.

Computation of DFT/ IDFT

Brute force computation of the DFT takes $\Theta(n^2)$, same for IDFT. The DFT of a sequence can be computed in $\Theta(n \lg n)$ using the FFT (as can be the IDFT).
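
For reference, a minimal recursive FFT sketch in Python (assuming the length of the input sequence is a power of two; the function name is an illustrative choice) that computes $A_k = \sum_{j} a_j \omega_n^{jk}$ as defined above is given below. The IDFT can be computed with the same structure by using $\omega_n^{-k}$ in place of $\omega_n^{k}$ and dividing each output by $n$.

import cmath

# A recursive FFT sketch, assuming len(a) is a power of two.  It evaluates the
# polynomial with coefficients a at all roots of unity of order n, i.e. it
# computes the DFT A_k = sum_j a_j * omega_n^(j * k)
def fft(a: list[complex]) -> list[complex]:
    n = len(a)
    if n == 1:
        return list(a)

    # Divide: split into even- and odd-indexed coefficients
    even = fft(a[0::2])
    odd = fft(a[1::2])

    # Conquer: combine the two half-size DFTs using omega_n^k
    result = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(2j * cmath.pi * k / n)  # omega_n^k
        result[k] = even[k] + w * odd[k]
        result[k + n // 2] = even[k] - w * odd[k]
    return result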

Proofs

Proof of Correctness

For Divide and Conquer proof of correctness, mathematical induction is used.

Say that we want to prove that an algorithm $A(n)$ is correct.

  1. Prove that the base case is correct, i.e. $A(0)$, $A(1)$ is correct (depending on your base cases)
  2. Prove the induction step
    1. Assume that $A(k)$ is correct for all $k < n$ (inductive hypothesis)
    2. The recursive calls are therefore correct, as they call $A(k)$ for some $k < n$, which is correct by the inductive hypothesis
    3. Prove that the conquer (combine) step of the algorithm is correct
  3. Hence the algorithm is correct for input size $n$

Proof of Runtime (Master Theorem)

Representation

The time complexity of a divide and conquer algorithm can be represented in the form $$T(n) = aT \left( \frac{n}{b} \right) + f(n)$$ where,

  • $T(n)$
    • The divide and conquer algorithm, with an input size of $n$
  • $a \geq 1$
    • $a$ is the number of recursive calls per function call
    • The constraint implies we make at least 1 recursive call, otherwise there would be no recursion
  • $b > 1$
    • $b$ is the factor by which the input size shrinks in each recursive call
    • The constraint implies the input size must strictly decrease at each recursive call, otherwise we would recurse forever and never reach the base case
  • $f(n)$
    • The work done within each function call (dividing the input and combining the results)

Runtime cases

Using this information, we can find the runtime of most divide and conquer algorithms through one of the following 3 cases, where $\epsilon > 0$ is a constant:

  1. If $f(n) = O(n^{\log_b a - \epsilon})$, then $T(n) = \Theta(n^{\log_b a})$
  2. If $f(n) = \Theta(n^{\log_b a})$, then $T(n) = \Theta(n^{\log_b a} \log n)$
  3. If $f(n) = \Omega(n^{\log_b a + \epsilon})$, and the regularity condition $af\left(\frac{n}{b}\right) \leq cf(n)$ holds for some constant $c < 1$, then $T(n) = \Theta(f(n))$

Examples

Case 1

If $f(n) = O(n^{\log_b a - \epsilon})$, then $T(n) = \Theta(n^{\log_b a})$
Work per subproblem is dwarfed by the number of subproblems

  • Binary tree DFS | $T(n) = 2T \left( \frac{n}{2} \right) + c$
    • Proof:
      • $n^{\log_b a} = n^{\lg 2} = n^{1} = n$
      • $f(n) = c = \Theta(1) = O(n^{1 - \epsilon})$
    • Therefore $T(n) = \Theta(n^{\lg 2}) = \Theta(n)$
  • Karatsuba's integer multiplication | $T(n) = 3T \left( \frac{n}{2} \right) + n$
    • Proof:
      • $n^{\log_b a} = n^{\log_2 3} = n^{\lg 3} \approx n^{1.58}$
      • $f(n) = n = O(n^{\lg 3 - \epsilon})$
    • Therefore $T(n) = \Theta(n^{\lg 3}) \approx \Theta(n^{1.58})$

Case 2

If $f(n) = \Theta(n^{\log_b a})$, then $T(n) = \Theta(n^{\log_b a} \log n)$
Work per subproblem is comparable to the number of subproblems

  • Binary search | $T(n) = T \left( \frac{n}{2} \right) + c$
    • Proof:
      • $n^{\log_b a} = n^{\lg 1} = n^{0} = 1$
      • $f(n) = c = \Theta(1) = \Theta(n^{\lg 1})$
    • Therefore $T(n) = \Theta(n^{\lg 1} \cdot \log n) = \Theta(\log n)$
  • Merge sort | $T(n) = 2T \left( \frac{n}{2} \right) + cn$
    • Proof:
      • $n^{\log_b a} = n^{\lg 2} = n^{1} = n$
      • $f(n) = cn = \Theta(n) = \Theta(n^{\lg 2})$
    • Therefore $T(n) = \Theta(n^{\lg 2} \cdot \log n) = \Theta(n \log n)$

Case 3

If $f(n) = \Omega(n^{\log_b a + \epsilon})$, then $T(n) = \Theta(f(n))$
Work per subproblem dominates the number of subproblems

  • For example, $T(n) = 2T \left( \frac{n}{2} \right) + n^2$
    • Proof:
      • $n^{\log_b a} = n^{\lg 2} = n$
      • $f(n) = n^2 = \Omega(n^{1 + \epsilon})$, and the regularity condition holds since $2\left(\frac{n}{2}\right)^2 = \frac{n^2}{2} \leq cf(n)$ for $c = \frac{1}{2}$
    • Therefore $T(n) = \Theta(f(n)) = \Theta(n^2)$

Greedy Algorithms


Greedy algorithms are often used for optimisation problems (i.e. you want to maximise or minimise some quantity subject to a set of constraints) and make the locally optimal choice at each step in order to construct an overall optimal solution to the entire problem.

Common examples of Greedy Algorithms include

  • Kruskal's algorithm/ Prim's algorithm
  • A* path finding algorithm
  • Huffman encoding

Sample Algorithms

Interval Scheduling

Given a set of $n$ task requests with start and finish times,

$$R = \{(s_1, f_1), (s_2, f_2), ..., (s_n, f_n)\}$$

we want to find the maximum number of non overlapping tasks that we can complete.

An example of a greedy strategy that can solve this is Earliest Finish First

def eff(tasks: list[tuple[int, int]]) -> list[tuple[int, int]]:
    # Sort by earliest finish time
    tasks.sort(key=lambda job: job[1])
    schedule = []
    prev_finish = float('-inf')

    for (start, finish) in tasks:
        # If the task doesn't conflict with the previously selected task
        # (a task may start exactly when the previous one finishes)
        if start >= prev_finish:
            # Add it to our schedule and update the finish time
            schedule.append((start, finish))
            prev_finish = finish

    return schedule

Runtime

The runtime of this algorithm is $O(n \log n)$ as it is dominated by the time required to sort the jobs, as the following loop runs in $O(n)$ time.

Proof of Correctness

First, we prove the result is correct (i.e. there are no tasks conflicting).

This is true as we only add a new task to the schedule if it does not conflict with previous tasks.

Proof of Optimality

Secondly, we prove that the result is optimal (i.e. it is indeed the maximum number of non conflicting tasks we can schedule).

Let $O = \{o_1, ..., o_k\}$ be the tasks of an optimal solution listed in increasing order of finish time, and $G = \{g_1, ..., g_m\}$ be the tasks of the EFF solution, also listed in increasing order of finish time.

Since $O$ is optimal, it must contain at least as many tasks as $G$. If $O \neq G$, there must be a first index $j$ at which the two schedules differ, such that

$$ \begin{aligned} O &= \{o_1, ..., o_{j - 1}, o_j, ... \} \\ G &= \{o_1, ..., o_{j - 1}, g_j, ... \}. \end{aligned} $$

Since EFF is correct and selects the earliest finish activity time, $g_j$ does not conflict with any earlier activity, and it finishes no later than $o_j$.

Now consider a "greedier" optimal solution $O'$ where we replace $o_j$ with $g_j$ in $O$, such that

$$O' = \{ o_1, ..., o_{j - 1}, g_j, o_{j + 1}, ... \}.$$

Clearly, $O'$ is correct since $g_j$ finishes no later than $o_j$, so there are no conflicts. This new schedule has the same number of activities as $O$, and so it is at least as good.

By repeating this process, we will eventually convert $O$ into $G$, without decreasing the number of tasks. Therefore $G$ is optimal.

Minimum Spanning Tree (MST)

Let $G = (V, E)$ be a connected undirected graph.

A Spanning Tree is a subgraph $T = (V, E_T)$ of $G$ such that

  • T does not contain any cycles
  • T is connected

If $G$ is an edge weighted graph, then a MST is a spanning tree of minimum weight.

Kruskal's Algorithm

  1. Order the edges $E$ in non-decreasing order of weight
  2. Build a tree by considering the lowest weight remaining edge each time
  3. An edge $e_i$ is added at round $i$ of construction if it does not introduce a cycle
  4. If adding an edge would introduce a cycle, that edge is discarded
  5. This continues until the list of edges has been exhausted (a sketch in Python is given below)
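
A minimal sketch of this in Python, assuming the graph is given as a vertex count and a list of (weight, u, v) edges, and using a simple union-find structure to detect cycles (all names here are illustrative choices):

# A sketch of Kruskal's algorithm using union-find
def kruskal(n: int, edges: list[tuple[int, int, int]]) -> list[tuple[int, int, int]]:
    # parent[v] points towards the representative of v's component
    parent = list(range(n))

    def find(x: int) -> int:
        # Walk up to the root of x's component, compressing the path as we go
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst = []
    # Consider edges in non-decreasing order of weight
    for weight, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        # The edge creates a cycle iff u and v are already in the same component
        if ru != rv:
            parent[ru] = rv
            mst.append((weight, u, v))
    return mst
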
Proof of Correctness

Let $T$ be the output of the algorithm

  • $T$ does not contain any cycles, since the algorithm never adds an edge that would create one

We show by contradiction that $T$ is connected:

  • Assume $T$ has two or more connected components, say $C_1$ and $C_2$
  • $G$ is connected, so there is at least one edge of $G$ connecting $C_1$ to $C_2$
  • The first such edge considered by the algorithm would have been added to $T$, because it could not have created a cycle in $T$, which is a contradiction.

Therefore $T$ is a spanning tree

Proof of Optimality

We consider the case where all weights are distinct.

Let $T$ be the output of Kruskal's algorithm.

  • Consider a spanning tree $T'$ distinct from $T$.
  • Let $e = \{u, v \}$ be the smallest-weight edge of $T$ that is not in $T'$.
  • $T'$ is spanning, so it contains a path $P$ from $u$ to $v$.
  • Let $f$ be an edge of $P$ that is not in $T$ (such an edge exists, since otherwise $P$ together with $e$ would form a cycle in $T$).
  • Let $T'' = (V, \{e\} \cup E_{T'} \backslash \{f\})$, which is a spanning tree.
  • $w(e) < w(f)$, because otherwise Kruskal's algorithm would have added $f$ to $T$ instead of $e$.
  • Furthermore, $w(T'') = w(T') + w(e) - w(f) < w(T')$, so $T'$ is not an MST.

$G$ has an MST and any $T' \neq T$ is not an MST, so $T$ is an MST.

Kruskal's vs Prim's algorithm

Prim's algorithm is another greedy algorithm to find the MST of a graph; its differences from Kruskal's are listed below.

| Kruskal's algorithm | Prim's algorithm |
| --- | --- |
| Tree being built usually remains disconnected (a forest) during construction | Tree being built always remains connected |
| Adds the next cheapest edge | Adds the next cheapest vertex |
| Faster for sparse graphs | Faster for dense graphs |

Proofs

Whilst greedy algorithms typically have a common structure for their proofs of correctness and optimality, the proof of runtime varies a lot depending on the data structures and algorithms used.

Proof of Correctness and Optimality

Note that greedy algorithms are typically used to maximise or minimise a quantity under a constraint.

We first prove correctness, i.e. that the result satisfies the constraints of the problem (e.g. in task scheduling, we prove no two tasks overlap).

Secondly, we prove that the greedy method achieves an optimal solution (i.e. it does maximise or minimise the quantity). This can be done in multiple ways.

  1. Greedy stays ahead: This works by showing that according to the measure of optimality, the greedy algorithm is at least as far ahead as the optimal solution during each iteration of the algorithm. It is typically done in 4 steps.

    1. Define your solution Introduce variables to denote the greedy solution $G$ and the optimal solution $O$ (i.e. for task scheduling, it is the set of tasks).
    2. Define your measure Define a measure of optimality (i.e. for tasks scheduling, it is the number of jobs).
    3. Prove greedy stays ahead Prove that the measure of optimality for the greedy solution is at least as good as the optimal.
    4. Prove optimality Because the greedy solution stays ahead, it must produce an optimal solution. This is typically done by contradiction by assuming the greedy solution isn't optimal and using the fact that greedy stays ahead as a contradiction.
  2. Exchange arguments: This works by showing you can iteratively transform any optimal solution into the result of the greedy algorithm without changing the cost of the optimal solution, thereby proving the greedy solution is optimal.

    1. Define your solution Introduce variables to denote the greedy solution $G$ and the optimal solution $O$ (i.e. for task scheduling, it is the set of tasks).
    2. Compare solutions Show that $G \neq O$ (i.e. there is some element of $G$ not in $O$. It helps to name this different element).
    3. Exchange pieces Show how to transform $O$ into $G$, typically by swapping the element identified above from $G$ into $O$ in place of some element of $O$. Then prove that by doing so you did not worsen $O$'s optimality, and therefore you have obtained another optimal solution.
    4. Iterate Argue that by continuing this exchange of pieces, you can turn $O$ into $G$ without reducing optimality, and therefore $G$ is optimal.

Dynamic Programming


Dynamic programming is a method for finding optimal solutions to problems by combining optimal solutions to subproblems. Because the sets of subproblems needed to solve larger problems overlap, each subproblem is calculated only once and its solution is stored for future calls.

Common examples of Dynamic Programming include

  • Dijkstra's algorithm
  • Integer knapsack
  • Optimal matrix chain multiplication

Sample Algorithms

Fibonacci

Whilst simple, this problem showcases the concept of dynamic programming very well.

Consider a function fib that, given $n$, finds the $n^{th}$ Fibonacci number. This can be defined with the following mathematical recurrence relation

$$ F_n = \begin{cases} 0 & \text{if } n = 0 \\ 1 & \text{if } n = 1 \\ F_{n - 1} + F_{n - 2} & \text{otherwise} \end{cases} $$

or with the following python code

# A function to calculate the nth fibonacci number
def fib(n: int) -> int:
    if n == 0:
        return 0
    if n == 1:
        return 1
    return fib(n - 1) + fib(n - 2)

However, the runtime of this is $O(2^n)$, an exponential runtime, as each function calls itself twice until it reaches the base case.

Top down solution

In top down dynamic programming, we use memoisation (caching the results of previous function calls) to reduce the runtime to $O(n)$, with a space complexity of $O(n)$.

# Top down solution to find the nth fibonacci number
def fib(n: int) -> int:
    # Create memoisation table (indices 0..n) and add base cases
    memo = [-1] * (n + 1)
    memo[0] = 0
    if n >= 1:
        memo[1] = 1
    return dp(n, memo)

def dp(n: int, memo: list[int]) -> int:
    # If we have not calculated the value before
    if memo[n] == -1:
        memo[n] = dp(n - 1, memo) + dp(n - 2, memo)

    # Now we can simply return the stored value in O(1)
    return memo[n]

Bottom up solution

In bottom up DP, we write an iterative solution to compute the value of every subproblem to reduce the runtime to $O(n)$ and space complexity of $O(1)$.

# Bottom up solution to find the nth fibonacci number
def fib(n: int) -> int:
    # Base Case
    if n == 0:
        return 0
    if n == 1:
        return 1

    # Iterate up to the nth fibonacci number
    a, b = 0, 1
    for i in range(2, n + 1):
        temp = a + b
        a = b
        b = temp
    return b

Integer Knapsack

Given a set of $n$ items with weights and values,

$$I = \{ (v_1, w_1), (v_2, w_2), ..., (v_n, w_n) \},$$

and a knapsack which has a weight capacity of $c$, choose a combination of items, maximising the value of the items while ensuring the total weight is less than $c$.

We can define the following recurrence relation

$$ F(i, c) = \begin{cases} 0 & \text{if } i = 0 \text{ or } c = 0 \\ F(i - 1, c) & \text{if } w_i > c \\ \max(F(i - 1, c - w_i) + v_i, F(i - 1, c)) & \text{otherwise} \end{cases} $$

where

  • $i$ is the index of the current item
  • $c$ is the current capacity

which has the base case that we return 0 if we have no items left or no capacity left. If the current item does not fit (i.e. $w_i > c$), we must skip it and consider the next item. Otherwise, we take the maximum value we can get if:

  1. we include it in the knapsack, in which case we add the value of the current item, reduce the capacity by its weight, and then consider what to do with the next item.
  2. we do not include it in the knapsack, in which case we do nothing, and merely consider what to do with the next item.

The runtime of this is unfortunately $O(2^n)$ (since each subproblem makes two recursive calls and the recursion has a maximum depth of $n$); however, since subproblems are recalculated (i.e. calls where $i$ and $c$ are the same), we can use dynamic programming.
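
For example, a top-down (memoised) implementation of the recurrence might look like the following sketch, where the function name and the dictionary used as the memoisation table are illustrative choices.

# Top down sketch of the knapsack recurrence, memoised on (i, c)
def knapsack_td(items: list[tuple[int, int]], c: int) -> int:
    memo: dict[tuple[int, int], int] = {}

    def f(i: int, cap: int) -> int:
        # Base case: no items left or no capacity left
        if i == 0 or cap <= 0:
            return 0
        if (i, cap) not in memo:
            # Items are 1-indexed in the recurrence, so item i is items[i - 1]
            value, weight = items[i - 1]
            ignore = f(i - 1, cap)
            if weight > cap:
                # The item does not fit, so we must skip it
                memo[(i, cap)] = ignore
            else:
                include = f(i - 1, cap - weight) + value
                memo[(i, cap)] = max(include, ignore)
        return memo[(i, cap)]

    return f(len(items), c)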

Memoisation

In calculating the $n^{th}$ fibonacci value, there was only one parameter changed in repeated subproblems, therefore we only needed a $n$ sized 1D memoisation table.

However, for the integer knapsack problem, the value of each subproblem is dependent on both $i$ and $c$ so therefore we need a 2D $n \times c$ memoisation table.

# Bottom up solution for the integer knapsack problem
def knapsack(items: list[tuple[int, int]], c: int) -> int:
    n = len(items)
    # memo[i][w] = best value using only the first i items with capacity w
    memo = [[0] * (c + 1) for _ in range(n + 1)]

    # Build table in bottom up manner
    for i in range(n + 1):
        for w in range(c + 1):
            # If no more items or remaining capacity is 0
            if i == 0 or w == 0:
                memo[i][w] = 0
                continue
            # Items are 1-indexed in the table, so item i is items[i - 1]
            i_value, i_weight = items[i - 1]
            # If we can add the ith item without going over capacity
            if i_weight <= w:
                include_v = memo[i - 1][w - i_weight] + i_value
                ignore_v = memo[i - 1][w]
                # Choose the option that gives the most value
                memo[i][w] = max(include_v, ignore_v)
            # If the ith item does not fit, we must skip it
            else:
                memo[i][w] = memo[i - 1][w]

    return memo[n][c]

Because memo[i][w] is always the maximum value the bag can hold after considering whether or not to include each of the first i items, when the bag has a capacity of w, it follows that memo[n][c] is the maximum value a bag of capacity c can hold after considering all n items.

Time Complexity

The final time complexity is $O(nc)$ as there is a nested loop that loops $n \times c$ times, and each operation within the loop is $O(1)$ as they all only require $O(1)$ array access or arithmetic. Note that because the runtime is dependent on the numeric value of the input, and not just the length of the input size, the runtime of this algorithm is not polynomial, but pseudo-polynomial.

Space Complexity

The space complexity is $O(nc)$ since there is a 2D $n \times c$ memoisation table to store the solutions of previous computations. It is important to keep in mind that despite the bottom up solution for fibonacci having a space complexity of $O(1)$, the space complexity here is the same as the top down solution as all previous computations need to be remembered. Similar to runtime, this also has a pseudo-polynomial space complexity.

Linear Programming


Linear programming is an optimisation method to maximise or minimise a linear function (our objective) given linear constraints on variables.

This can be represented as a mathematical model with

| Component | Mathematical Representation |
| --- | --- |
| Variables | $x_j \geq 0 \text{ for } 1 \leq j \leq n$ |
| Objective | $\text{maximise or minimise } \sum^{n}_{j = 1} c_j x_j$ |
| Constraints | $\sum^{n}_{j = 1} a_{ij} x_j \ R_i \ b_i, \text{ for } 1 \leq i \leq m \text{ with } R_i \in \{\leq, =, \geq \}$ |

A feasible solution is a variable assignment satisfying all constraints.

An optimal solution is a feasible solution satisfying the objective.

Canonical form

A linear programming formulation can also be represented by the canonical form.

maximise

$$\textbf{c}^\text{T}\textbf{x}$$

subject to the constraints

$$ \begin{align*} A\textbf{x} & \leq \textbf{b} \\ \textbf{x} & \geq 0. \end{align*} $$

where

$$ \textbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}, \textbf{c} = \begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{pmatrix}, \textbf{b} = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{pmatrix}, A = \begin{pmatrix} a_{1, 1} & a_{1, 2} & \ldots & a_{1, n} \\ a_{2, 1} & a_{2, 2} & \ldots & a_{2, n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m, 1} & a_{m, 2} & \ldots & a_{m, n} \end{pmatrix}. $$

Runtime

Solving an LP (Linear Programming) formulation is in P (has a polynomial runtime), whilst solving an ILP (Integer Linear Programming, which is similar to LP except that all the variables must be integers) formulation is NP-hard (the decision version of ILP is NP-complete).

In practice, given an optimisation problem, a person formulates the linear programming problem, which is then given to an LP solver which uses algorithms such as the simplex algorithm.
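
As an illustration, a small LP in the canonical form above can be handed to an off-the-shelf solver such as SciPy's linprog. The numbers below are made up for demonstration, and since linprog minimises, the objective is negated in order to maximise.

import numpy as np
from scipy.optimize import linprog

# Maximise 3x_1 + 5x_2 subject to x_1 <= 4, 2x_2 <= 12, 3x_1 + 2x_2 <= 18
# and x >= 0 (an arbitrary example in canonical form)
c = np.array([3, 5])                       # objective coefficients
A = np.array([[1, 0], [0, 2], [3, 2]])     # constraint matrix
b = np.array([4, 12, 18])                  # right hand sides

# linprog minimises, so negate c to maximise; bounds enforce x >= 0
result = linprog(-c, A_ub=A, b_ub=b, bounds=[(0, None), (0, None)])
print(result.x, -result.fun)               # should print roughly [2. 6.] 36.0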

Reductions


Reductions are the conversion of one problem to another. An efficient reduction from A to B may be used to show that B is at least as difficult as A.

Complexity Classes

P (Polynomial)

A problem $A(n)$ is in class P, denoted by $A \in \mathbf{P}$, if there is an algorithm which solves it in polynomial time with regard to the size of the input.

NP (Non deterministic Polynomial Time)

A problem $A(n)$ is in class NP, denoted by $A \in \mathbf{NP}$, if there is an algorithm which can verify whether a proposed solution is correct in polynomial time with regard to the size of the input.

  • A problem that is in P is also in NP (a polynomial-time solution can itself be used to verify a solution in polynomial time).
  • Hence, the hardest problems in NP are at least as hard as any problem in P.

NP-Hard and NP-Complete

A problem is NP-Hard if any problem in NP is reducible to it (informally, a problem is NP-Hard if it is at least as hard as the hardest problems in NP).

A problem is NP-Complete if it is in both NP, and NP-Hard (informally, a problem is NP-Complete if it is one of the hardest problems in NP, the "complete" meaning that it is able to simulate any problem in the same complexity class through reductions).

  • Hence, NP-Hard problems are at least as hard as NP-Complete problems.

NP Problems

3SAT

The SAT problem is: Given a propositional formula in the CNF form $C_1 \land C_2 \land ... \land C_n$ where each clause $C_i$ is a disjunction of propositional variables or their negations, i.e.

$$ (P_1 \lor \lnot P_2 \lor P_3 \lor \lnot P_5) \land (P_2 \lor P_3 \lor \lnot P_5 \lor \lnot P_6) \land (\lnot P_3 \lor \lnot P_4 \lor P_5) $$

Is there some assignment of boolean values to $P_1, P_2, ..., P_6$ which makes the formula true?

If each clause involves exactly 3 variables, then this problem is 3SAT.

Cook's Theorem (1982 Turing Award)

Theorem: Every NP problem is polynomially reducible to the SAT problem.

This means that if the SAT problem is solvable in polynomial time, P = NP, as other NP problems can be reduced to SAT in polynomial time and then solved in polynomial time.

Karp's 21 Problems [1972] (1985 Turing Award)

Karp's 21 problems are a set of NP problems for which Karp exhibited a chain of polynomial-time many-one reductions starting from the SAT problem (shown in the tree below), thereby showing that they are all NP-complete.

  • Satisfiability
    • 0 - 1 Integer Programming
    • Clique
      • Set Packing
      • Vertex Cover
        • Set Cover
        • Feedback Node set
        • Feedback Arc Set
        • Directed Hamiltonian Cycle
          • Undirected Hamiltonian Cycle
    • 3SAT
      • Graph Coloring
        • Clique Cover
        • Exact cover
          • Hitting Set
          • Steiner Tree
          • 3D Match
          • Subset Sum
            • Job Sequencing
            • Partition
              • Max Cut

3GC to SAT Reduction

3GC, AKA 3 Graph Colouring, is the problem that asks: given an undirected graph $G = (V, E)$ and a set of colours $C = \{r, g, b\}$, is there a function $f: V \rightarrow C$ such that if $(v, w) \in E$ then $f(v) \neq f(w)$?

Using the notation that $v_i$ is a proposition that is true if vertex $v$ is assigned colour $i$ (numbering the colours $1, 2, 3$), we can use the following rules to complete the reduction.

  1. Enforce that each vertex is one colour only. We can enforce this through 2 rules.
    1. A vertex is no more than one colour: $$\forall v \in V : (\lnot v_1 \lor \lnot v_2) \land (\lnot v_1 \lor \lnot v_3) \land (\lnot v_2 \lor \lnot v_3)$$ If a vertex were given two colours $i$ and $j$, then $v_i$ and $v_j$ would both be true, so their negations would both be false and the clause $(\lnot v_i \lor \lnot v_j)$ would evaluate to false.
    2. A vertex is at least one colour: $$\forall v \in V : (v_1 \lor v_2 \lor v_3)$$ If a vertex is not any colour, then the clause evaluates to false.
  2. Enforce that adjacent vertices are not the same colour. $$\forall (v, w) \in E : (\lnot v_1 \lor \lnot w_1) \land (\lnot v_2 \lor \lnot w_2) \land (\lnot v_3 \lor \lnot w_3)$$ Similar to 1.a, if two adjacent vertices are the same colour, then the corresponding clause evaluates to false.
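
A small sketch of this reduction in Python is given below, representing the proposition $v_i$ ('vertex $v$ has colour $i$') as the pair (v, i), a literal as a (proposition, negated) pair, and the CNF formula as a list of clauses (all of these representation choices are illustrative).

# Generate the CNF clauses of the 3GC to SAT reduction described above
def three_gc_to_sat(vertices: list, edges: list[tuple]) -> list:
    clauses = []
    for v in vertices:
        # Rule 1.2: each vertex is at least one colour
        clauses.append([((v, i), False) for i in (1, 2, 3)])
        # Rule 1.1: each vertex is no more than one colour
        for i, j in ((1, 2), (1, 3), (2, 3)):
            clauses.append([((v, i), True), ((v, j), True)])
    # Rule 2: adjacent vertices are not the same colour
    for (v, w) in edges:
        for i in (1, 2, 3):
            clauses.append([((v, i), True), ((w, i), True)])
    return clauses

# A triangle gives 3 * (1 + 3) + 3 * 3 = 21 clauses
print(len(three_gc_to_sat(["a", "b", "c"], [("a", "b"), ("b", "c"), ("a", "c")])))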