Foothills of Combinatorics (part 1)
This is the first of a few notes about combinatorics and probability. Simpler concepts first, and eventually might get to stuff like analytic combinatorics.
I’m interested in the point where more simple combinatorics and multinomials become blunt tools and we have to move to higher concepts like generators and species.
Let’s start with the icon: Pascal’s triangle. If you haven’t heard of Pascal’s triangle, this entire note might not make much sense to you :)
Below is the classic triangle, constructed by starting with \(1\) and, on each successive row, adding each number’s top-left and top-right neighbours. The numbers generated in row \(n\) are the binomial coefficients of \((a+b)^n\).
Here’s the same triangle but with the numbers written as combinations: \(\binom{m}{n}\). So row \(m\) contains, for each \(n\), the number of ways to pick \(n\) items from a pool of \(m\):
Let’s do ourselves a favour and lay the numbers out as a right-angled triangle. After all, we’re fundamentally working with a DAG here, and not some exact embedding in \(\mathbb{R}^d\); the latter would be a presentation detail.
Using a right-angled triangle aligns nicely with array storage and makes the data easier to look at and think about (especially once we look at higher dimensions):
It’s now easier to see that each entry in the combinatorics triangle is indexed by its row and column, and that row \(n\) contains all the possibilities for \(\binom{n}{*}\)1.
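Here’s a minimal Python sketch of that array layout (the function name is mine, not anything standard): `rows[n][k]` holds \(\binom{n}{k}\), and each row is built from the previous one.

```python
from math import comb

def pascal_rows(n_rows):
    """Right-angled Pascal's triangle as a list of lists: rows[n][k] == C(n, k)."""
    rows = [[1]]
    for n in range(1, n_rows):
        prev = rows[-1]
        # Each entry is the sum of the entry above-left and the entry directly
        # above; neighbours that fall outside the triangle count as 0.
        rows.append([(prev[k - 1] if k > 0 else 0) + (prev[k] if k < n else 0)
                     for k in range(n + 1)])
    return rows

rows = pascal_rows(6)
assert rows[4] == [1, 4, 6, 4, 1]
assert all(rows[n][k] == comb(n, k) for n in range(6) for k in range(n + 1))
```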
Suit and tie
Getting slightly more formal: what kind of beast is Pascal’s triangle? It’s a kind of spatial recurrence relation, or static cellular automaton2, which has a generator.
The Pascal’s triangle generator is usually defined as something like:
$$ a_{n,k} = a_{n-1,k-1} + a_{n-1,k} $$That’s fine for 2D, but if you want to move to higher dimensions and really see things, that style is a giant headache hammer3.
Here’s a much nicer format for the 2D generator:
$$ g_p^2 = \left[ \Sigma, \begin{pmatrix} \hfill 1 & \hfill 1 \\ \hfill 0 & \hfill 1 \end{pmatrix} \right] \tag{1} $$So in this scheme, a generator consists of a reducer function4 and a matrix of \(m\) offset vectors (placed in rows) of dimension \(d\). And \(g_p^2\) denotes the Pascal generator for dimension 2.
We use the convention that the offset vectors in our matrix are subtracted from the reference point, rather than added.
This avoids the need to write a lot of negatives. And in this scheme, if you plot the matrix vectors as points, they take the same shape as the generated number system. If we instead stored the vectors as negatives and added them, plotting them as points would give the inversion of that shape, with every axis mirrored, which is a confusion we can avoid.
The reason that negatives have to be involved at all is practical, not mathematical: in code, to grow arrays/vectors out in a positive direction from 0 index, you want to refer to content in the negative direction.
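Here’s a minimal Python sketch of the generator scheme, assuming the offset rows in \((1)\) are read as (column, row) deltas and subtracted from the reference point as described above; the names are mine:

```python
from collections import defaultdict

# The 2D Pascal generator g_p^2 from (1): a reducer plus a matrix of offsets.
REDUCE = sum
OFFSETS = [(1, 1), (0, 1)]   # read here as (delta_col, delta_row), subtracted

def generate(rows, reduce_fn=REDUCE, offsets=OFFSETS):
    """Grow the triangle row by row; points outside the triangle count as 0."""
    grid = defaultdict(int)
    grid[(0, 0)] = 1                      # the seed
    for row in range(1, rows):
        for col in range(row + 1):
            grid[(col, row)] = reduce_fn(
                grid[(col - dc, row - dr)] for dc, dr in offsets)
    return [[grid[(k, n)] for k in range(n + 1)] for n in range(rows)]

print(generate(5))
# [[1], [1, 1], [1, 2, 1], [1, 3, 3, 1], [1, 4, 6, 4, 1]]
```

The same shape should carry over to higher dimensions: swap in \(d\)-dimensional offset vectors and iterate over a simplex-shaped index range.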
Coins and dice
The rows of Pascal’s triangle form a probability mass function5 (once normalised) for a series of A/B events, usually portrayed as a fair coin being tossed \(n\) times.
The raw row values are weighted probabilities (relative probabilities), and row \(r\) sums to \(2^{r}\) (rows indexed from 0). So if you divide every entry of row \(r\) by \(2^{r}\) you get the actual probabilities for each outcome (e.g. one head and three tails), which sum to 1. And of course the cumulative probability (‘throw \(n\) heads or fewer’) is a value plus all the values to its left.
The Pascal combining function \(f(a, b) = a + b\) represents a fair coin. If you weight the sum you get triangle rows representing an unfair coin: \(f_k(a, b) = ka + b\), where \(k / (k + 1)\) is the probability of a head.
If we’re interested in generating the probabilities directly in our triangle, we can cut to the chase by using the combining function \(F_p(a, b) = pa + (1-p)b\), where \(p\) is the probability of throwing a head.
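A quick sketch of that, assuming nothing beyond the \(F_p\) rule above (names mine); it grows one row at a time and checks the result against the usual binomial PMF:

```python
from math import comb

def weighted_row(n, p):
    """Row n of the triangle grown with F_p(a, b) = p*a + (1 - p)*b.

    Entry k comes out as C(n, k) * p**k * (1 - p)**(n - k): the probability
    of exactly k heads in n tosses.
    """
    row = [1.0]
    for _ in range(n):
        padded = [0.0] + row + [0.0]          # missing neighbours are 0
        row = [p * padded[k] + (1 - p) * padded[k + 1]
               for k in range(len(padded) - 1)]
    return row

n, p = 4, 0.3
assert all(abs(v - comb(n, k) * p**k * (1 - p)**(n - k)) < 1e-12
           for k, v in enumerate(weighted_row(n, p)))
```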
There’s a common convention in probability texts which I think should be called out more clearly:
We list outcomes of coin toss trials like HHTH, with the habit that H actually just stands in for ‘whatever got thrown first’ and T represents the ‘other side’. So in this convention you’ll never see a sequence beginning with T.
A useful model of \(\binom{a}{b}\): ‘bookends’
Given the definition of the choose function:
$$ \binom{a}{b} = \frac{a!}{(a-b)!b!} $$there’s a handy sort of accounting picture of what it represents, which I call ‘bookends’. It will come into its own when we start thinking about multinomials later, because you can reason about multinomial behaviour etc. in your head (or on paper) much more easily by imagining bookends.
So, for the \(\binom{a}{b}\) expression above, consider the range \((1 \dots a)\) and form a fraction from the \(b\) numbers at the end of the range and the \(b\) numbers at the start, as follows:
$$ \frac{\overset{ \begin{array}{c} \text{\(b\) numbers from the end of the range, i.e. the \(a!/(a-b)!\) part} \\ \\[-1pt] \end{array} } { a.(a-1).(a-2) \dots (a-b+1) }} {\underset{ \begin{array}{c} \\[-1pt] \text{\(b\) numbers from the start of the range, i.e. the \(b!\) part} \end{array} } {1.2.3 \dots b}} $$(The top and bottom parts of the fraction are our ‘bookends’, btw.)
A practical example:
$$ \binom{6}{4} = \frac{6.5.4.3}{4.3.2.1} $$Now notice how it makes the cancellation here obvious:
$$ \binom{6}{4} = \frac{6.5.\cancel{4.3}}{\cancel{4.3}.2.1} = \frac{6.5}{2.1} $$and this is in fact binomial symmetry at play, due to \(\binom{a}{a-b} = \binom{a}{b}\).
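The bookends picture also maps directly onto a way of computing \(\binom{a}{b}\) without large factorials. A minimal Python sketch (my own naming), using the symmetry above to pick the shorter bookend:

```python
def choose(a, b):
    """C(a, b) via 'bookends': the b numbers from the top of the range
    divided by the b numbers from the bottom, using C(a, b) == C(a, a - b)
    to keep the bookends short."""
    if b < 0 or b > a:
        return 0
    b = min(b, a - b)                 # binomial symmetry
    num, den = 1, 1
    for i in range(b):
        num *= a - i                  # a, a-1, ..., a-b+1
        den *= i + 1                  # 1, 2, ..., b
    return num // den

assert choose(6, 4) == (6 * 5) // (2 * 1) == 15
```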
More on bookends later.
Higher Pascal’s triangle: the simplex
Pascal’s triangle extends into higher-dimensional analogues in pleasing ways.
The binomial coefficients are extended into a world of multinomial coefficients. For example, the 3D Pascal simplex contains the coefficients of \((a+b+c)^n\).
And the choose numbers \(\binom{a}{b}\) are extended into the world of multi-choose:
$$ \binom{a}{b_0, b_1, \dots b_n} $$This is the number of ways of choosing \(b_0\) from \(a\) objects, then choosing \(b_1\) from what remains, and so on.
It’s defined, intuitively, as the chaining of choose operations with a decreasing pool size:
$$ \binom{a}{b_0, b_1, \dots b_{n-1}, b_n} = \binom{a}{b_0} . \binom{a-b_0}{b_1} . \binom{a-b_0-b_1}{b_2} \dots \binom{a-b_0 - \dots - b_{n-1} }{b_n} $$
Multinomial identities
From playing with the bookends idea introduced earlier (or otherwise), we can jot down a few rules:
$$ \begin{aligned} \binom{a}{0_*} = \binom{a}{a} = 1 \\ \\ \binom{a}{1} = \binom{a}{a-1} = a \\ \\ \binom{a}{ 1_n } = {}_a \mathrm{P}_n \\ \\ \binom{a}{ 1_a } = a! = {}_a \mathrm{P}_a \\ \\ \binom{a}{ \dots 0_* \dots } = \binom{a}{ \dots } \\ \\ \binom{a}{ b_0, b_1, \dots b_n } = \binom{a}{ \{ b_0, b_1, \dots b_n \}_{multiset} } \end{aligned} $$This last identity is a roundabout way of saying that the order of the \( b \) items doesn’t matter, i.e. they’re a multiset.
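To make the chained-choose definition concrete, here’s a small Python sketch (my own naming) that spot-checks a couple of the identities above:

```python
from math import comb, factorial, perm

def multichoose(a, bs):
    """The chained definition: choose b0 from a, then b1 from what remains, ..."""
    result, pool = 1, a
    for b in bs:
        result *= comb(pool, b)
        pool -= b
    return result

a = 7
assert multichoose(a, [1] * 3) == perm(a, 3)                    # aPn
assert multichoose(a, [1] * a) == factorial(a)                  # a!
assert multichoose(a, [2, 0, 3]) == multichoose(a, [2, 3])      # zeros drop out
assert multichoose(a, [2, 3, 1]) == multichoose(a, [3, 1, 2])   # order doesn't matter
```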
Continued in part 2
I’m using wildcard ‘\(*\)’ to mean ‘the entire slice’ or ‘all possible values’ ↩︎
So many possible names though! In computational mathematics, numerical analysis, or physics it might be called a ‘lattice-based recurrence’ or ‘local recurrence’. The devil has many monickers ↩︎
I have a headache hammer. It’s a rubber mallet thing for putting up shelves. When I first used it I noticed a weird waft from it, so I sniffed it: instant headache. What is in that rubber? Once every few years I sniff the hammer to check its potency. It has faded over time ↩︎
The reducer (combining) function might be a one-shot that takes a tuple (list), or it might be an actual reducer that operates on e.g. two items at a time ↩︎
reminder: probability mass function (PMF) is for discrete distributions, and probability density function (PDF) is for continuous distributions ↩︎