Coin flip probabilities
Is it a fair coin?
I am going to build an intuitive understanding of Bayes’ Theorem, or Bayesian inference. I won’t go into the technical details, because that’s what many intros to Bayesian inference already do. Instead, we play and learn.
I will still give you Bayes’ Theorem, so you can try to see how it works its magic in the coin flipping example below. All I’m saying is that Bayesian inference is counting.
\[p(\theta|x) = \frac{p(x|\theta)p(\theta)}{p(x)}\]
The term on the left, \(p(\theta|x)\), is the posterior: how probable is the parameter \(\theta\) given the data \(x\)?
The first term on the right is the likelihood, \(p(x|\theta)\): how probable are the data \(x\) given the parameter \(\theta\)?
Then the prior, \(p(\theta)\): how probable is the parameter \(\theta\) before seeing any data \(x\)?
And, lastly, the evidence, \(p(x)\): how probable is it to observe the data \(x\) at all?
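To make this concrete before we drop the formalism, here is the formula worked out for a single observed tails, assuming \(\theta\) is the coin’s probability of heads and a uniform prior \(p(\theta) = 1\) on \([0,1]\):
\[p(\theta|T) = \frac{p(T|\theta)\,p(\theta)}{p(T)} = \frac{(1-\theta)\cdot 1}{\int_0^1 (1-u)\,du} = 2-2\theta\]
That \(2-2\theta\) is exactly the “tails triangle” you will meet below, where the same parameter is written \(x\) instead of \(\theta\).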
Now, let’s forget that this means anything. The only thing we need to know is that when tails (T) comes up, the probability distribution function is \(p(T) = 2-2x\), where \(x \in [0,1]\) is the coin’s probability of heads.
That’s a linear function that is 2 at \(x=0\) and 0 at \(x=1\).
Its graph is a right triangle, and it’s a proper probability distribution function because the triangle’s area, \((2\times1)/2 = 1\), is also the area under the curve.
You can see it by typing in T.
Now, for heads, it’s similar. The probability of H is \(p(H) = 2x\).
That’s a linear function that is 0 at \(x=0\) and 2 at \(x=1\).
Again, it’s both a right triangle and a proper probability distribution function.
You can see it by typing in H.
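If you’d rather check the arithmetic than take my word for it, here is a minimal sketch in JavaScript. The names f_head and f_tail are borrowed from this site’s source code (mentioned again below), but the bodies are just the two formulas above, re-typed by me:

```javascript
// The two "triangle" densities from the text, defined for x in [0, 1],
// where x is the coin's probability of heads.
const f_head = (x) => 2 * x;     // p(H): 0 at x = 0, 2 at x = 1
const f_tail = (x) => 2 - 2 * x; // p(T): 2 at x = 0, 0 at x = 1

// Numerically check that each triangle integrates to 1 over [0, 1],
// i.e., that each one really is a proper probability density.
function area(f, steps = 100000) {
  const dx = 1 / steps;
  let sum = 0;
  for (let i = 0; i < steps; i++) {
    sum += f((i + 0.5) * dx) * dx; // midpoint Riemann sum
  }
  return sum;
}

console.log(area(f_head).toFixed(4)); // 1.0000
console.log(area(f_tail).toFixed(4)); // 1.0000
```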
OK. What happens next feels like magic: when you enter multiple coin flip results (e.g., TT, or HT, or HTHHT...), the curve turns into a more recognizable probability distribution. Wow!
Why is that? It is simple (and repeated) multiplication of those triangles.
Really! (You can check the source code of this site and look for f_head(x) and f_tail(x).)
So, what’s going on? We start from the prior belief that every possible heads-bias is equally likely: a uniform prior distribution. If we observe tails, we multiply our prior by the “tails triangle”. The resulting posterior becomes our new prior, and with each subsequent coin flip we keep updating it based on the evidence so far.
That’s what I mean by counting. You just multiply what you believe by what you see (kinda).
(I haven’t talked about evidence \(p(x)\) and I won’t.)
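And here is the whole updating loop as a minimal JavaScript sketch. This is my own grid-based reconstruction, not this site’s actual code, and the renormalization step is where that evidence \(p(x)\) quietly does its job:

```javascript
// Posterior over the heads-bias x, represented on a discrete grid.
const N = 1001;
const xs = Array.from({ length: N }, (_, i) => i / (N - 1));

// Start from the uniform prior: every bias equally likely.
let posterior = xs.map(() => 1);

// One Bayesian update per observed flip: multiply by the matching
// triangle, then renormalize so the curve integrates to (about) 1.
function update(post, flip) {
  const likelihood = flip === "H" ? (x) => 2 * x : (x) => 2 - 2 * x;
  const unnormalized = post.map((p, i) => p * likelihood(xs[i]));
  const total = unnormalized.reduce((a, b) => a + b, 0) / (N - 1);
  return unnormalized.map((p) => p / total); // this division is the evidence at work
}

// Feed in a sequence, e.g. the HTHHT from above.
for (const flip of "HTHHT") {
  posterior = update(posterior, flip);
}

// The peak of the posterior: the most plausible heads-bias so far.
const best = xs[posterior.indexOf(Math.max(...posterior))];
console.log(best.toFixed(2)); // 0.60
```

For HTHHT the posterior peaks at \(x = 0.60\): three heads in five flips, which is the counting I promised.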
So have a go yourself.
Below is an input form where you can enter your coin flips. Starting from a uniform distribution between tails (T, on the left) and heads (H, on the right), you can see how the posterior is updated based on the new data coming in, i.e., the evidence.
Enter sequence of H and T: