What is this?

This is a web app that helps you visualize Bayesian updating on a set of hypotheses for a binary random variable.

Some people like to hear an explanation of the general, abstract concept first. Other people prefer to start with a concrete example. Read the following two sections in whatever order you prefer.

Abstract Explanation

$$P(H_i \mid E) = \frac{P(E \mid H_i)\, P(H_i)}{P(E)}$$
Bayes' Theorem

Given a (possibly infinite) set of mutually exclusive hypotheses $H_0, H_1, H_2, \ldots$, for each hypothesis $H_i$ we wish to calculate the probability $P(H_i \mid E)$ that the hypothesis is true, given that we witnessed some evidence $E$. In other words, we wish to update our beliefs about the plausibility of the various hypotheses we have under consideration in response to witnessing some evidence either for or against those hypotheses. We start off with some prior beliefs about the probability of the hypotheses, $P(H_i)$, such that $\sum_i P(H_i) = 1$ (i.e. we consider all possible hypotheses). Typically, if we don't know which hypotheses to favor, we simply assign a uniform distribution over the hypotheses (there are some theoretical justifications for doing this, e.g. Solomonoff induction, but they are beyond the scope of this article).
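
For instance, here is a minimal Python sketch of that starting point, assuming a small, arbitrary set of candidate hypotheses (the app itself uses a finer grid, as described below):

```python
# A uniform prior over a finite set of hypotheses. Each hypothesis is
# identified with the success probability it predicts; the particular
# values below are arbitrary illustrations.
hypotheses = [0.0, 0.25, 0.5, 0.75, 1.0]
prior = {h: 1 / len(hypotheses) for h in hypotheses}

assert abs(sum(prior.values()) - 1.0) < 1e-12  # priors sum to 1
```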

Each of these hypotheses gives a probability distribution over various events. In our case, we are only interested in binary random variables, and so there are only two events of interest. Call one of these events the "success" event, denoted by $E$, and call the other event the "failure" event, denoted by $\lnot E$. Since these are the only two possible events, we have $\forall i,\; P(E \mid H_i) + P(\lnot E \mid H_i) = 1$.

We now have a bunch of hypotheses, each one giving different predictions about the likelihood of witnessing or not witnessing an event $E$. We also have a confidence level for each of these hypotheses. We can combine these two pieces of information to compute an overall probability of witnessing an event $E$:

$$P(E) = \sum_i P(E \mid H_i)\, P(H_i) \quad\text{and}\quad P(\lnot E) = \sum_i P(\lnot E \mid H_i)\, P(H_i).$$
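
As a sketch, here is this law of total probability under the same toy setup as before (again identifying each hypothesis with the success probability it predicts, so that $P(E \mid H_i)$ is simply that number):

```python
# Law of total probability: P(E) = sum_i P(E | H_i) * P(H_i).
# Each hypothesis h is identified with the success probability it
# predicts, so P(E | H_i) is simply h itself.
hypotheses = [0.0, 0.25, 0.5, 0.75, 1.0]
prior = {h: 1 / len(hypotheses) for h in hypotheses}

p_e = sum(h * prior[h] for h in hypotheses)            # P(E)
p_not_e = sum((1 - h) * prior[h] for h in hypotheses)  # P(not-E)

assert abs((p_e + p_not_e) - 1.0) < 1e-12  # the two events exhaust all outcomes
```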

However, rather than performing the summation, let us instead represent this equation as a probability tree:

As always with probability trees, the probabilities of the nodes in each vertical slice sum up to 1. The left-most slice consists of just the single node $U$, which represents the universe of all possibilities; by definition, this has probability 1. The middle slice summing up to 1 is consistent with our earlier assertion that $\sum_i P(H_i) = 1$. The right-most slice simply splits each hypothesis into two further possibilities, one where $E$ is observed and one where $\lnot E$ is observed. As asserted earlier, $P(E) = \sum_i P(E \mid H_i)\, P(H_i)$ and $P(\lnot E) = \sum_i P(\lnot E \mid H_i)\, P(H_i)$, and given that $P(E) + P(\lnot E) = 1$, we can thus confirm that the right-most slice also sums up to 1.

Without loss of generality, let's say we observe event E (just swap the labels for E and ¬E if you want to see the converse case). What happens to our probability tree?

Intuitively, we are deleting the nodes which consist entirely of states where E is false (or equivalently, where ¬E is true), and then renormalizing the remaining nodes so that each vertical slice once again sums up to 1.

The probability $P(\lnot E \mid E) = 0$, and the probability $P(Q \land \lnot E \mid E) = 0$ for any and all propositions $Q$, so this is how the deletion "formally" happens. The deleted probability mass is exactly $P(\lnot E)$, and the remaining mass is $P(E)$, so we can renormalize the existing nodes by dividing the old values $P(H_i \land E)$ by $P(E)$. That is to say, $\forall i,\; P(H_i \land E \mid E) = \frac{P(H_i \land E)}{P(E)}$.
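
In code, this "delete and renormalize" step might look like the following sketch (continuing the toy setup from above, and supposing we witnessed $E$):

```python
# "Delete and renormalize" after witnessing E.
hypotheses = [0.0, 0.25, 0.5, 0.75, 1.0]
prior = {h: 1 / len(hypotheses) for h in hypotheses}

# Keep only the surviving joint masses P(H_i and E) = P(E | H_i) * P(H_i);
# the not-E branches are the deleted nodes.
joint = {h: h * prior[h] for h in hypotheses}
p_e = sum(joint.values())  # the surviving probability mass, P(E)

posterior = {h: mass / p_e for h, mass in joint.items()}
assert abs(sum(posterior.values()) - 1.0) < 1e-12  # renormalized to 1
```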

Note that we also have $\forall i,\; P(H_i \land E \mid E) = P(H_i \mid E)$, as can be seen from the probability tree: the nodes in the middle slice representing $P(H_i \mid E)$ have only one non-zero-weight edge coming out of them, and so their probability must be equal to that of the sole non-zero child.

Combining these two equations, we have $\forall i,\; P(H_i \mid E) = \frac{P(H_i \land E)}{P(E)}$, and since $P(H_i \land E) = P(E \mid H_i)\, P(H_i)$ by the definition of conditional probability, we finally obtain $\forall i,\; P(H_i \mid E) = \frac{P(E \mid H_i)\, P(H_i)}{P(E)}$, thus independently rederiving Bayes' Theorem. ∎
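
The derivation above translates directly into a small reusable update function. This is only a sketch (the function name and dict-based representation are mine, not the app's):

```python
def bayes_update(prior, likelihood):
    """One Bayesian update. `prior` maps each hypothesis H_i to P(H_i);
    `likelihood` maps each hypothesis to P(E | H_i). Returns P(H_i | E)."""
    unnormalized = {h: likelihood[h] * p for h, p in prior.items()}
    p_e = sum(unnormalized.values())  # P(E), the normalizing constant
    return {h: mass / p_e for h, mass in unnormalized.items()}

# Example: three candidate coin biases, uniform prior, then we witness heads.
prior = {0.25: 1/3, 0.5: 1/3, 0.75: 1/3}
posterior = bayes_update(prior, {h: h for h in prior})  # P(heads | H_i) = h
print(posterior)  # {0.25: ~0.167, 0.5: ~0.333, 0.75: ~0.5}
```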

Concrete Example

A binary random variable is a random variable that can only take on exactly two possible values. The classic example of a binary random variable is a coin flip, which can end up as either heads or tails.

When dealing with coin flips, we often assume (or "hypothesize") that the coin is fair (i.e. that for every flip of the coin, it is equally likely to land on heads as it is to land on tails). However, that need not be the case. The hypothesis "This coin is 50% likely to land on heads" is just one of many hypotheses; other possibilities include "This coin is 51% likely to land on heads", "This coin is 66% likely to land on heads", "This coin is 0% likely to land on heads", and so on.

When you first receive a new coin, you don't know whether it's fair or not. If you want to find out whether or not it's fair, an easy test to do is to flip it a few times, and observe what sort of results you get.

If you flip the coin 10 times, and all 10 times the coin lands heads, then maybe it's a fair coin and you just happened to get really (un)lucky, but it seems much more probable that this is a coin that's biased towards landing on heads. At any rate, you can completely rule out the hypothesis "This coin is 0% likely to land on heads". And you can almost, but not quite, rule out "This coin is 1% likely to land on heads": it's extremely unlikely that you'd get 10 heads in a row on a coin that's heavily biased towards tails, but not outright impossible.
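
To put numbers on that intuition: since the flips are assumed independent, the likelihood of 10 heads in a row is just the hypothesized heads-probability raised to the 10th power. A quick Python sketch:

```python
# Likelihood of observing 10 heads in a row under various bias hypotheses
# (flips are assumed independent, so the likelihood is p_heads ** 10).
for p_heads in (0.0, 0.01, 0.5, 0.9):
    print(f"P(heads) = {p_heads}: likelihood of 10 heads = {p_heads ** 10:.3g}")

# P(heads) = 0.0  -> 0         ruled out entirely
# P(heads) = 0.01 -> 1e-20     astronomically unlikely, but not impossible
# P(heads) = 0.5  -> 0.000977  possible; you just got quite lucky
# P(heads) = 0.9  -> 0.349     entirely plausible
```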

This app helps you keep track of the probabilities of the various hypotheses. To be sure, this is a bit meta: The hypotheses themselves are assigning probabilities to a binary random variable, and then you are assigning probabilities to the hypotheses. For example, you might calculate that the probability that the hypothesis "the probability that this coin will land heads is 73%" is true is 26%.

Start by clicking the "Reset Experiment" button. You'll see a big blue rectangle, indicating that all hypotheses are of equal probability. Then, you'll have to define your random variable. Let's stick with coin flips for now, and define "heads" to be a "success" and "tails" to be a "failure".

Now let's flip our imaginary coin, and let's say the coin comes up tails. That means we've witnessed a "failure", so click the "Witnessed Failure" button. You should immediately see the bar graph update itself. Each column represents a hypothesis: the column all the way on the left represents the hypothesis "You are 0% likely to see a success" (which in our example scenario translates to "You are 0% likely to get a head when you flip the coin"); the column all the way to the right represents the hypothesis "You are 100% likely to see a success"; and the columns in between represent each intermediate hypothesis in increments of 1%. The height of each column represents how likely it is that that particular hypothesis is true.

After having witnessed a failure, we can rule out the hypothesis "You are 100% likely to see a success" (because if that hypothesis were true, it would have been impossible to witness a failure), and as expected, the column all the way on the right has shrunk down to a height of 0.

Let's say we flip the coin again, and this time we witness a success (that is, we see the coin land on heads). We would then click on the "Witnessed Success" button, and see the bar chart update appropriately. With exactly one success and one failure, the most probable hypothesis is that there's a 50% chance of seeing a success, but notice that the curve of the bar chart is quite wide, indicating that we don't have enough evidence to be highly confident that we're dealing with a fair coin (we only have a sample size of 2, after all).

As you add more data (e.g. try randomly clicking on "Witnessed Failure" and "Witnessed Success" a dozen times), you should see the curve get narrower and narrower, indicating growing confidence in the hypotheses that have not been eliminated.
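
The whole interaction can be mimicked in a few lines of Python. The following sketch assumes, as described above, 101 hypotheses at 1% increments, a uniform prior, and one renormalization per observation (this is my reconstruction, not the app's actual source):

```python
import random

# 101 hypotheses at 1% increments, uniform prior, one update per observation.
hypotheses = [i / 100 for i in range(101)]  # "0% likely", ..., "100% likely"
posterior = [1 / len(hypotheses)] * len(hypotheses)

def witness(success):
    """Update the posterior in place after one observed flip."""
    global posterior
    likelihood = [h if success else 1 - h for h in hypotheses]
    unnormalized = [l * p for l, p in zip(likelihood, posterior)]
    total = sum(unnormalized)  # P(this observation)
    posterior = [m / total for m in unnormalized]

# Two dozen random "clicks", simulating a fair coin.
for _ in range(24):
    witness(random.random() < 0.5)

peak = max(range(len(hypotheses)), key=lambda i: posterior[i])
print(f"most probable hypothesis: {hypotheses[peak]:.0%} chance of success")
```

With more and more observations, the probability mass piles up around the observed frequency, which is exactly the narrowing of the curve described above.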

Ideas to play around with

Thanks and Further Reading

I put this page together with the help of the following free online resources (in alphabetical order):