04 — Probability

Comprehensive study guide for Bocconi Math Module 2 (cod. 30063), General Exam. Source materials: Alice Sicconi's Probability Proofs.pdf, Probability.pdf, lect{1..8}_prob (2).pdf, TA1_prob_41.pdf, Probability HW.pdf, General_24524_ENG_SOL.pdf.

§1. Overview & Exam Relevance

Probability is the first block of the second partial. On the general exam it typically appears as 1–2 MCQs worth 5 pts each plus one open-ended question worth up to 20 pts: about 25–30 pts of the 150-pt exam, i.e., roughly 17–20%. In the May 2024 exam, two MCQs (MCQ4, MCQ5) and the flagship 20-pt theorem-statement-plus-proof question (Q10) all came from this topic, so the marginal value of mastering the definitions cleanly is very high.

Topic scope. The exam tests:

the measure-theoretic hierarchy set function → measure → probability: grounded, positive, additive, normalized;
Kolmogorov axioms for a probability $\mathbb{P} : 2^\Omega \to [0,1]$ and their consequences (complement, monotonicity, inclusion–exclusion, union bound);
Dirac probability $\delta_{\omega_0}$ — the canonical "point mass" example (May 2024 Q10a);
Countable additivity (σ-additivity) as a property that may or may not hold, and the proof that every Dirac probability is countably additive (May 2024 Q10c);
Simple probabilities, support, convex-combination representation, and the equivalence between simple probabilities and convex linear combinations of Dirac probabilities;
Conditional probability $\mathbb{P}(A\mid B)$ , Bayes' theorem, law of total probability, independence of events and random variables;
Random variables $f : \Omega \to \mathbb{R}$ (Sicconi uses the letter $f$ for a random variable; Marinacci uses $X$ ), expected value $E_{\mathbb{P}}(f)$ , variance $V_{\mathbb{P}}(f)$ , standard deviation $\sigma_{\mathbb{P}}(f)$ , covariance $\operatorname{Cov}_{\mathbb{P}}(f,g)$ , linear correlation coefficient $\rho_{\mathbb{P}}(f,g)$ ;
Linearity of expectation, variance and covariance properties (computation formula, affine-function rules, bilinearity), Cauchy–Schwarz bound $|\operatorname{Cov}(f,g)| \le \sigma(f)\sigma(g)$ ;
Named discrete distributions: Bernoulli, Binomial, Geometric, Poisson, uniform discrete — PMFs with mean and variance;
Named continuous distributions: uniform on $[a,b]$ , (negative) exponential with rate $\alpha$ , Gaussian/standard normal — density functions, distribution functions, expected values;
Distribution function $\Phi(x) = \mathbb{P}(f \le x)$ , density $\varphi(x)$ , carrier $[a,b]$ , essentially bounded random variables;
Markov's and Chebyshev's inequalities (absolute-value tail bounds).

Typical MCQ patterns (May 2024 General Exam).

MCQ4 (Mode A) / MCQ3 (Mode B): "Consider $\Omega = \mathbb{N}$ and the Poisson probability $\mathbb{P}$ with parameter $\lambda = 2$ . The probability of the event $E = \{n \in \mathbb{N} : n \ge 3\}$ is …". Technique: use the complement, $\mathbb{P}(n \ge 3) = 1 - \mathbb{P}(n \le 2) = 1 - e^{-2}(2^0/0! + 2^1/1! + 2^2/2!) = 1 - e^{-2}(1 + 2 + 2) = 1 - 5 e^{-2}$ .
MCQ5 (Mode A) / MCQ4 (Mode B): "Let $f, g$ be two random variables on $(\Omega, \mathbb{P})$ with $V_{\mathbb{P}}(g) = 100$ , $\rho_{\mathbb{P}}(f,g) = -0.1$ and $|\operatorname{Cov}_{\mathbb{P}}(2f, 5g)| = 90$ . Then $\sigma_{\mathbb{P}}(f) = ?$ ". Technique: bilinearity gives $\operatorname{Cov}(2f, 5g) = 10\operatorname{Cov}(f,g)$ , so $|\operatorname{Cov}(f,g)| = 9$ , and the sign is read off from $\rho = -0.1 < 0$ , i.e., $\operatorname{Cov}(f,g) = -9$ . From $\rho = \operatorname{Cov}/(\sigma(f)\sigma(g))$ with $\sigma(g) = 10$ : $-0.1 = -9/(10 \sigma(f))$ so $\sigma(f) = 9$ .
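Both MCQ techniques can be checked numerically. The snippet below is a minimal sketch: the variable names are illustrative, and the inputs are exactly the numbers quoted in the two questions.

```python
import math

# MCQ4: Poisson tail via the complement, P(n >= 3) = 1 - P(n <= 2).
lam = 2.0
p_le_2 = sum(math.exp(-lam) * lam**k / math.factorial(k) for k in range(3))
p_ge_3 = 1.0 - p_le_2
closed_form = 1.0 - 5.0 * math.exp(-2.0)

# MCQ5: recover sigma(f) from V(g), rho(f, g) and |Cov(2f, 5g)|.
var_g = 100.0
rho = -0.1
abs_cov_scaled = 90.0
sigma_g = math.sqrt(var_g)
abs_cov = abs_cov_scaled / (2 * 5)   # bilinearity: Cov(2f, 5g) = 10 Cov(f, g)
cov = math.copysign(abs_cov, rho)    # Cov has the same sign as rho
sigma_f = cov / (rho * sigma_g)      # rearranged rho = Cov / (sigma_f * sigma_g)

print(round(p_ge_3, 4), round(sigma_f, 4))
```

The Poisson sum agrees with the closed form $1 - 5e^{-2}$, and the second computation returns $\sigma(f) = 9$.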

Typical open-ended pattern (May 2024 Q10 Mode A / Q9 Mode B).

Part (a) — Define the Dirac probability $\delta_{\omega_0}$ over a state space $\Omega$ .
Part (b) — Give the definition of countable additivity for a probability $\mathbb{P} : 2^\Omega \to [0,1]$ .
Part (c) — Prove that any Dirac probability is countably additive. The official solution points to "example 1993, definition 2003, proposition 2006 (first part of the proof)" — i.e., this is a pure memorization-plus-execution question, and the candidate must reproduce the two-case proof exactly.

Why this topic is high-leverage.

Every calculation in finance involving expected return or variance is an application of the $E$ / $V$ / $\operatorname{Cov}$ machinery proved here.
The measure-theoretic framework (grounded/positive/additive) mirrors the Riemann-integral framework of §03: what "mass" is to probability, "area" is to the integral.
Q10 is one of the highest-yield 20-point questions: if you memorize proposition 2006 and know how to write definition 2003 cleanly, you can bank ≈20 points in under 12 minutes.

§2. Definitions

2.1 Sample space, state, event

The sample space (or state space) $\Omega$ is the set of all possible outcomes of an experiment.

Elements $\omega \in \Omega$ are states (of the world) — individual outcomes. Subsets $A \subseteq \Omega$ are called events. The pair $(\Omega, \mathbb{P})$ is called a probability space.

Three types of sample space (lect1):

Discrete finite: $|\Omega| = n$ . Example: rolling a die, $\Omega = \{1, 2, 3, 4, 5, 6\}$ .
Discrete countable: $\Omega$ is infinite but countable. Example: $\Omega = \mathbb{N} = \{0, 1, 2, \ldots\}$ .
Continuous: $\Omega$ is uncountable. Example: $\Omega = [0, 2] \subseteq \mathbb{R}$ .

2.2 Power set $2^\Omega$

The power set $\mathcal{P}(\Omega) = 2^\Omega$ is the collection of all subsets of $\Omega$ . For a finite sample space with $|\Omega| = n$ , we have $|2^\Omega| = 2^n$ .

Every probability measure is defined on $2^\Omega$ : it assigns a real number to each event. The notation $2^\Omega$ (rather than $\Omega$ ) emphasises that the domain of $\mathbb{P}$ is the set of events, not the set of outcomes.

Example (lect1 p.1). $\Omega = \{1, 2, 3\}$ , $2^\Omega = \{\emptyset, \{1\}, \{2\}, \{3\}, \{1,2\}, \{2,3\}, \{1,3\}, \Omega\}$ . There are $2^3 = 8$ events; $\mathbb{P}(\emptyset) = 0$ and $\mathbb{P}(\Omega) = 1$ are hard-coded by the Kolmogorov axioms.
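As a quick sanity check on the count $|2^\Omega| = 2^n$, the power set of a small $\Omega$ can be enumerated directly. This is a sketch using the standard `itertools` module; the helper name `power_set` is our own.

```python
from itertools import chain, combinations

def power_set(omega):
    """Return all subsets of omega as a list of frozensets."""
    items = list(omega)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [frozenset(s) for s in subsets]

events = power_set({1, 2, 3})
print(len(events))  # 8 events, i.e. 2**3
```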

2.3 Set function, measure, probability (the hierarchy)

(Source: lect1_prob (2).pdf pp.1–3.)

Set function $M : 2^\Omega \to \mathbb{R}$ — the least restrictive object: it simply assigns a real number to every event.

Measure $M : 2^\Omega \to [0, +\infty)$ — a set function satisfying the three measure axioms:

1. Grounded: $M(\emptyset) = 0$ .
2. Positive (non-negative): $M(A) \ge 0$ for every $A \subseteq \Omega$ .
3. (Finitely) additive: for every $A, B \subseteq \Omega$ with $A \cap B = \emptyset$ , $M(A \cup B) = M(A) + M(B).$

Probability $\mathbb{P} : 2^\Omega \to [0, 1]$ — a measure that additionally satisfies the normalisation axiom:

4. Normalized: $\mathbb{P}(\Omega) = 1$ .

The four properties 1–4 are the Kolmogorov axioms. Every probability is a measure; every measure is a set function. The reverse inclusions fail in general (cf. Example 2.14 below).

2.4 Probability — Kolmogorov axioms

A function $\mathbb{P} : 2^\Omega \to [0, 1]$ is a probability (or probability measure) on $(\Omega, 2^\Omega)$ if:

(K1) Non-negativity: $\mathbb{P}(A) \ge 0$ for every event $A \subseteq \Omega$ .
(K2) Normalization: $\mathbb{P}(\Omega) = 1$ .
(K3) (Finite) additivity: if $A, B \subseteq \Omega$ and $A \cap B = \emptyset$ , then $\mathbb{P}(A \cup B) = \mathbb{P}(A) + \mathbb{P}(B)$ .

The grounded property $\mathbb{P}(\emptyset) = 0$ need not be listed separately: it follows from K3 (take $A = B = \emptyset$ , which are disjoint; then $\mathbb{P}(\emptyset) = 2\mathbb{P}(\emptyset)$ , so $\mathbb{P}(\emptyset) = 0$ ).
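On a finite $\Omega$ the axioms can be verified by brute force over all events. The sketch below assumes a fair die (our choice of example) and builds $\mathbb{P}$ from singleton weights; `prob` and `power_set` are illustrative helper names.

```python
from fractions import Fraction
from itertools import chain, combinations

omega = frozenset(range(1, 7))
weights = {w: Fraction(1, 6) for w in omega}  # fair die (illustrative choice)

def prob(event):
    """P(event) as the sum of the singleton weights."""
    return sum((weights[w] for w in event), Fraction(0))

def power_set(s):
    items = list(s)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

events = power_set(omega)
k1 = all(prob(a) >= 0 for a in events)                      # non-negativity
k2 = prob(omega) == 1                                       # normalization
k3 = all(prob(a | b) == prob(a) + prob(b)                   # additivity on
         for a in events for b in events if not (a & b))    # disjoint pairs
grounded = prob(frozenset()) == 0

print(k1, k2, k3, grounded)
```

Exact fractions are used so that the additivity check is an equality test rather than a floating-point comparison.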

2.5 Dirac probability $\delta_{\omega_0}$

Fix $\omega_0 \in \Omega$ . The Dirac probability concentrated at $\omega_0$ is the function $\delta_{\omega_0} : 2^\Omega \to [0, 1]$ defined by $\boxed{\;\delta_{\omega_0}(E) \;=\; \begin{cases} 1 & \text{if } \omega_0 \in E \\ 0 & \text{if } \omega_0 \notin E \end{cases} \qquad \forall\, E \subseteq \Omega.\;}$

This is the Marinacci example 1993. Intuition: $\delta_{\omega_0}$ describes a "sure outcome" — it is as if we already know the result of the experiment is $\omega_0$ , so events that contain $\omega_0$ are certain and all others are impossible.

Example (lect1 p.3). Roll a die ( $\Omega = \{1, 2, 3, 4, 5, 6\}$ ) and suppose you are told that the outcome is $4$ . Consider $A = \{2, 4, 6\}$ (even outcomes). Then $\delta_4(A) = 1$ because $4 \in A$ . Among singletons, $\delta_4(\{4\}) = 1$ while $\delta_4(\{1\}) = \delta_4(\{2\}) = \delta_4(\{3\}) = \delta_4(\{5\}) = \delta_4(\{6\}) = 0$ (each singleton has probability 0 or 1, and only $\{4\}$ contains $\omega_0 = 4$ ).
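The definition translates almost literally into code. A minimal sketch, with `dirac` as a hypothetical helper that returns $\delta_{\omega_0}$ as a function of events:

```python
def dirac(omega_0):
    """Return the Dirac probability concentrated at omega_0."""
    def delta(event):
        return 1 if omega_0 in event else 0
    return delta

delta_4 = dirac(4)
evens = {2, 4, 6}
singletons = {w: delta_4({w}) for w in range(1, 7)}

print(delta_4(evens))  # 1, because 4 is an even outcome
print(singletons)      # only the singleton {4} receives mass 1
```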

2.6 Simple probability

A probability $\mathbb{P} : 2^\Omega \to [0, 1]$ is called simple if there exists a finite event $E \subseteq \Omega$ with $\mathbb{P}(E) = 1$ . Intuitively: a simple probability has only finitely many "realistically possible" outcomes, even if $\Omega$ itself is infinite.

The support of a simple probability is the set of outcomes with strictly positive probability: $\operatorname{supp} \mathbb{P} \;=\; \{\omega \in \Omega : \mathbb{P}(\{\omega\}) > 0\}.$

Every simple probability can be written (Theorem 2002 — stated below) as a convex linear combination of Dirac probabilities: $\mathbb{P}(A) = \sum_{\omega \in \operatorname{supp} \mathbb{P}} \mathbb{P}(\{\omega\}) \cdot \delta_{\omega}(A) \qquad \forall\, A \subseteq \Omega.$

2.7 Countable additivity (σ-additivity)

(Marinacci definition 2003.) A probability $\mathbb{P} : 2^\Omega \to [0, 1]$ is countably additive (also called σ-additive) if for every countable collection $\{E_n\}_{n=1}^{+\infty}$ of pairwise disjoint events (i.e., $E_i \cap E_j = \emptyset$ for all $i \neq j$ ), $\boxed{\;\mathbb{P}\!\left(\bigcup_{n=1}^{+\infty} E_n\right) \;=\; \sum_{n=1}^{+\infty} \mathbb{P}(E_n).\;}$

This is the countable version of the finite-additivity axiom (K3). It is not implied by K1–K3 and must be checked case by case. (Counterexample: a probability on $\Omega = \mathbb{N}$ that gives every singleton the same mass may satisfy K1–K3, but it can never be countably additive; see Proposition 2009 below.)

Equivalent monotone-sequence formulation (lect2): $\mathbb{P}$ is countably additive $\iff$ for every increasing sequence $A_n \uparrow A$ (i.e., $A_1 \subseteq A_2 \subseteq \cdots$ and $A = \bigcup_n A_n$ ), $\mathbb{P}(A) = \lim_{n\to\infty} \mathbb{P}(A_n)$ ; equivalently, for every decreasing sequence $A_n \downarrow A$ (i.e., $A_1 \supseteq A_2 \supseteq \cdots$ and $A = \bigcap_n A_n$ ), $\mathbb{P}(A) = \lim_{n\to\infty} \mathbb{P}(A_n)$ .

2.8 Conditional probability

Let $\mathbb{P}$ be a probability on $\Omega$ and let $B \subseteq \Omega$ be an event with $\mathbb{P}(B) > 0$ . The conditional probability of $A$ given $B$ is $\mathbb{P}(A \mid B) \;=\; \frac{\mathbb{P}(A \cap B)}{\mathbb{P}(B)}.$ For fixed $B$ , the function $A \mapsto \mathbb{P}(A \mid B)$ is itself a probability on $\Omega$ (with support concentrated on $B$ ). Intuition: once you know $B$ has happened, rescale probabilities to make $B$ the new "certain" event.
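A small worked case makes the rescaling concrete. The sketch below assumes a fair die and conditions "even outcome" on "outcome at least 4"; the function names are our own.

```python
from fractions import Fraction

omega = set(range(1, 7))

def prob(event):
    """Uniform probability on a fair die."""
    return Fraction(len(event & omega), len(omega))

def cond_prob(a, b):
    """P(A | B) = P(A ∩ B) / P(B), defined only when P(B) > 0."""
    if prob(b) == 0:
        raise ValueError("conditioning event must have positive probability")
    return prob(a & b) / prob(b)

evens = {2, 4, 6}
at_least_4 = {4, 5, 6}
p = cond_prob(evens, at_least_4)

print(p)                                  # 2/3
print(cond_prob(at_least_4, at_least_4))  # 1: B is the new certain event
```

Here $\mathbb{P}(A \cap B) = \mathbb{P}(\{4, 6\}) = 1/3$ and $\mathbb{P}(B) = 1/2$, so the ratio is $2/3$.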

2.9 Independence

Independent events. Two events $A, B \subseteq \Omega$ are independent (under $\mathbb{P}$ ) if $\mathbb{P}(A \cap B) \;=\; \mathbb{P}(A) \cdot \mathbb{P}(B).$ Equivalently (when $\mathbb{P}(B) > 0$ ), $\mathbb{P}(A \mid B) = \mathbb{P}(A)$ — knowing $B$ occurred does not change the probability of $A$ .

Independent random variables. Random variables $f, g : \Omega \to \mathbb{R}$ are independent if the events $\{f \le x\}$ and $\{g \le y\}$ are independent for every $x, y \in \mathbb{R}$ : $\mathbb{P}(f \le x, g \le y) \;=\; \mathbb{P}(f \le x) \cdot \mathbb{P}(g \le y).$ Independence of $f$ and $g$ implies $E[f g] = E[f] \cdot E[g]$ , hence $\operatorname{Cov}(f, g) = 0$ . Caveat: the converse is false in general (see §7 — "zero correlation does not imply independence").
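The product rule for events is easy to test on a finite space. A sketch on a fair die, with events chosen by us for illustration (one independent pair, one dependent pair):

```python
from fractions import Fraction

omega = frozenset(range(1, 7))

def prob(event):
    """Uniform probability on a fair die."""
    return Fraction(len(event & omega), len(omega))

def independent(a, b):
    """Check the product rule P(A ∩ B) = P(A) * P(B)."""
    return prob(a & b) == prob(a) * prob(b)

evens = frozenset({2, 4, 6})
up_to_4 = frozenset({1, 2, 3, 4})
at_least_4 = frozenset({4, 5, 6})

print(independent(evens, up_to_4))     # True:  1/3 == 1/2 * 2/3
print(independent(evens, at_least_4))  # False: 1/3 != 1/2 * 1/2
```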

2.10 Random variable

A random variable is any function $f : \Omega \to \mathbb{R}$ . (Sicconi's convention uses $f$ ; Marinacci sometimes uses $X$ . In this guide we mix the two: lowercase $f, g, h$ for random variables as in the lectures, and uppercase $X, Y, Z$ when following the textbook verbatim — they are interchangeable.)

Intuition (lect6 p.1). A random variable is a bet that assigns a real-valued payoff to each outcome of the experiment. Example: roll a die; you win 10 euros on an even outcome and lose 10 euros on an odd outcome: $f(\omega) = \begin{cases} 10 & \omega \in \{2, 4, 6\} \\ -10 & \omega \in \{1, 3, 5\} \end{cases}.$

2.11 Probability mass function (PMF), probability density function (PDF), cumulative distribution function (CDF)

Let $f$ be a random variable on $(\Omega, \mathbb{P})$ .

CDF (distribution function) $\Phi : \mathbb{R} \to [0, 1]$ : $\Phi(x) \;=\; \mathbb{P}(\{\omega \in \Omega : f(\omega) \le x\}) \;=\; \mathbb{P}(f \le x) \qquad \forall\, x \in \mathbb{R}.$

Simple (discrete) density function $\varphi : \mathbb{R} \to [0, 1]$ , when $f$ takes finitely many distinct values $y_1, \ldots, y_n$ : $\varphi(y_i) \;=\; \mathbb{P}(f = y_i), \qquad \varphi(x) = 0 \text{ otherwise.}$ This is Marinacci's simple/finite density function (lect6 p.7); in standard terminology it is the probability mass function (PMF). One always has $\sum_{i=1}^n \varphi(y_i) = 1$ and $\Phi(x) = \sum_{y_i \le x} \varphi(y_i)$ .

Integrable (continuous) density function $\varphi : \mathbb{R} \to [0, +\infty)$ : $\Phi(x) \;=\; \int_{-\infty}^{x} \varphi(t)\,dt, \qquad \int_{-\infty}^{+\infty} \varphi(t)\,dt = 1.$ This is the PDF. For continuous $\Phi$ with a continuous density $\varphi$ , $\Phi'(x) = \varphi(x)$ (by Barrow–Torricelli; this is Marinacci's prop 2042).

Carrier $[a, b]$ of $\Phi$ : an interval such that $\Phi(x) = 0$ for all $x \le a$ and $\Phi(x) = 1$ for all $x \ge b$ . Equivalently, $\varphi(x) = 0$ outside $[a, b]$ . A random variable that admits a carrier is called essentially bounded.
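For the die bet of §2.10 the simple density and the distribution function can be tabulated directly. This is a sketch assuming a fair die; `pmf` and `cdf` are our names for $\varphi$ and $\Phi$.

```python
from collections import defaultdict
from fractions import Fraction

omega = range(1, 7)
p = {w: Fraction(1, 6) for w in omega}  # fair die (assumption)

def f(w):
    """Payoff of the bet: +10 on even outcomes, -10 on odd outcomes."""
    return 10 if w % 2 == 0 else -10

# Simple density: phi(y) = P(f = y), accumulated state by state.
pmf = defaultdict(Fraction)
for w in omega:
    pmf[f(w)] += p[w]

def cdf(x):
    """Distribution function Phi(x) = P(f <= x)."""
    return sum((mass for y, mass in pmf.items() if y <= x), Fraction(0))

print(dict(pmf))
print(cdf(-11), cdf(-10), cdf(0), cdf(10))
```

The distribution function is a step function that jumps by $1/2$ at $-10$ and at $10$; it is $0$ to the left of $-10$ and $1$ from $10$ onwards.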

2.12 Expected value $E_{\mathbb{P}}(f)$

Discrete (simple) case (Sicconi lect6 p.8): $E_{\mathbb{P}}(f) \;=\; \sum_{\omega \in \operatorname{supp} \mathbb{P}} f(\omega) \cdot \mathbb{P}(\{\omega\}) \;=\; \sum_{i=1}^n y_i \cdot \varphi(y_i),$ where the last sum is over the distinct values $y_i \in \operatorname{Im} f$ .

Stieltjes-integral formulation (Marinacci prop 2043): if $[a,b]$ is a carrier for $\Phi$ , $E_{\mathbb{P}}(f) \;=\; \int_a^b x\, d\Phi(x).$ When $\Phi$ has a continuous density $\varphi$ , this simplifies to $E_{\mathbb{P}}(f) = \int_a^b x\, \varphi(x)\, dx$ (via $d\Phi = \Phi'(x)\,dx = \varphi(x)\,dx$ ).
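In the discrete case, the two sums (over states and over distinct values) must agree. A sketch on a fair die with the illustrative random variable $f(\omega) = (\omega - 3)^2$, which takes some values more than once:

```python
from collections import Counter
from fractions import Fraction

omega = range(1, 7)
p = {w: Fraction(1, 6) for w in omega}  # fair die (assumption)

def f(w):
    return (w - 3) ** 2  # values 4, 1, 0, 1, 4, 9

# Sum over states: E(f) = sum of f(w) * P({w}).
e_states = sum(f(w) * p[w] for w in omega)

# Sum over distinct values: E(f) = sum of y * phi(y).
pmf = Counter()
for w in omega:
    pmf[f(w)] += p[w]
e_values = sum(y * mass for y, mass in pmf.items())

print(e_states, e_values)  # both equal 19/6
```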

2.13 Variance, standard deviation, covariance, correlation

Let $f, g$ be random variables on $(\Omega, \mathbb{P})$ with finite expected values.

Variance: $V_{\mathbb{P}}(f) \;=\; E_{\mathbb{P}}\!\left[(f - E_{\mathbb{P}}(f))^2\right] \;=\; \sum_{\omega \in \operatorname{supp} \mathbb{P}} \bigl(f(\omega) - E_{\mathbb{P}}(f)\bigr)^2 \cdot \mathbb{P}(\{\omega\}) \;\ge\; 0.$ Computation formula (Theorem 2024): $V_{\mathbb{P}}(f) = E_{\mathbb{P}}(f^2) - [E_{\mathbb{P}}(f)]^2$ .

Standard deviation: $\sigma_{\mathbb{P}}(f) = \sqrt{V_{\mathbb{P}}(f)} \ge 0$ .

Covariance: $\operatorname{Cov}_{\mathbb{P}}(f, g) \;=\; E_{\mathbb{P}}\!\left[(f - E_{\mathbb{P}}(f))(g - E_{\mathbb{P}}(g))\right].$ Computation formula (Theorem 2026): $\operatorname{Cov}_{\mathbb{P}}(f, g) = E_{\mathbb{P}}(f g) - E_{\mathbb{P}}(f) E_{\mathbb{P}}(g)$ . Note $\operatorname{Cov}(f, f) = V(f)$ .

Linear correlation coefficient (defined when $\sigma(f), \sigma(g) > 0$ ): $\rho_{\mathbb{P}}(f, g) \;=\; \frac{\operatorname{Cov}_{\mathbb{P}}(f, g)}{\sigma_{\mathbb{P}}(f) \cdot \sigma_{\mathbb{P}}(g)}.$ The Cauchy–Schwarz bound (Theorem 2028) gives $|\operatorname{Cov}(f, g)| \le \sigma(f) \sigma(g)$ , i.e., $|\rho(f, g)| \le 1$ .
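The four quantities can be computed from their definitions on a small example. The sketch below assumes a fair die, with $f$ the $\pm 10$ bet of §2.10 and $g(\omega) = \omega$ (both choices are ours); `expect` and `cov` are illustrative helpers.

```python
import math
from fractions import Fraction

omega = range(1, 7)
p = {w: Fraction(1, 6) for w in omega}  # fair die (assumption)

def expect(h):
    return sum(h(w) * p[w] for w in omega)

def f(w):
    return 10 if w % 2 == 0 else -10

def g(w):
    return w

def cov(h1, h2):
    """Covariance from the definition E[(h1 - E h1)(h2 - E h2)]."""
    m1, m2 = expect(h1), expect(h2)
    return expect(lambda w: (h1(w) - m1) * (h2(w) - m2))

var_f = cov(f, f)  # Cov(f, f) = V(f)
var_g = cov(g, g)
cov_fg = cov(f, g)
cov_shortcut = expect(lambda w: f(w) * g(w)) - expect(f) * expect(g)
rho = float(cov_fg) / math.sqrt(float(var_f) * float(var_g))

print(var_f, var_g, cov_fg, round(rho, 4))
```

The definition and the computation formula give the same covariance, and the resulting $\rho$ lies in $[-1, 1]$ as the Cauchy–Schwarz bound requires.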

2.14 A set function that is not a measure (lect1 Q1)

Let $\Omega = \{T_1, \ldots, T_{50}\}$ be 50 stocks, and $\phi(T_i)$ the difference between opening and closing price of stock $T_i$ . Define $\mu(A) = \sum_{T_i \in A} \phi(T_i) \qquad \forall\, A \subseteq \Omega.$ Then $\mu$ is a set function (assigns a real number to every event) and satisfies $\mu(\emptyset) = 0$ and additivity — but $\mu(A)$ can be negative (some stocks fell), so axiom K1 (positivity) fails. $\mu$ is a set function but not a measure and hence not a probability. This distinguishes the three levels of the hierarchy.
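A scaled-down version of this example shows which properties hold and which one fails. The three stocks and their price differences below are hypothetical numbers (in cents) chosen only for illustration.

```python
# Hypothetical price differences in cents for three stocks (made-up data).
phi = {"T1": 150, "T2": -200, "T3": 50}

def mu(event):
    """Set function: total price difference over the stocks in the event."""
    return sum(phi[t] for t in event)

a, b = {"T1"}, {"T2", "T3"}          # disjoint events
grounded = mu(set()) == 0
additive = mu(a | b) == mu(a) + mu(b)
negative_value = mu({"T2"})          # a negative value: positivity fails

print(grounded, additive, negative_value)
```

Groundedness and additivity hold, but $\mu(\{T_2\}) < 0$, so $\mu$ is not a measure.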

§3. Theorems, Propositions & Proofs

Each entry lists the theorem name, its Marinacci number (where known), the source, a clean statement, and a full proof (verbatim from Probability Proofs.pdf where available).

3.1 Monotonicity property of a measure (page 1987)

Statement. Let $M : 2^\Omega \to [0, +\infty)$ be a measure. Then $M$ is monotone: $M(A) \le M(B) \qquad \forall\, A, B \subseteq \Omega \text{ such that } A \subseteq B.$

Source: Probability Proofs.pdf p.1987 (handwritten).

Proof. Consider $A, B \subseteq \Omega$ with $A \subseteq B$ , and define $C = B \setminus A$ . Then $A \cup C = B, \qquad A \cap C = \emptyset.$ That is, $A$ and the "leftover" $C$ partition $B$ . Since every measure is finitely additive and positive, $M(B) = M(A \cup C) = M(A) + M(C) \;\ge\; M(A),$ using $M(C) \ge 0$ . $\blacksquare$

3.2 Relation between $M(A \cup B)$ and $M(A \cap B)$ (page 1989)

Statement. For every measure $M : 2^\Omega \to [0, +\infty)$ and all $A, B \subseteq \Omega$ , $M(A \cup B) + M(A \cap B) \;=\; M(A) + M(B).$

Source: Probability Proofs.pdf p.1989.

Proof. Split $A$ and $B$ using their intersection and relative complements: $A = (A \setminus B) \cup (A \cap B), \qquad (A \setminus B) \cap (A \cap B) = \emptyset,$ $B = (B \setminus A) \cup (A \cap B), \qquad (B \setminus A) \cap (A \cap B) = \emptyset.$ Since $M$ is (finitely) additive, $M(A) = M(A \setminus B) + M(A \cap B), \qquad M(B) = M(B \setminus A) + M(A \cap B).$ Moreover $A \cup B = (A \setminus B) \cup (A \cap B) \cup (B \setminus A)$ with the three pieces pairwise disjoint, so by finite additivity $M(A \cup B) = M(A \setminus B) + M(A \cap B) + M(B \setminus A).$ Adding $M(A \cap B)$ to both sides: $M(A \cup B) + M(A \cap B) = M(A \setminus B) + M(A \cap B) + M(B \setminus A) + M(A \cap B) = M(A) + M(B). \qquad \blacksquare$

Corollary (inclusion–exclusion for a probability). For $\mathbb{P} : 2^\Omega \to [0, 1]$ and all $A, B \subseteq \Omega$ , $\mathbb{P}(A \cup B) \;=\; \mathbb{P}(A) + \mathbb{P}(B) - \mathbb{P}(A \cap B).$

3.3 Probability of the complement (page 1883)

Statement. Let $\mathbb{P} : 2^\Omega \to [0, 1]$ be a probability. Then for every $A \subseteq \Omega$ , $\mathbb{P}(A^c) \;=\; 1 - \mathbb{P}(A).$

Source: Probability Proofs.pdf p.1883.

Proof. Since $A \cap A^c = \emptyset$ , by the additivity property (K3), $\mathbb{P}(A \cup A^c) = \mathbb{P}(A) + \mathbb{P}(A^c).$ But $A \cup A^c = \Omega$ and $\mathbb{P}(\Omega) = 1$ (K2, normalisation), so $1 = \mathbb{P}(A) + \mathbb{P}(A^c) \qquad \Longrightarrow \qquad \mathbb{P}(A^c) = 1 - \mathbb{P}(A). \qquad \blacksquare$

Consequence. $\mathbb{P}(\emptyset) = \mathbb{P}(\Omega^c) = 1 - \mathbb{P}(\Omega) = 1 - 1 = 0$ .

Union bound (Boole's inequality). For any $A, B$ , $\mathbb{P}(A \cup B) \le \mathbb{P}(A) + \mathbb{P}(B)$ (drop the non-negative $\mathbb{P}(A \cap B)$ from §3.2). More generally $\mathbb{P}(\bigcup_i A_i) \le \sum_i \mathbb{P}(A_i)$ .
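The identities of §3.2 and §3.3 can be confirmed on a concrete pair of overlapping events. A sketch on a fair die, with $A$ the even outcomes and $B = \{4, 5, 6\}$ (our choice):

```python
from fractions import Fraction

omega = frozenset(range(1, 7))

def prob(event):
    """Uniform probability on a fair die."""
    return Fraction(len(event & omega), len(omega))

a = frozenset({2, 4, 6})
b = frozenset({4, 5, 6})

union_direct = prob(a | b)
inclusion_exclusion = prob(a) + prob(b) - prob(a & b)
complement = prob(omega - a)
union_bound_holds = prob(a | b) <= prob(a) + prob(b)

print(union_direct, inclusion_exclusion, complement, union_bound_holds)
```

Both routes give $\mathbb{P}(A \cup B) = 2/3$, while the union bound only guarantees $\mathbb{P}(A \cup B) \le 1$ here, which shows how loose it can be when the events overlap.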

3.4 Property of a simple probability #1 (page 1999)

Statement. Let $\mathbb{P} : 2^\Omega \to [0, 1]$ be a simple probability. If $E \subseteq \Omega$ is a finite event with $\mathbb{P}(E) = 1$ , then for every $\omega \notin E$ we have $\mathbb{P}(\{\omega\}) = 0$ .

Source: Probability Proofs.pdf p.1999.

Proof. Let $\omega \notin E$ ; then $\omega \in E^c$ . By §3.3, $\mathbb{P}(E^c) = 1 - \mathbb{P}(E) = 1 - 1 = 0$ . By monotonicity (§3.1) applied to $\{\omega\} \subseteq E^c$ , $0 \le \mathbb{P}(\{\omega\}) \le \mathbb{P}(E^c) = 0,$ forcing $\mathbb{P}(\{\omega\}) = 0$ . $\blacksquare$

3.5 Property of a simple probability #2 (page 2000)

Statement. Let $\mathbb{P} : 2^\Omega \to [0, 1]$ be a simple probability. Then: (1) $\operatorname{supp} \mathbb{P}$ is a finite event with $\mathbb{P}(\operatorname{supp} \mathbb{P}) = 1$ . (2) For every $A \subseteq \Omega$ that is a finite event with $\mathbb{P}(A) = 1$ , $\operatorname{supp} \mathbb{P} \subseteq A$ .

Source: Probability Proofs.pdf p.2000.

Proof of (1). Since $\mathbb{P}$ is simple, there exists a finite event $E$ with $\mathbb{P}(E) = 1$ . By §3.4, $\mathbb{P}(\{\omega\}) = 0$ for every $\omega \notin E$ . Hence every state with $\mathbb{P}(\{\omega\}) > 0$ lies in $E$ , i.e., $\operatorname{supp} \mathbb{P} \subseteq E$ . Since $E$ is finite, so is $\operatorname{supp} \mathbb{P}$ .

Consider the disjoint decomposition $E = \operatorname{supp} \mathbb{P} \cup (E \setminus \operatorname{supp} \mathbb{P})$ . By additivity, $\mathbb{P}(E) = \mathbb{P}(\operatorname{supp} \mathbb{P}) + \mathbb{P}(E \setminus \operatorname{supp} \mathbb{P}).$ For every $\omega \in E \setminus \operatorname{supp} \mathbb{P}$ , $\omega \notin \operatorname{supp} \mathbb{P}$ so $\mathbb{P}(\{\omega\}) = 0$ ; by finite additivity over the finite set $E \setminus \operatorname{supp} \mathbb{P}$ , $\mathbb{P}(E \setminus \operatorname{supp} \mathbb{P}) = 0$ . Hence $\mathbb{P}(\operatorname{supp} \mathbb{P}) = \mathbb{P}(E) = 1.$

Proof of (2) — by contradiction. Suppose there exists $\omega \in \operatorname{supp} \mathbb{P}$ with $\omega \notin A$ . By definition of the support, $\mathbb{P}(\{\omega\}) > 0$ . Set $B = A \cup \{\omega\}$ ; then $A \cap \{\omega\} = \emptyset$ , so by additivity $\mathbb{P}(B) = \mathbb{P}(A) + \mathbb{P}(\{\omega\}) = 1 + \mathbb{P}(\{\omega\}) > 1.$ This contradicts $\mathbb{P}(B) \le 1$ . Hence $\operatorname{supp} \mathbb{P} \subseteq A$ . $\blacksquare$

3.6 Property of a simple probability #3 (page 2001)

Statement. Let $\mathbb{P} : 2^\Omega \to [0, 1]$ be a simple probability. Then for every $A \subseteq \Omega$ , $\mathbb{P}(A) \;=\; \sum_{\omega \in A \cap \operatorname{supp} \mathbb{P}} \mathbb{P}(\{\omega\}).$

Source: Probability Proofs.pdf p.2001.

Proof. For any $A \subseteq \Omega$ , decompose $A = (A \cap \operatorname{supp} \mathbb{P}) \cup (A \cap (\operatorname{supp} \mathbb{P})^c), \qquad (A \cap \operatorname{supp} \mathbb{P}) \cap (A \cap (\operatorname{supp} \mathbb{P})^c) = \emptyset.$ By additivity, $\mathbb{P}(A) = \mathbb{P}(A \cap \operatorname{supp} \mathbb{P}) + \mathbb{P}(A \cap (\operatorname{supp} \mathbb{P})^c).$ Since $A \cap (\operatorname{supp} \mathbb{P})^c \subseteq (\operatorname{supp} \mathbb{P})^c$ and $\mathbb{P}((\operatorname{supp} \mathbb{P})^c) = 1 - 1 = 0$ , by monotonicity $\mathbb{P}(A \cap (\operatorname{supp} \mathbb{P})^c) = 0$ . Hence $\mathbb{P}(A) = \mathbb{P}(A \cap \operatorname{supp} \mathbb{P}) = \mathbb{P}\!\left(\bigcup_{\omega \in A \cap \operatorname{supp} \mathbb{P}} \{\omega\}\right) = \sum_{\omega \in A \cap \operatorname{supp} \mathbb{P}} \mathbb{P}(\{\omega\}),$ using finite additivity on the finite union. $\blacksquare$

3.7 Theorem 2002 — Simple probabilities are convex linear combinations of Dirac probabilities

Statement. Let $\mathbb{P} : 2^\Omega \to [0, 1]$ be a simple probability. Then for every $A \subseteq \Omega$ , $\mathbb{P}(A) \;=\; \sum_{\omega \in \operatorname{supp} \mathbb{P}} \mathbb{P}(\{\omega\}) \cdot \delta_{\omega}(A).$

Source: Probability Proofs.pdf p.2002.

Proof. By §3.6, $\mathbb{P}(A) = \sum_{\omega \in A \cap \operatorname{supp} \mathbb{P}} \mathbb{P}(\{\omega\})$ . Rewrite the indicator via the Dirac probability: $\sum_{\omega \in A \cap \operatorname{supp} \mathbb{P}} \mathbb{P}(\{\omega\}) = \sum_{\omega \in \operatorname{supp} \mathbb{P}} \mathbb{P}(\{\omega\}) \cdot \mathbf{1}_{[\omega \in A]} = \sum_{\omega \in \operatorname{supp} \mathbb{P}} \mathbb{P}(\{\omega\}) \cdot \delta_{\omega}(A).$ The indicator $\mathbf{1}_{[\omega \in A]}$ is exactly $\delta_\omega(A)$ : it equals $1$ if $\omega \in A$ and $0$ otherwise. $\blacksquare$

Why this matters. The theorem says every simple probability $\mathbb{P}$ is a convex combination of point-masses: the weights $\mathbb{P}(\{\omega\})$ sum to 1 and are non-negative, and the "atoms" $\delta_\omega$ are probabilities. This is the structural fact behind proposition 2006 below (to prove $\mathbb{P}$ is countably additive, it suffices to prove each $\delta_\omega$ is).
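The representation can be checked on a simple probability with a three-point support. The support $\{0, 2, 5\}$ and its weights below are hypothetical; `prob_direct` implements §3.6 and `prob_mixture` implements the Dirac combination.

```python
from fractions import Fraction

# Hypothetical simple probability: support {0, 2, 5} with these masses.
weights = {0: Fraction(1, 2), 2: Fraction(1, 3), 5: Fraction(1, 6)}

def dirac(omega_0, event):
    return 1 if omega_0 in event else 0

def prob_direct(event):
    """Section 3.6: sum the masses of the support points that lie in the event."""
    return sum((m for w, m in weights.items() if w in event), Fraction(0))

def prob_mixture(event):
    """Theorem 2002: convex combination of Dirac probabilities."""
    return sum((m * dirac(w, event) for w, m in weights.items()), Fraction(0))

a = {2, 3, 4, 5}
print(prob_direct(a), prob_mixture(a))  # both 1/2
```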

3.8 Proposition 2006 — Every Dirac probability is countably additive ★ May 2024 Q10c ★

Statement. Let $\omega_0 \in \Omega$ and $\delta_{\omega_0} : 2^\Omega \to [0, 1]$ be the Dirac probability concentrated at $\omega_0$ . Then $\delta_{\omega_0}$ is countably additive.

Source: Probability Proofs.pdf p.2006 (proposition 2006, "first part of the proof" as cited in May 2024 Q10).

Proof — by the monotone-sequence characterisation of countable additivity. We use the equivalence stated in §2.7: $\mathbb{P}$ is countably additive iff for every decreasing sequence $A_n \downarrow A$ (i.e., $A_1 \supseteq A_2 \supseteq \cdots$ and $A = \bigcap_{n=1}^{+\infty} A_n$ ), $\lim_{n\to\infty} \mathbb{P}(A_n) = \mathbb{P}(A)$ .

Consider an arbitrary decreasing collection $\{A_n\}_{n=1}^{+\infty}$ of events with $A_n \downarrow A = \bigcap_{n=1}^{+\infty} A_n$ . We split into two cases on whether $\omega_0$ belongs to the limit set $A$ .

Case (I): $\omega_0 \in A = \bigcap_{n=1}^{+\infty} A_n$ . Then $\omega_0 \in A_n$ for every $n \ge 1$ , hence $\delta_{\omega_0}(A_n) = 1$ for every $n$ . Also $\delta_{\omega_0}(A) = 1$ (since $\omega_0 \in A$ ). Therefore $\lim_{n \to +\infty} \delta_{\omega_0}(A_n) = \lim_{n \to +\infty} 1 = 1 = \delta_{\omega_0}(A). \qquad \checkmark$

Case (II): $\omega_0 \notin A = \bigcap_{n=1}^{+\infty} A_n$ . Since $\omega_0$ is not in the intersection, $\omega_0 \notin A_{\bar n}$ for some $\bar n \in \mathbb{N}$ . Because the sequence is decreasing ( $A_{\bar n} \supseteq A_{\bar n + 1} \supseteq \cdots$ ), for every $n \ge \bar n$ , $A_n \subseteq A_{\bar n}$ , so $\omega_0 \notin A_n$ . Hence $\delta_{\omega_0}(A_n) = 0$ for all $n \ge \bar n$ . Also $\delta_{\omega_0}(A) = 0$ (since $\omega_0 \notin A$ ). Therefore $\lim_{n \to +\infty} \delta_{\omega_0}(A_n) = \lim_{n \to +\infty} 0 = 0 = \delta_{\omega_0}(A). \qquad \checkmark$

Since $\lim_{n\to\infty} \delta_{\omega_0}(A_n) = \delta_{\omega_0}(A)$ in both cases, by the monotone-sequence criterion $\delta_{\omega_0}$ is countably additive. $\blacksquare$

Remark (extension to simple probabilities — "part 2" of the proof). Let $\mathbb{P}$ be a simple probability. By Theorem 2002 (§3.7), $\mathbb{P}(A) = \sum_{\omega \in \operatorname{supp} \mathbb{P}} \mathbb{P}(\{\omega\}) \cdot \delta_\omega(A)$ . For any decreasing collection $A_n \downarrow A$ , $\lim_{n\to\infty} \mathbb{P}(A_n) = \lim_{n\to\infty} \sum_{\omega \in \operatorname{supp} \mathbb{P}} \mathbb{P}(\{\omega\}) \cdot \delta_\omega(A_n) = \sum_{\omega \in \operatorname{supp} \mathbb{P}} \mathbb{P}(\{\omega\}) \cdot \lim_{n\to\infty}\delta_\omega(A_n) = \sum_{\omega \in \operatorname{supp} \mathbb{P}} \mathbb{P}(\{\omega\}) \cdot \delta_\omega(A) = \mathbb{P}(A),$ where the swap of $\lim$ and $\sum$ is legal because $\operatorname{supp} \mathbb{P}$ is finite. Hence every simple probability is countably additive. (May 2024 Q10(c) accepts the shorter "first part" — just the Dirac case.)
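Case (II) of the proof can be watched in action. The sketch below takes $\Omega = \mathbb{N}$, $\omega_0 = 5$ and the decreasing sequence $A_n = \{m : m \ge n\}$, whose intersection is empty; these choices are ours. Events are represented as membership predicates so that infinite sets can be handled.

```python
def dirac(omega_0):
    """Dirac probability at omega_0; an event is a membership predicate."""
    return lambda event: 1 if event(omega_0) else 0

def tail(n):
    """The event A_n = {m in N : m >= n}; A_1 ⊇ A_2 ⊇ ... with empty intersection."""
    return lambda m: m >= n

def empty_event(m):
    """The limit set A: the intersection of all the A_n, which is empty."""
    return False

delta_5 = dirac(5)
values = [delta_5(tail(n)) for n in range(1, 11)]

print(values)                # 1 for n <= 5, then 0 from n = 6 onwards
print(delta_5(empty_event))  # 0, matching the limit of the sequence
```

The sequence $\delta_5(A_n)$ is eventually $0$ (from $\bar n = 6$), which equals $\delta_5(A)$ for the empty limit set.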

3.9 Proposition 2009 — The uniform probability on $\mathbb{N}$ is not countably additive

Statement. There is no countably additive probability $\mathbb{P}$ on $\Omega = \mathbb{N}$ for which every singleton has the same probability (a "uniform probability on $\mathbb{N}$ ").

Source: Probability Proofs.pdf p.2009.

Proof (by contradiction). Suppose such a uniform countably additive $\mathbb{P}$ exists, and set $\mathbb{P}(\{n\}) = k$ for every $n \in \mathbb{N}$ , with $k \ge 0$ . Since $\mathbb{N} = \bigcup_{n \in \mathbb{N}} \{n\}$ is a countable union of pairwise disjoint singletons and $\mathbb{P}$ is countably additive, $1 = \mathbb{P}(\mathbb{N}) = \mathbb{P}\!\left(\bigcup_{n \in \mathbb{N}} \{n\}\right) = \sum_{n \in \mathbb{N}} \mathbb{P}(\{n\}) = \sum_{n \in \mathbb{N}} k = \begin{cases} 0 & k = 0 \\ +\infty & k > 0.\end{cases}$ In both cases we reach a contradiction ( $1 = 0$ or $1 = +\infty$ ). Therefore no countably additive uniform probability on $\mathbb{N}$ exists. $\blacksquare$
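The dichotomy in the proof shows up already in the partial sums. A small sketch, with the mass $k = 1/1000$ chosen arbitrarily for illustration:

```python
from fractions import Fraction

def partial_sum(k, n_terms):
    """Total mass of the first n_terms singletons when each has mass k."""
    return k * n_terms

k = Fraction(1, 1000)
exceeds_one = partial_sum(k, 1001) > 1        # overshoots 1 after finitely many terms
zero_case = partial_sum(Fraction(0), 10**6)   # stays 0 however many terms are added

print(exceeds_one, zero_case)
```

For any $k > 0$ the partial sums exceed 1 as soon as the number of terms is larger than $1/k$, and for $k = 0$ they never leave 0, so the series can never equal 1.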

Consequence. Being simple and being countably additive are not equivalent properties: every simple probability is countably additive (§3.8, Remark), but the converse fails. The Poisson and Geometric distributions (defined on $\Omega = \mathbb{N}$ ) are countably additive but not simple (lect2).

3.10 Page 2014 — Random variables equal $\mathbb{P}$ -a.e.: characterization

Two random variables $f, g : \Omega \to \mathbb{R}$ are equal $\mathbb{P}$ -almost everywhere (equal $\mathbb{P}$ -a.e.) if $\mathbb{P}(\{\omega \in \Omega : f(\omega) = g(\omega)\}) = 1$ .

Statement. Let $\mathbb{P}$ be a simple probability. Then $f, g$ are equal $\mathbb{P}$ -a.e. if and only if $f(\omega) = g(\omega)$ for every $\omega \in \operatorname{supp} \mathbb{P}$ .

Source: Probability Proofs.pdf p.2014.

Proof. ( $\Leftarrow$ ) Suppose $f(\omega) = g(\omega)$ for every $\omega \in \operatorname{supp} \mathbb{P}$ . Then $\operatorname{supp} \mathbb{P} \subseteq \{\omega : f(\omega) = g(\omega)\}$ . Since $\mathbb{P}(\operatorname{supp} \mathbb{P}) = 1$ (§3.5(1)), monotonicity (§3.1) gives $1 = \mathbb{P}(\operatorname{supp} \mathbb{P}) \le \mathbb{P}(\{\omega : f(\omega) = g(\omega)\}) \le 1$ , hence $\mathbb{P}(\{\omega : f(\omega) = g(\omega)\}) = 1$ .

( $\Rightarrow$ ) Suppose $\mathbb{P}(\{\omega : f(\omega) = g(\omega)\}) = 1$ . Apply §3.5(2) with $A = \{\omega : f(\omega) = g(\omega)\}$ ; the set $A$ need not be finite, but the proof of §3.5(2) uses only $\mathbb{P}(A) = 1$ and never the finiteness of $A$ . Hence $\operatorname{supp} \mathbb{P} \subseteq A$ , i.e., $f(\omega) = g(\omega)$ for every $\omega \in \operatorname{supp} \mathbb{P}$ . $\blacksquare$

3.11 Page 2017 — Random variables equal $\mathbb{P}$ -a.e. have the same expected value

Statement. Let $\mathbb{P}$ be a simple probability and $f, g : \Omega \to \mathbb{R}$ random variables equal $\mathbb{P}$ -a.e. Then $E_{\mathbb{P}}(f) = E_{\mathbb{P}}(g)$ .

Source: Probability Proofs.pdf p.2017.

Proof. By definition $E_{\mathbb{P}}(f) = \sum_{\omega \in \operatorname{supp} \mathbb{P}} f(\omega) \cdot \mathbb{P}(\{\omega\})$ and similarly for $g$ . By §3.10, $f(\omega) = g(\omega)$ for every $\omega \in \operatorname{supp} \mathbb{P}$ , so the two sums coincide termwise: $E_{\mathbb{P}}(f) = \sum_{\omega \in \operatorname{supp} \mathbb{P}} f(\omega)\cdot \mathbb{P}(\{\omega\}) = \sum_{\omega \in \operatorname{supp} \mathbb{P}} g(\omega)\cdot \mathbb{P}(\{\omega\}) = E_{\mathbb{P}}(g). \qquad \blacksquare$

Caveat (lect6 p.9). The converse is false: $E_\mathbb{P}(f) = E_\mathbb{P}(g)$ does not imply $f = g$ $\mathbb{P}$ -a.e. Example: on a fair die, $f(\omega) = +10$ for even, $-10$ for odd; $g(\omega) = -10$ for even, $+10$ for odd. Both have expected value 0 but they disagree on every outcome.

3.12 Page 2018 — Expected value properties (linearity, monotonicity, extension)

Statement. Let $\mathbb{P}$ be a simple probability and $f, g : \Omega \to \mathbb{R}$ random variables. (1) Linearity. For all $\alpha, \beta \in \mathbb{R}$ , $E_{\mathbb{P}}(\alpha f + \beta g) = \alpha E_{\mathbb{P}}(f) + \beta E_{\mathbb{P}}(g)$ . (2) Monotonicity. If $f(\omega) \ge g(\omega)$ for every $\omega \in \Omega$ , then $E_{\mathbb{P}}(f) \ge E_{\mathbb{P}}(g)$ . (3) Extension to finite sets. For every finite $A \subseteq \Omega$ with $A \supseteq \operatorname{supp} \mathbb{P}$ , $E_{\mathbb{P}}(f) = \sum_{\omega \in A} f(\omega) \cdot \mathbb{P}(\{\omega\}).$

Source: Probability Proofs.pdf p.2018.

Proof of (1) — Linearity. $E_{\mathbb{P}}(\alpha f + \beta g) = \sum_{\omega \in \operatorname{supp} \mathbb{P}} (\alpha f(\omega) + \beta g(\omega)) \cdot \mathbb{P}(\{\omega\}) = \alpha \sum_{\omega} f(\omega) \mathbb{P}(\{\omega\}) + \beta \sum_{\omega} g(\omega) \mathbb{P}(\{\omega\}) = \alpha E_{\mathbb{P}}(f) + \beta E_{\mathbb{P}}(g).$

Proof of (2) — Monotonicity. Assume $f(\omega) \ge g(\omega)$ for every $\omega$ . Multiplying by $\mathbb{P}(\{\omega\}) \ge 0$ preserves the inequality: $f(\omega)\mathbb{P}(\{\omega\}) \ge g(\omega)\mathbb{P}(\{\omega\})$ . Summing over $\omega \in \operatorname{supp} \mathbb{P}$ gives $E_{\mathbb{P}}(f) \ge E_{\mathbb{P}}(g)$ .

Proof of (3) — Extension. States $\omega \in \Omega$ with $\omega \notin \operatorname{supp} \mathbb{P}$ have $\mathbb{P}(\{\omega\}) = 0$ , so their contribution to the sum is zero: $\sum_{\omega \in A} f(\omega) \mathbb{P}(\{\omega\}) = \sum_{\omega \in \operatorname{supp} \mathbb{P}} f(\omega) \mathbb{P}(\{\omega\}) + \underbrace{\sum_{\omega \in A \setminus \operatorname{supp} \mathbb{P}} f(\omega) \cdot 0}_{= 0} = E_{\mathbb{P}}(f). \qquad \blacksquare$

Corollary — Expected value of an affine function. $E_{\mathbb{P}}(\alpha f + \beta) = \alpha E_{\mathbb{P}}(f) + \beta$ for all $\alpha, \beta \in \mathbb{R}$ . Proof: apply (1) with $g \equiv 1$ (note $E_{\mathbb{P}}(1) = \sum_\omega \mathbb{P}(\{\omega\}) = 1$ ).
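As a quick sanity check, linearity and the affine corollary can be verified numerically on a fair die. This is a minimal sketch, not taken from the course materials; the helper name `expect` and the chosen coefficients are illustrative assumptions.

```python
from fractions import Fraction

# Fair die as a simple probability: support {1,...,6}, each state has mass 1/6.
P = {w: Fraction(1, 6) for w in range(1, 7)}

def expect(f, P):
    """E_P(f) = sum over the support of f(w) * P({w}). (hypothetical helper)"""
    return sum(f(w) * p for w, p in P.items())

f = lambda w: 10 if w % 2 == 0 else -10   # +10 on even faces, -10 on odd faces
g = lambda w: w                            # face value of the die

alpha, beta = Fraction(3), Fraction(-2)

# Linearity: E(alpha f + beta g) versus alpha E(f) + beta E(g).
lhs = expect(lambda w: alpha * f(w) + beta * g(w), P)
rhs = alpha * expect(f, P) + beta * expect(g, P)

# Affine corollary: E(alpha g + beta) versus alpha E(g) + beta.
affine = expect(lambda w: alpha * g(w) + beta, P)
```

Exact rational arithmetic (`Fraction`) is used so that both sides can be compared with `==` rather than with a floating-point tolerance.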

3.13 Page 2024 — Variance properties (computation formula, affine functions)

Statement. Let $\mathbb{P}$ be a simple probability and $f : \Omega \to \mathbb{R}$ a random variable. For all $\alpha, \beta \in \mathbb{R}$ : (1) Computation formula. $V_{\mathbb{P}}(f) = E_{\mathbb{P}}(f^2) - [E_{\mathbb{P}}(f)]^2$ . (2) Affine functions. $V_{\mathbb{P}}(\alpha f + \beta) = \alpha^2 V_{\mathbb{P}}(f)$ .

Source: Probability Proofs.pdf p.2024.

Proof of (1). Expand $(f - E_{\mathbb{P}}(f))^2 = f^2 - 2 f \, E_{\mathbb{P}}(f) + [E_{\mathbb{P}}(f)]^2$ . By linearity of expectation (§3.12.1), $V_{\mathbb{P}}(f) = E_{\mathbb{P}}\bigl[(f - E_{\mathbb{P}}(f))^2\bigr] = E_{\mathbb{P}}(f^2) - 2 E_{\mathbb{P}}(f) \cdot E_{\mathbb{P}}(f) + [E_{\mathbb{P}}(f)]^2 = E_{\mathbb{P}}(f^2) - [E_{\mathbb{P}}(f)]^2.$ (We used that $c = E_{\mathbb{P}}(f) \in \mathbb{R}$ is a constant, so $E_{\mathbb{P}}(c f) = c \, E_{\mathbb{P}}(f)$ and $E_{\mathbb{P}}(c^2) = c^2$ .)

Proof of (2). By definition and linearity (§3.12), $V_{\mathbb{P}}(\alpha f + \beta) = E_{\mathbb{P}}\bigl[(\alpha f + \beta - E_{\mathbb{P}}(\alpha f + \beta))^2\bigr] = E_{\mathbb{P}}\bigl[(\alpha f + \beta - \alpha E_{\mathbb{P}}(f) - \beta)^2\bigr] = E_{\mathbb{P}}\bigl[\alpha^2 (f - E_{\mathbb{P}}(f))^2\bigr] = \alpha^2 V_{\mathbb{P}}(f). \qquad \blacksquare$

Observation. The constant $\beta$ disappears because shifting a random variable by a constant does not change its spread; the coefficient $\alpha$ is squared because variance has units of $f^2$ .
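The two variance rules can be checked on the fair die. This is an illustrative sketch only; the helper names `expect` and `variance` and the values of `alpha`, `beta` are assumptions made for the example.

```python
from fractions import Fraction

P = {w: Fraction(1, 6) for w in range(1, 7)}   # fair die

def expect(f, P):
    return sum(f(w) * p for w, p in P.items())

def variance(f, P):
    """Definition: V_P(f) = E_P[(f - E_P(f))^2]."""
    m = expect(f, P)
    return expect(lambda w: (f(w) - m) ** 2, P)

f = lambda w: w                                # face value

# (1) Computation formula: definition versus E(f^2) - [E(f)]^2.
v_def = variance(f, P)
v_formula = expect(lambda w: f(w) ** 2, P) - expect(f, P) ** 2

# (2) Affine rule: V(alpha f + beta) versus alpha^2 V(f).
alpha, beta = -3, 6
v_affine = variance(lambda w: alpha * f(w) + beta, P)
```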

3.14 Page 2026 — Covariance properties (computation formula, bilinearity) ★ May 2024 MCQ5 ★

Statement. Let $\mathbb{P}$ be a simple probability and $f, g : \Omega \to \mathbb{R}$ random variables. For all $\alpha, \beta, \gamma, \delta \in \mathbb{R}$ : (1) Computation formula. $\operatorname{Cov}_{\mathbb{P}}(f, g) = E_{\mathbb{P}}(f g) - E_{\mathbb{P}}(f) \, E_{\mathbb{P}}(g)$ . (2) Bilinearity for affine functions. $\operatorname{Cov}_{\mathbb{P}}(\alpha f + \beta, \gamma g + \delta) = \alpha \gamma \, \operatorname{Cov}_{\mathbb{P}}(f, g)$ .

Source: Probability Proofs.pdf p.2026.

Proof of (1). Expand $(f - E_{\mathbb{P}}(f))(g - E_{\mathbb{P}}(g)) = f g - f \, E_{\mathbb{P}}(g) - g \, E_{\mathbb{P}}(f) + E_{\mathbb{P}}(f) \, E_{\mathbb{P}}(g).$ Apply expectation and use linearity (§3.12.1): $\operatorname{Cov}_{\mathbb{P}}(f, g) = E_{\mathbb{P}}(f g) - E_{\mathbb{P}}(f) \, E_{\mathbb{P}}(g) - E_{\mathbb{P}}(g) \, E_{\mathbb{P}}(f) + E_{\mathbb{P}}(f) E_{\mathbb{P}}(g) = E_{\mathbb{P}}(f g) - E_{\mathbb{P}}(f) \, E_{\mathbb{P}}(g).$

Proof of (2). First note that for every $\omega$ , $\bigl(\alpha f(\omega) + \beta - E_{\mathbb{P}}(\alpha f + \beta)\bigr)\bigl(\gamma g(\omega) + \delta - E_{\mathbb{P}}(\gamma g + \delta)\bigr) = \bigl(\alpha f(\omega) - \alpha E_{\mathbb{P}}(f)\bigr)\bigl(\gamma g(\omega) - \gamma E_{\mathbb{P}}(g)\bigr) = \alpha \gamma (f(\omega) - E_{\mathbb{P}}(f))(g(\omega) - E_{\mathbb{P}}(g)).$ The additive constants $\beta, \delta$ cancel because $E_{\mathbb{P}}(\alpha f + \beta) = \alpha E_{\mathbb{P}}(f) + \beta$ . Taking expectation and factoring $\alpha\gamma$ out by linearity, $\operatorname{Cov}_{\mathbb{P}}(\alpha f + \beta, \gamma g + \delta) = \alpha \gamma \, E_{\mathbb{P}}\bigl[(f - E_{\mathbb{P}}(f))(g - E_{\mathbb{P}}(g))\bigr] = \alpha \gamma \, \operatorname{Cov}_{\mathbb{P}}(f, g). \qquad \blacksquare$

Special cases worth memorising.

$\operatorname{Cov}(f, f) = V(f)$ (take $g = f$ in the definition of covariance).
$\operatorname{Cov}(\alpha f, \gamma g) = \alpha \gamma \operatorname{Cov}(f, g)$ (set $\beta = \delta = 0$ in (2); this is the May 2024 MCQ5 identity).
Additive constants do not affect covariance.
Symmetry: $\operatorname{Cov}(f, g) = \operatorname{Cov}(g, f)$ (both equal $E(fg) - E(f)E(g)$ ).
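The special cases above can be confirmed numerically. The following is a hedged sketch on a fair die; the random variables `f`, `g` and the constants are arbitrary illustrative choices, not taken from the exam.

```python
from fractions import Fraction

P = {w: Fraction(1, 6) for w in range(1, 7)}   # fair die

def expect(h):
    return sum(h(w) * p for w, p in P.items())

def cov(f, g):
    """Definition: Cov(f, g) = E[(f - E f)(g - E g)]."""
    mf, mg = expect(f), expect(g)
    return expect(lambda w: (f(w) - mf) * (g(w) - mg))

f = lambda w: w          # face value
g = lambda w: w * w      # squared face value

a, b, c, d = 2, 1, 5, -4  # play the roles of alpha, beta, gamma, delta

scaled = cov(lambda w: a * f(w) + b, lambda w: c * g(w) + d)   # Cov(af+b, cg+d)
formula = expect(lambda w: f(w) * g(w)) - expect(f) * expect(g)  # E(fg) - E(f)E(g)
```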

3.15 Page 2027 — Variance of a sum of random variables

Statement. Let $\mathbb{P}$ be a simple probability and $f, g : \Omega \to \mathbb{R}$ random variables. Then $V_{\mathbb{P}}(f + g) \;=\; V_{\mathbb{P}}(f) + V_{\mathbb{P}}(g) + 2 \operatorname{Cov}_{\mathbb{P}}(f, g).$

Source: Probability Proofs.pdf p.2027.

Proof. By definition, $V_{\mathbb{P}}(f + g) = E_{\mathbb{P}}\bigl[(f + g - E_{\mathbb{P}}(f + g))^2\bigr] = E_{\mathbb{P}}\bigl[(f - E_{\mathbb{P}}(f) + g - E_{\mathbb{P}}(g))^2\bigr],$ using linearity of expectation $E_{\mathbb{P}}(f+g) = E_{\mathbb{P}}(f) + E_{\mathbb{P}}(g)$ . Expanding the square: $(f - E_{\mathbb{P}}(f) + g - E_{\mathbb{P}}(g))^2 = (f - E_{\mathbb{P}}(f))^2 + (g - E_{\mathbb{P}}(g))^2 + 2(f - E_{\mathbb{P}}(f))(g - E_{\mathbb{P}}(g)).$ Apply expectation and linearity: $V_{\mathbb{P}}(f + g) = V_{\mathbb{P}}(f) + V_{\mathbb{P}}(g) + 2 \operatorname{Cov}_{\mathbb{P}}(f, g). \qquad \blacksquare$

Corollary (independent $f, g$ ). If $f, g$ are independent, $\operatorname{Cov}(f, g) = 0$ , so $V(f + g) = V(f) + V(g)$ .

General formula for an affine combination (lect7): $V(\alpha f + \beta g + \text{const}) = \alpha^2 V(f) + \beta^2 V(g) + 2 \alpha \beta \operatorname{Cov}(f, g)$ .
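To see the covariance term appear and disappear, one can model two independent fair dice on a product space. This is a sketch under that modelling assumption; all names (`expect`, `cov`, `var`, `g_dep`) are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Two independent fair dice: states are pairs (i, j), each with mass 1/36.
P = {(i, j): Fraction(1, 36) for i, j in product(range(1, 7), repeat=2)}

def expect(h):
    return sum(h(w) * p for w, p in P.items())

def cov(f, g):
    mf, mg = expect(f), expect(g)
    return expect(lambda w: (f(w) - mf) * (g(w) - mg))

def var(f):
    return cov(f, f)

f = lambda w: w[0]               # first die
g = lambda w: w[1]               # second die (independent of f)
g_dep = lambda w: w[0] + w[1]    # total (depends on f)

v_indep = var(lambda w: f(w) + g(w))        # covariance term vanishes
v_dep = var(lambda w: f(w) + g_dep(w))      # covariance term contributes

alpha, beta, const = 2, -3, 7
v_combo = var(lambda w: alpha * f(w) + beta * g_dep(w) + const)
rhs = alpha**2 * var(f) + beta**2 * var(g_dep) + 2 * alpha * beta * cov(f, g_dep)
```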

3.16 Page 2028 — Covariance: boundedness property (Cauchy–Schwarz)

Statement. Let $\mathbb{P}$ be a simple probability and $f, g : \Omega \to \mathbb{R}$ random variables. Then $|\operatorname{Cov}_{\mathbb{P}}(f, g)| \;\le\; \sigma_{\mathbb{P}}(f) \cdot \sigma_{\mathbb{P}}(g).$

Source: Probability Proofs.pdf p.2028.

Proof. Write $\operatorname{supp} \mathbb{P} = \{\omega_1, \ldots, \omega_n\}$ .

Part (I) — centred case $E_{\mathbb{P}}(f) = E_{\mathbb{P}}(g) = 0$ . Define $x_i = f(\omega_i) \sqrt{\mathbb{P}(\{\omega_i\})}$ and $y_i = g(\omega_i) \sqrt{\mathbb{P}(\{\omega_i\})}$ for $i = 1, \ldots, n$ , and collect them into the vectors $\mathbf{x} = (x_1, \ldots, x_n)$ and $\mathbf{y} = (y_1, \ldots, y_n)$ of $\mathbb{R}^n$ . Then $|\operatorname{Cov}_{\mathbb{P}}(f, g)| = \left|\sum_{i=1}^n f(\omega_i) g(\omega_i) \mathbb{P}(\{\omega_i\})\right| = \left|\sum_{i=1}^n x_i y_i\right| = |\mathbf{x} \cdot \mathbf{y}|.$ (We used the computation formula $\operatorname{Cov}(f,g) = E(fg) - E(f)E(g) = E(fg)$ , since $E(f)E(g) = 0$ in the centred case.) By the Cauchy–Schwarz inequality in $\mathbb{R}^n$ , $|\mathbf{x} \cdot \mathbf{y}| \le \|\mathbf{x}\| \cdot \|\mathbf{y}\|,$ and, because $E_{\mathbb{P}}(f) = 0$ , $\|\mathbf{x}\| = \sqrt{\sum_{i=1}^n x_i^2} = \sqrt{\sum_{i=1}^n f^2(\omega_i) \mathbb{P}(\{\omega_i\})} = \sqrt{E_{\mathbb{P}}(f^2)} = \sqrt{V_{\mathbb{P}}(f)} = \sigma_{\mathbb{P}}(f),$ and similarly $\|\mathbf{y}\| = \sigma_{\mathbb{P}}(g)$ . Thus $|\operatorname{Cov}_{\mathbb{P}}(f, g)| \le \sigma_{\mathbb{P}}(f) \sigma_{\mathbb{P}}(g)$ .

Part (II) — general case. Define the centred variables $\tilde f = f - E_{\mathbb{P}}(f)$ and $\tilde g = g - E_{\mathbb{P}}(g)$ . Then $E_{\mathbb{P}}(\tilde f) = E_{\mathbb{P}}(\tilde g) = 0$ , so Part (I) applies: $|\operatorname{Cov}_{\mathbb{P}}(\tilde f, \tilde g)| \le \sigma_{\mathbb{P}}(\tilde f) \cdot \sigma_{\mathbb{P}}(\tilde g).$ By the affine-function rules for variance and covariance (§3.13.2, §3.14.2): $\sigma_{\mathbb{P}}(\tilde f) = \sigma_{\mathbb{P}}(f - E_{\mathbb{P}}(f)) = \sigma_{\mathbb{P}}(f), \qquad \sigma_{\mathbb{P}}(\tilde g) = \sigma_{\mathbb{P}}(g), \qquad \operatorname{Cov}_{\mathbb{P}}(\tilde f, \tilde g) = \operatorname{Cov}_{\mathbb{P}}(f, g).$ Therefore $|\operatorname{Cov}_{\mathbb{P}}(f, g)| \le \sigma_{\mathbb{P}}(f) \sigma_{\mathbb{P}}(g)$ . $\blacksquare$

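Consequence. Whenever $\sigma_{\mathbb{P}}(f), \sigma_{\mathbb{P}}(g) > 0$ (so that the correlation coefficient is defined), $|\rho_{\mathbb{P}}(f, g)| = |\operatorname{Cov}_{\mathbb{P}}(f, g)| / (\sigma_{\mathbb{P}}(f) \sigma_{\mathbb{P}}(g)) \le 1$ , with equality iff $g$ coincides $\mathbb{P}$ -a.e. with an affine function $\alpha f + \beta$ of $f$ , $\alpha \neq 0$ (or vice versa).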
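The bound can be tested without square roots by comparing $\operatorname{Cov}(f,g)^2$ with $V(f)\,V(g)$, which is the same inequality squared. A minimal sketch on a fair die follows; the variables `f`, `g`, `h` are illustrative assumptions.

```python
from fractions import Fraction

P = {w: Fraction(1, 6) for w in range(1, 7)}   # fair die

def expect(h):
    return sum(h(w) * p for w, p in P.items())

def cov(f, g):
    mf, mg = expect(f), expect(g)
    return expect(lambda w: (f(w) - mf) * (g(w) - mg))

f = lambda w: w
g = lambda w: (w - 3) ** 2      # not an affine function of f
h = lambda w: -2 * w + 3        # affine function of f

# Squared form of |Cov| <= sigma(f) * sigma(g).
strict_case = cov(f, g) ** 2 < cov(f, f) * cov(g, g)
equality_case = cov(f, h) ** 2 == cov(f, f) * cov(h, h)
```

Working with the squared inequality keeps everything in exact rational arithmetic.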

3.17 Distribution function is increasing (page 2032)

Statement. Let $f : \Omega \to \mathbb{R}$ be a random variable and $\Phi(x) = \mathbb{P}(f \le x)$ its distribution function. Then $\Phi : \mathbb{R} \to [0, 1]$ is (weakly) increasing.

Source: Probability Proofs.pdf p.2032.

Proof. For $x \le y$ , $\{f \le x\} \subseteq \{f \le y\}$ . By monotonicity of $\mathbb{P}$ (§3.1 applied to the probability measure), $\Phi(x) = \mathbb{P}(f \le x) \le \mathbb{P}(f \le y) = \Phi(y). \qquad \blacksquare$

3.18 Distribution function for an essentially bounded random variable is eventually constant (page 2037)

Statement. If $f : \Omega \to \mathbb{R}$ is essentially bounded (i.e., there exist $m, M \in \mathbb{R}$ with $\mathbb{P}(m \le f \le M) = 1$ ), then there exist scalars $a, b$ such that $\Phi(x) = 0$ for $x \le a$ and $\Phi(x) = 1$ for $x \ge b$ .

Source: Probability Proofs.pdf p.2037.

Proof. Pick $m, M \in \mathbb{R}$ with $\mathbb{P}(m \le f \le M) = 1$ .

For $x < m$ : $\{f \le x\} \cap \{m \le f \le M\} = \emptyset$ , so by disjointness $\{f \le x\} \subseteq \{m \le f \le M\}^c$ , a set of probability $0$ . By monotonicity, $\Phi(x) = \mathbb{P}(f \le x) = 0$ .

For $x \ge M$ : $\{m \le f \le M\} \subseteq \{f \le x\}$ , so by monotonicity $\Phi(x) = \mathbb{P}(f \le x) \ge \mathbb{P}(m \le f \le M) = 1$ . Combined with $\Phi(x) \le 1$ , we get $\Phi(x) = 1$ .

Setting $a < m$ and $b = M$ yields the conclusion. $\blacksquare$

Terminology. Any interval $[a, b]$ such that $\Phi(x) = 0$ for $x \le a$ and $\Phi(x) = 1$ for $x \ge b$ is called a carrier of the distribution function.
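For a simple probability the distribution function is a step function, which makes §3.17 and §3.18 easy to inspect. The sketch below evaluates $\Phi$ for the $\pm 10$ bet on a fair die; the name `Phi` and the evaluation grid are illustrative choices.

```python
from fractions import Fraction

P = {w: Fraction(1, 6) for w in range(1, 7)}     # fair die
f = lambda w: 10 if w % 2 == 0 else -10          # essentially bounded: -10 <= f <= 10

def Phi(x):
    """Distribution function: Phi(x) = P(f <= x)."""
    return sum(p for w, p in P.items() if f(w) <= x)

grid = [-20, -11, -10, 0, 10, 20]
values = [Phi(x) for x in grid]

# Phi is weakly increasing along the grid (cf. 3.17).
increasing = all(a <= b for a, b in zip(values, values[1:]))
```

Here $[-11, 10]$ is a carrier, while $[-10, 10]$ is not, because $\Phi(-10) = 1/2 \neq 0$; this is why the proof takes $a < m$ strictly.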

3.19 Properties of the distribution function for a continuous density (pages 2041, 2042)

Statement (2041 — density vanishes outside the carrier). Let $\Phi(x) = \int_{-\infty}^x \varphi(t)\,dt$ be a distribution function with integrable density $\varphi$ and carrier $[a, b]$ . If $\varphi$ is continuous outside $[a, b]$ , then $\varphi(x) = 0$ for every $x \notin [a, b]$ .

Proof. Since $[a, b]$ is a carrier, $\int_{-\infty}^{+\infty} \varphi(x)\,dx = \int_a^b \varphi(x)\,dx = 1$ . Fix $z_1, z_2 > b$ with $z_1 < z_2$ ; by the definition of the carrier, $\int_{z_1}^{z_2} \varphi(x)\,dx = 0$ . Since $\varphi$ is continuous (and non-negative) on $[z_1, z_2]$ , the vanishing integral forces $\varphi(x) = 0$ on $[z_1, z_2]$ . This holds for all such $z_1, z_2$ , hence $\varphi(x) = 0$ on $(b, +\infty)$ . The symmetric argument on $(-\infty, a)$ completes the proof. $\blacksquare$

Statement (2042 — Barrow–Torricelli link). Let $\Phi$ be a distribution function with carrier $[a, b]$ . Then $\Phi$ has a unique continuous density $\varphi$ on $[a, b]$ iff $\Phi$ is continuously differentiable on $[a, b]$ , in which case $\Phi'(x) = \varphi(x)$ .

Proof. Apply the Barrow–Torricelli theorem (prop 2030 from Integral Calculus) on $[a, b]$ with $\Phi = g$ and $\varphi = \gamma$ . $\blacksquare$

(These technical lemmas rarely appear on their own on the exam, but they underlie the continuous distributions listed in §3.24.)

3.20 Expected value of a random variable w.r.t. a simple probability as a Stieltjes integral (page 2043)

Statement. Let $\mathbb{P}$ be a simple probability and $f : \Omega \to \mathbb{R}$ a random variable with distribution function $\Phi$ having carrier $[a, b]$ . Then $E_{\mathbb{P}}(f) \;=\; \int_a^b x\, d\Phi(x).$

Source: Probability Proofs.pdf p.2043.

Proof. Write $\operatorname{supp} \mathbb{P} = \{\omega_1, \ldots, \omega_n\}$ and set $x_i = f(\omega_i)$ . Assume (WLOG) the $x_i$ are distinct and ordered $x_1 < x_2 < \cdots < x_n$ . Since $[a, b]$ is a carrier, $a < x_1$ and $b \ge x_n$ . $E_{\mathbb{P}}(f) = \sum_{i=1}^n f(\omega_i) \mathbb{P}(\{\omega_i\}) = \sum_{i=1}^n x_i \, \mathbb{P}(f = x_i).$ By countable additivity of $\mathbb{P}$ , the jump size of $\Phi$ at $x_i$ is $\Phi(x_i) - \lim_{x \to x_i^-} \Phi(x) = \mathbb{P}(f = x_i)$ . Hence $E_{\mathbb{P}}(f) = \sum_{i=1}^n x_i \left[\Phi(x_i) - \lim_{x \to x_i^-} \Phi(x)\right].$ Since $\Phi$ is increasing and right-continuous on $[a, b]$ (§3.17 and standard CDF properties), by the theorem on the writing of the Stieltjes integral with step-function integrator, $E_{\mathbb{P}}(f) = \int_a^b x\, d\Phi(x). \qquad \blacksquare$

3.21 Bayes' theorem and the law of total probability

Bayes' theorem (stated in lect3). Let $A, B \subseteq \Omega$ with $\mathbb{P}(A), \mathbb{P}(B) > 0$ . Then $\mathbb{P}(A \mid B) \;=\; \frac{\mathbb{P}(B \mid A) \cdot \mathbb{P}(A)}{\mathbb{P}(B)}.$

Proof. By the definition of conditional probability, $\mathbb{P}(A \mid B) = \frac{\mathbb{P}(A \cap B)}{\mathbb{P}(B)} \qquad \text{and} \qquad \mathbb{P}(B \mid A) = \frac{\mathbb{P}(A \cap B)}{\mathbb{P}(A)}.$ Multiply the second equation by $\mathbb{P}(A)$ to get $\mathbb{P}(A \cap B) = \mathbb{P}(B \mid A) \mathbb{P}(A)$ , and substitute into the first. $\blacksquare$

Law of total probability. Let $\{E_i\}_{i=1}^n$ be a partition of $\Omega$ (pairwise disjoint, union equal to $\Omega$ ) with $\mathbb{P}(E_i) > 0$ for every $i$ . Then for every $A \subseteq \Omega$ , $\mathbb{P}(A) \;=\; \sum_{i=1}^n \mathbb{P}(A \mid E_i) \cdot \mathbb{P}(E_i).$

Proof. $A = A \cap \Omega = A \cap \bigcup_{i=1}^n E_i = \bigcup_{i=1}^n (A \cap E_i)$ , a disjoint union. By finite additivity, $\mathbb{P}(A) = \sum_{i=1}^n \mathbb{P}(A \cap E_i) = \sum_{i=1}^n \mathbb{P}(A \mid E_i) \mathbb{P}(E_i). \qquad \blacksquare$

Combined Bayes / total-probability formula. If $\{E_i\}$ is a partition and $\mathbb{P}(A) > 0$ , $\mathbb{P}(E_k \mid A) = \frac{\mathbb{P}(A \mid E_k) \mathbb{P}(E_k)}{\sum_{i=1}^n \mathbb{P}(A \mid E_i) \mathbb{P}(E_i)}.$

3.22 Markov's and Chebyshev's inequalities

Markov's inequality (lect5). For any random variable $X \ge 0$ and any $a > 0$ , $\mathbb{P}(X \ge a) \;\le\; \frac{E(X)}{a}.$

Proof. Since $X \ge 0$ , $X \ge a \cdot \mathbf{1}_{\{X \ge a\}}$ pointwise (if $X(\omega) \ge a$ , then $X(\omega) \ge a \cdot 1$ ; if $X(\omega) < a$ , the RHS is 0). Taking expectations (monotonicity, §3.12.2): $E(X) \ge a \cdot E(\mathbf{1}_{\{X \ge a\}}) = a \cdot \mathbb{P}(X \ge a). \qquad \blacksquare$

Chebyshev's inequality. Let $X$ be a random variable with finite mean $\mu = E(X)$ and variance $\sigma^2 = V(X)$ . For every $k > 0$ , $\mathbb{P}(|X - \mu| \ge k \sigma) \;\le\; \frac{1}{k^2}.$ Equivalently, $\mathbb{P}(|X - \mu| \ge a) \le \sigma^2 / a^2$ for every $a > 0$ .

Proof. Apply Markov's inequality to $Y = (X - \mu)^2 \ge 0$ with threshold $a = (k \sigma)^2$ : $\mathbb{P}(|X - \mu| \ge k\sigma) = \mathbb{P}(Y \ge (k\sigma)^2) \le \frac{E(Y)}{(k\sigma)^2} = \frac{V(X)}{k^2 \sigma^2} = \frac{1}{k^2}. \qquad \blacksquare$

Interpretation. Chebyshev says at least $1 - 1/k^2$ of the probability mass is within $k$ standard deviations of the mean. E.g., $k = 2$ gives $\mathbb{P}(|X - \mu| < 2\sigma) \ge 3/4$ .
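Both inequalities are upper bounds and are typically far from tight. A small sketch comparing the exact probabilities with the bounds on a fair die (the thresholds $a = 5$ and $a = 2$ are arbitrary illustrative choices):

```python
from fractions import Fraction

P = {w: Fraction(1, 6) for w in range(1, 7)}   # fair die, X = face value

def prob(pred):
    return sum(p for w, p in P.items() if pred(w))

mu = sum(w * p for w, p in P.items())                 # E(X)
var = sum((w - mu) ** 2 * p for w, p in P.items())    # V(X)

# Markov with threshold 5: P(X >= 5) <= E(X) / 5.
markov_exact = prob(lambda w: w >= 5)
markov_bound = mu / 5

# Chebyshev with threshold 2: P(|X - mu| >= 2) <= V(X) / 2^2.
cheb_exact = prob(lambda w: abs(w - mu) >= 2)
cheb_bound = var / 2 ** 2
```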

3.23 Named discrete distributions (lect2, lect6)

For each distribution we list $\Omega$ , the PMF $p_n$ , the expected value $E(f)$ (where $f$ is the identity random variable, $f = n$ ), and the variance $V(f)$ .

| Distribution | $\Omega$ | PMF $p_n$ | $E(f)$ | $V(f)$ | Notes |
|---|---|---|---|---|---|
| Bernoulli( $p$ ) | $\{0, 1\}$ | $p_1 = p$ , $p_0 = 1-p$ | $p$ | $p(1-p)$ | Single 0/1 trial |
| Binomial( $n, p$ ) | $\{0, 1, \ldots, n\}$ | $p_k = \binom{n}{k} p^k (1-p)^{n-k}$ | $n p$ | $n p (1-p)$ | Sum of $n$ i.i.d. Bernoulli( $p$ ) |
| Geometric( $q$ ) | $\mathbb{N} = \{0, 1, 2, \ldots\}$ | $p_n = q^n (1-q)$ , $0 < q < 1$ | $q/(1-q)$ | $q/(1-q)^2$ | Countable; not simple, countably additive |
| Poisson( $\lambda$ ) | $\mathbb{N}$ | $p_n = e^{-\lambda} \lambda^n / n!$ , $\lambda > 0$ | $\lambda$ | $\lambda$ | Countable; not simple, countably additive |
| Uniform discrete( $n$ ) | $\{1, \ldots, n\}$ | $p_i = 1/n$ | $(n+1)/2$ | $(n+1)(n-1)/12$ | Fair die: $\Omega = \{1,\ldots,6\}$ , $E=3.5$ , $V=35/12$ |

Key identity (Poisson). $P(X = n) = e^{-\lambda} \lambda^n / n!$ ; in particular the probabilities sum to $\sum_{n=0}^{+\infty} e^{-\lambda} \lambda^n / n! = e^{-\lambda} \cdot e^\lambda = 1$ (using the series expansion of $e^\lambda$ ).

Worked Poisson (lect2 p.4): for $\lambda = 2$ , $P(X = 0) = e^{-2}, \qquad P(X = 1) = 2 e^{-2}, \qquad P(X = 2) = 2 e^{-2}, \qquad P(X = 3) = \frac{4}{3} e^{-2}, \ldots$ This distribution is used in the May 2024 MCQ4 (see §4.1).

3.24 Named continuous distributions (lect7)

For each distribution we list the PDF $\varphi(x)$ , the CDF $\Phi(x)$ , the carrier (if any), the expected value $E(f)$ and the variance $V(f)$ (for $f = \text{identity}$ ).

| Distribution | PDF $\varphi(x)$ | CDF $\Phi(x)$ | Carrier | $E(f)$ | $V(f)$ |
|---|---|---|---|---|---|
| Uniform $[a,b]$ | $\dfrac{1}{b-a}\cdot \mathbf{1}_{[a,b]}(x)$ | $\begin{cases}0 & x < a \\ \frac{x-a}{b-a} & a \le x \le b \\ 1 & x > b\end{cases}$ | $[a, b]$ | $(a+b)/2$ | $(b-a)^2/12$ |
| (Negative) exponential( $\alpha$ ), $\alpha > 0$ | $\begin{cases}0 & x < 0 \\ \alpha e^{-\alpha x} & x \ge 0\end{cases}$ | $\begin{cases}0 & x < 0 \\ 1 - e^{-\alpha x} & x \ge 0\end{cases}$ | none (unbounded) | $1/\alpha$ | $1/\alpha^2$ |
| Standard normal / Gaussian | $\dfrac{1}{\sqrt{2\pi}} e^{-x^2/2}$ | no closed form ( $\Phi(0) = 1/2$ ) | none (unbounded) | $0$ | $1$ |

Proof that $E(f) = 1/\alpha$ for exponential (lect7, reproduced from the source). $E_{\mathbb{P}}(f) = \int_{-\infty}^{+\infty} x \, d\Phi(x) = \int_0^{+\infty} x \cdot \alpha e^{-\alpha x} \, dx.$ Integration by parts with $u = \alpha x$ , $u' = \alpha$ , $v' = e^{-\alpha x}$ , $v = -\frac{1}{\alpha} e^{-\alpha x}$ : $\int \alpha x \cdot e^{-\alpha x} dx = -x e^{-\alpha x} + \int e^{-\alpha x} dx = -x e^{-\alpha x} - \frac{1}{\alpha} e^{-\alpha x} + C = -e^{-\alpha x}\left(x + \frac{1}{\alpha}\right) + C.$ Evaluating from 0 to $+\infty$ and using the hierarchy of infinities $\lim_{x\to+\infty} x / e^{\alpha x} = 0$ : $E_{\mathbb{P}}(f) = \lim_{K\to+\infty} \left[-e^{-\alpha K}\left(K + \frac{1}{\alpha}\right) + e^0 \cdot \frac{1}{\alpha}\right] = 0 + \frac{1}{\alpha} = \frac{1}{\alpha}. \qquad \blacksquare$
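Sketch of $V(f) = 1/\alpha^2$ for the exponential (a worked check of the table entry, not reproduced from the lecture notes; it assumes the computation formula §3.13.1 carries over to densities). One more integration by parts, with the boundary term vanishing by the same hierarchy of infinities, gives $E_{\mathbb{P}}(f^2) = \int_0^{+\infty} x^2 \cdot \alpha e^{-\alpha x} \, dx = \Bigl[-x^2 e^{-\alpha x}\Bigr]_0^{+\infty} + \int_0^{+\infty} 2x \, e^{-\alpha x} \, dx = 0 + \frac{2}{\alpha} \int_0^{+\infty} x \cdot \alpha e^{-\alpha x} \, dx = \frac{2}{\alpha} \, E_{\mathbb{P}}(f) = \frac{2}{\alpha^2}.$ Hence $V_{\mathbb{P}}(f) = E_{\mathbb{P}}(f^2) - [E_{\mathbb{P}}(f)]^2 = \frac{2}{\alpha^2} - \frac{1}{\alpha^2} = \frac{1}{\alpha^2}.$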

Proof that $E(f) = 0$ for standard normal. The PDF $\varphi(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$ is symmetric about $0$ , so $E(f) = \int_{-\infty}^{+\infty} x \varphi(x)\, dx = 0$ (integrand is odd and absolutely integrable). Alternatively, a direct computation: $E_{\mathbb{P}}(f) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} x e^{-x^2/2} dx = \frac{1}{\sqrt{2\pi}}\left[\lim_{K\to+\infty}\bigl(-e^{-x^2/2}\bigr)\Big|_{-K}^0 + \lim_{K\to+\infty}\bigl(-e^{-x^2/2}\bigr)\Big|_0^K\right] = \frac{1}{\sqrt{2\pi}}[-1 + 1] = 0. \quad \blacksquare$

§4. Worked Examples

Example 4.1 — Poisson $\mathbb{P}(n \ge 3)$ for $\lambda = 2$ ★ May 2024 MCQ4 ★

Problem. Consider $\Omega = \mathbb{N}$ and the Poisson probability $\mathbb{P}$ with parameter $\lambda = 2$ . Compute $\mathbb{P}(E)$ for $E = \{n \in \mathbb{N} : n \ge 3\}$ .

Solution. The Poisson PMF with $\lambda = 2$ is $p_n = e^{-2} \cdot 2^n / n!$ . Rather than summing an infinite tail, use the complement trick: $\mathbb{P}(n \ge 3) = 1 - \mathbb{P}(n \le 2) = 1 - \sum_{n=0}^{2} e^{-2} \frac{2^n}{n!} = 1 - e^{-2}\left(\frac{2^0}{0!} + \frac{2^1}{1!} + \frac{2^2}{2!}\right) = 1 - e^{-2}(1 + 2 + 2) = \boxed{1 - 5 e^{-2}}.$

Common trap. Students sometimes subtract $\mathbb{P}(n \le 3)$ , which is the complement of $\{n > 3\}$ ; since the problem asks for $n \ge 3$ (inclusive), the correct complement is $\{n \le 2\}$ . A second trap is forgetting $0! = 1$ .

Source: General_24524_ENG_SOL.pdf Mode A MCQ4.
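The complement trick is easy to confirm numerically. The following is a minimal sketch; the function names `poisson_pmf` and `poisson_tail` are hypothetical helpers written for this check, not library calls.

```python
import math

def poisson_pmf(n, lam):
    """p_n = e^{-lam} * lam^n / n!"""
    return math.exp(-lam) * lam ** n / math.factorial(n)

def poisson_tail(k, lam):
    """P(X >= k) = 1 - sum_{n=0}^{k-1} p_n  (finite sum via the complement)."""
    return 1.0 - sum(poisson_pmf(n, lam) for n in range(k))

tail = poisson_tail(3, 2.0)           # P(n >= 3) for lambda = 2
closed_form = 1 - 5 * math.exp(-2)    # the boxed answer above
```

Numerically, both expressions are about $0.3233$.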

Example 4.2 — Back-solve $\sigma_{\mathbb{P}}(f)$ from covariance and correlation ★ May 2024 MCQ5 ★

Problem. Let $f, g$ be two random variables on $(\Omega, \mathbb{P})$ with $V_{\mathbb{P}}(g) = 100$ , $\rho_{\mathbb{P}}(f, g) = -0.1$ , and $|\operatorname{Cov}_{\mathbb{P}}(2f, 5g)| = 90$ . Compute $\sigma_{\mathbb{P}}(f)$ .

Solution. First, by covariance bilinearity (§3.14.2): $\operatorname{Cov}_{\mathbb{P}}(2f, 5g) = 2 \cdot 5 \cdot \operatorname{Cov}_{\mathbb{P}}(f, g) = 10 \operatorname{Cov}_{\mathbb{P}}(f, g),$ so $|\operatorname{Cov}_{\mathbb{P}}(f, g)| = |\operatorname{Cov}_{\mathbb{P}}(2f, 5g)|/10 = 90/10 = 9$ .

Second, $\sigma_{\mathbb{P}}(g) = \sqrt{V_{\mathbb{P}}(g)} = \sqrt{100} = 10$ .

Third, recall the definition of the correlation coefficient: $\rho_{\mathbb{P}}(f, g) = \frac{\operatorname{Cov}_{\mathbb{P}}(f, g)}{\sigma_{\mathbb{P}}(f) \sigma_{\mathbb{P}}(g)} = -0.1.$ Since $\rho < 0$ , the sign of $\operatorname{Cov}(f, g)$ must be negative: $\operatorname{Cov}_{\mathbb{P}}(f, g) = -9$ . Substituting: $-0.1 = \frac{-9}{\sigma_{\mathbb{P}}(f) \cdot 10} \qquad \Longrightarrow \qquad \sigma_{\mathbb{P}}(f) = \frac{-9}{-0.1 \cdot 10} = \boxed{9}.$

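Common trap. Mixing up the two scaling rules: a constant comes out of the variance squared ( $V(2f) = 4 V(f)$ , not $2 V(f)$ ), but each constant comes out of the covariance only once ( $\operatorname{Cov}(2f, 5g) = 10 \operatorname{Cov}(f, g)$ , not $100 \operatorname{Cov}(f, g)$ ). Another trap: forgetting to extract the sign from $\rho$ and ending up with $\sigma(f) = \pm 9$ , whereas a standard deviation is never negative.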

Source: General_24524_ENG_SOL.pdf Mode A MCQ5.

Example 4.3 — Bayes' theorem (classic disease test)

Problem. A diagnostic test is 99% sensitive (true-positive rate) and 99% specific (true-negative rate). The disease affects 1 in 1000 people. A randomly chosen person tests positive. What is the probability that they actually have the disease?

Solution. Let $D$ = "has disease", $T^+$ = "tests positive". Given: $\mathbb{P}(D) = 0.001$ , $\mathbb{P}(T^+ \mid D) = 0.99$ , $\mathbb{P}(T^+ \mid D^c) = 1 - 0.99 = 0.01$ . Compute $\mathbb{P}(D \mid T^+)$ using Bayes: $\mathbb{P}(D \mid T^+) = \frac{\mathbb{P}(T^+ \mid D) \mathbb{P}(D)}{\mathbb{P}(T^+)} = \frac{0.99 \cdot 0.001}{0.99 \cdot 0.001 + 0.01 \cdot 0.999} = \frac{0.00099}{0.00099 + 0.00999} = \frac{0.00099}{0.01098} \approx 0.0902.$

Observation. Despite the test being "99% accurate", the posterior probability of having the disease given a positive test is only $\sim 9\%$ — because the disease is rare, false positives dominate true positives. This is the classic base-rate fallacy.
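The computation generalises to any finite partition. The sketch below combines the law of total probability with Bayes' theorem; `posterior` is a hypothetical helper name, and exact fractions are used so the result can be read off as a ratio.

```python
from fractions import Fraction

def posterior(priors, likelihoods, k):
    """P(E_k | A) from priors P(E_i) and likelihoods P(A | E_i)."""
    # Law of total probability: P(A) = sum_i P(A | E_i) P(E_i).
    evidence = sum(l * p for l, p in zip(likelihoods, priors))
    # Bayes' theorem.
    return likelihoods[k] * priors[k] / evidence

# Partition {D, D^c}: prevalence 1/1000, sensitivity 99%, false-positive rate 1%.
priors = [Fraction(1, 1000), Fraction(999, 1000)]
likelihoods = [Fraction(99, 100), Fraction(1, 100)]

p_disease_given_positive = posterior(priors, likelihoods, 0)
```

The exact value is $11/122 \approx 0.0902$, matching the decimal computation above.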

Example 4.4 — $E$ and $V$ of a custom discrete random variable

Problem. Roll a (fair) die. Define $f(\omega) = +10$ if $\omega$ is even, $-10$ if $\omega$ is odd. Compute $E(f)$ , $V(f)$ , $\sigma(f)$ .

Solution. $E_{\mathbb{P}}(f) = 10 \cdot \mathbb{P}(\{2, 4, 6\}) - 10 \cdot \mathbb{P}(\{1, 3, 5\}) = 10 \cdot \frac{1}{2} - 10 \cdot \frac{1}{2} = 0.$

$V_{\mathbb{P}}(f) = E_{\mathbb{P}}(f^2) - [E_{\mathbb{P}}(f)]^2 = (10^2) \cdot \frac{1}{2} + (-10)^2 \cdot \frac{1}{2} - 0^2 = 100.$

$\sigma_{\mathbb{P}}(f) = \sqrt{100} = 10.$

Extension. Compute $V_{\mathbb{P}}(6 - 3f)$ and $\sigma_{\mathbb{P}}(6 - 3f)$ using the affine-function rule: $V(6 - 3f) = (-3)^2 V(f) = 9 \cdot 100 = 900, \qquad \sigma(6 - 3f) = |-3| \cdot \sigma(f) = 3 \cdot 10 = 30.$

Source: lect6 p.8.

Example 4.5 — Dirac additivity proof (the May 2024 Q10 "template")

Problem. Verify directly (no monotone-sequence trick) that $\delta_{\omega_0}(E_1 \cup E_2) = \delta_{\omega_0}(E_1) + \delta_{\omega_0}(E_2)$ for any disjoint events $E_1, E_2$ — i.e., check that $\delta_{\omega_0}$ satisfies Kolmogorov's axiom K3.

Solution — by cases.

Case 1: $\omega_0 \notin E_1 \cup E_2$ . Then $\omega_0 \notin E_1$ and $\omega_0 \notin E_2$ , so $\delta_{\omega_0}(E_1) = \delta_{\omega_0}(E_2) = 0$ and $\delta_{\omega_0}(E_1 \cup E_2) = 0 = 0 + 0$ . $\checkmark$

Case 2: $\omega_0 \in E_1 \cup E_2$ . Since $E_1 \cap E_2 = \emptyset$ , $\omega_0$ belongs to exactly one of $E_1, E_2$ — say $\omega_0 \in E_1$ (the other case is symmetric). Then $\delta_{\omega_0}(E_1) = 1$ , $\delta_{\omega_0}(E_2) = 0$ , $\delta_{\omega_0}(E_1 \cup E_2) = 1 = 1 + 0$ . $\checkmark$

In both cases K3 holds. The K1 and K2 axioms (positivity and normalisation) are immediate from the definition: $\delta_{\omega_0}(E) \in \{0, 1\} \subseteq [0, 1]$ for every $E$ , and $\delta_{\omega_0}(\Omega) = 1$ since $\omega_0 \in \Omega$ .

Extension to countable additivity. This is exactly the two-case structure of Proposition 2006 (§3.8), adapted to a decreasing sequence. Memorise the argument — on the exam, the two-case skeleton "Case I: $\omega_0 \in A$ / Case II: $\omega_0 \notin A$ " is the entire proof.
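On a small finite $\Omega$ the case analysis can be replaced by brute force over all pairs of disjoint events. This is only an illustrative check (it does not replace the proof); the four-point sample space and the choice $\omega_0 = 2$ are arbitrary assumptions.

```python
from itertools import combinations

omega = [0, 1, 2, 3]   # a small sample space
omega0 = 2             # the point carrying the mass

def dirac(E):
    """delta_{omega0}(E) = 1 if omega0 is in E, else 0."""
    return 1 if omega0 in E else 0

def subsets(s):
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

events = subsets(omega)

# K3: additivity on every pair of disjoint events.
additive = all(
    dirac(A | B) == dirac(A) + dirac(B)
    for A in events for B in events if not (A & B)
)

# K2: normalisation.
normalised = dirac(frozenset(omega)) == 1
```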

Example 4.6 — Covariance from a joint PMF table

Problem (lect7 p.4, modified). Roll a fair die. Define $f(\omega) = \begin{cases} -1 & \omega \in \{1, 2, 3\} \\ +1 & \omega \in \{4, 5, 6\} \end{cases}, \qquad g(\omega) = \begin{cases} -2 & \omega \in \{1, 2\} \\ 0 & \omega \in \{3, 4\} \\ +2 & \omega \in \{5, 6\} \end{cases}.$ Compute $\operatorname{Cov}(f, g)$ and $\rho(f, g)$ .

Solution. $E(f) = (-1) \cdot \frac{1}{2} + 1 \cdot \frac{1}{2} = 0.$ $E(g) = (-2) \cdot \frac{1}{3} + 0 \cdot \frac{1}{3} + 2 \cdot \frac{1}{3} = 0.$ $E(fg)$ : evaluate $fg$ at each of the 6 outcomes:

| $\omega$ | $f(\omega)$ | $g(\omega)$ | $f(\omega) g(\omega)$ | $\mathbb{P}(\{\omega\}) = 1/6$ |
|---|---|---|---|---|
| 1 | $-1$ | $-2$ | $2$ | $1/6$ |
| 2 | $-1$ | $-2$ | $2$ | $1/6$ |
| 3 | $-1$ | $0$ | $0$ | $1/6$ |
| 4 | $+1$ | $0$ | $0$ | $1/6$ |
| 5 | $+1$ | $+2$ | $2$ | $1/6$ |
| 6 | $+1$ | $+2$ | $2$ | $1/6$ |

$E(fg) = (2 + 2 + 0 + 0 + 2 + 2) \cdot \frac{1}{6} = \frac{8}{6} = \frac{4}{3}$ . $\operatorname{Cov}(f, g) = E(fg) - E(f) E(g) = \frac{4}{3} - 0 = \frac{4}{3}$ .

$V(f) = E(f^2) - 0 = 1, V(g) = (-2)^2/3 + 0 + (+2)^2/3 = 8/3$ . So $\sigma(f) = 1, \sigma(g) = \sqrt{8/3} = 2\sqrt{2/3}$ . $\rho(f, g) = \frac{4/3}{1 \cdot 2\sqrt{2/3}} = \frac{4/3}{2\sqrt{2/3}} = \frac{2}{3\sqrt{2/3}} = \frac{2}{\sqrt{6}} = \frac{\sqrt{6}}{3} \approx 0.816.$ A strong positive linear relationship, as expected (both $f$ and $g$ increase with $\omega$ ).
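The whole example can be reproduced state by state. A minimal sketch, with `expect` as a hypothetical helper and the two random variables encoded exactly as in the problem statement:

```python
from fractions import Fraction
import math

P = {w: Fraction(1, 6) for w in range(1, 7)}   # fair die

f = lambda w: -1 if w <= 3 else 1
g = lambda w: -2 if w <= 2 else (0 if w <= 4 else 2)

def expect(h):
    return sum(h(w) * p for w, p in P.items())

# Computation formulas for covariance and variance.
cov_fg = expect(lambda w: f(w) * g(w)) - expect(f) * expect(g)
var_f = expect(lambda w: f(w) ** 2) - expect(f) ** 2
var_g = expect(lambda w: g(w) ** 2) - expect(g) ** 2

# Correlation coefficient (floating point because of the square root).
rho = float(cov_fg) / math.sqrt(float(var_f * var_g))
```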

Example 4.7 — Independence check

Problem. Roll a fair die. Let $A = \{\text{even}\} = \{2, 4, 6\}$ and $B = \{\le 3\} = \{1, 2, 3\}$ . Are $A$ and $B$ independent?

Solution. $\mathbb{P}(A) = 1/2$ , $\mathbb{P}(B) = 1/2$ , $\mathbb{P}(A \cap B) = \mathbb{P}(\{2\}) = 1/6$ . Check: $\mathbb{P}(A)\mathbb{P}(B) = 1/4 \neq 1/6 = \mathbb{P}(A \cap B)$ . NOT independent.

Verify via conditional probability: $\mathbb{P}(A \mid B) = \mathbb{P}(A \cap B)/\mathbb{P}(B) = (1/6)/(1/2) = 1/3 \neq 1/2 = \mathbb{P}(A)$ — knowing the outcome is $\le 3$ decreases the chance of evenness from 50% to 33%.

Contrast. Let $A = \{\text{even}\}$ and $B' = \{1, 2\}$ . $\mathbb{P}(A \cap B') = \mathbb{P}(\{2\}) = 1/6$ ; $\mathbb{P}(A)\mathbb{P}(B') = (1/2)(1/3) = 1/6$ . Independent.
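Both checks reduce to comparing $\mathbb{P}(A \cap B)$ with $\mathbb{P}(A)\mathbb{P}(B)$. A short sketch for the fair die, where `prob` and `independent` are hypothetical helper names:

```python
from fractions import Fraction

def prob(E):
    """Uniform probability on a fair die: |E| / 6."""
    return Fraction(len(E), 6)

def independent(A, B):
    return prob(A & B) == prob(A) * prob(B)

A = {2, 4, 6}        # even
B = {1, 2, 3}        # at most 3
B_prime = {1, 2}

result_AB = independent(A, B)              # dependent pair from the example
result_AB_prime = independent(A, B_prime)  # independent pair from the contrast
```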

§5. Solution Methods

Each method is a named algorithm (input → steps → output → pitfalls). The "Used on" line under each heading points to the exam problems or exercises where the method is applied.

M-P-1 — Compute a probability via Kolmogorov rules

Used on: general MCQs, TA exercises.

Input. A probability $\mathbb{P}$ and an event $A$ (possibly built from unions, intersections, complements, or conditional formulas).

Steps.

Rewrite $A$ in a canonical form using set algebra (complements, finite unions/intersections).
Apply the Kolmogorov tools one at a time:
- Complement: $\mathbb{P}(A^c) = 1 - \mathbb{P}(A)$ .
- Inclusion–exclusion: $\mathbb{P}(A \cup B) = \mathbb{P}(A) + \mathbb{P}(B) - \mathbb{P}(A \cap B)$ .
- Finite additivity (disjoint): $\mathbb{P}(A_1 \cup \cdots \cup A_n) = \sum \mathbb{P}(A_i)$ if pairwise disjoint.
- Monotonicity: if $A \subseteq B$ , $\mathbb{P}(A) \le \mathbb{P}(B)$ .
- Union bound: $\mathbb{P}(\bigcup A_i) \le \sum \mathbb{P}(A_i)$ (always, no disjointness needed).
Simplify.

Output. A numerical value.

Pitfalls.

Forgetting to subtract $\mathbb{P}(A \cap B)$ when applying inclusion–exclusion to non-disjoint events.
Using finite additivity on non-disjoint events (wrong answer).
Neglecting the normalization $\mathbb{P}(\Omega) = 1$ when computing $\mathbb{P}(A^c)$ (forgetting the "1").

M-P-2 — Apply Bayes' theorem

Used on: disease-test, gambler's problems.

Input. Prior probabilities $\mathbb{P}(E_i)$ for a partition $\{E_i\}$ , and likelihoods $\mathbb{P}(A \mid E_i)$ for an observed event $A$ .

Steps.

Identify the partition $\{E_i\}$ and compute $\mathbb{P}(A) = \sum_i \mathbb{P}(A \mid E_i) \mathbb{P}(E_i)$ via the law of total probability.
Apply Bayes: $\mathbb{P}(E_k \mid A) = \mathbb{P}(A \mid E_k) \mathbb{P}(E_k) / \mathbb{P}(A)$ .

Output. The posterior probability $\mathbb{P}(E_k \mid A)$ .

Pitfalls.

Confusing $\mathbb{P}(E \mid A)$ with $\mathbb{P}(A \mid E)$ (the "confusion of the inverse").
Forgetting the prior factor $\mathbb{P}(E)$ — students sometimes conclude $\mathbb{P}(E \mid A) = \mathbb{P}(A \mid E)$ .

M-P-3 — $\mathbb{P}(X \ge k)$ for discrete: use $1 - \mathbb{P}(X \le k-1)$

Used on: May 2024 MCQ4 (Poisson tail).

Input. A discrete random variable $X$ and a threshold $k$ .

Steps.

Rewrite $\{X \ge k\} = \{X \le k-1\}^c$ .
Compute $\mathbb{P}(X \le k-1) = \sum_{i=0}^{k-1} \mathbb{P}(X = i)$ — a finite sum.
Subtract from 1: $\mathbb{P}(X \ge k) = 1 - \mathbb{P}(X \le k-1)$ .

Output. A numerical expression.

Pitfalls.

Using the strict-inequality complement $\{X > k\} = \{X \le k\}^c$ instead of $\{X \ge k\} = \{X \le k-1\}^c$ . For a continuous random variable the distinction doesn't matter (since $\mathbb{P}(X = k) = 0$ ); for a discrete one, it does.
Forgetting $0! = 1$ in Poisson, or forgetting that $\lambda^0 = 1$ .

M-P-4 — Back-solve for $\sigma / V$ using Cov/correlation relationships

Used on: May 2024 MCQ5 (the canonical "given $\rho$ and $|\operatorname{Cov}(\alpha f, \beta g)|$ , find $\sigma(f)$ ").

Input. Given values of $V(g)$ , $\rho(f, g)$ , and $|\operatorname{Cov}(\alpha f, \beta g)|$ (or similar). Find an unknown $\sigma(f)$ or $V(f)$ .

Steps.

Unwrap the covariance using bilinearity: $\operatorname{Cov}(\alpha f, \beta g) = \alpha \beta \operatorname{Cov}(f, g)$ .
Extract $|\operatorname{Cov}(f, g)|$ from the given $|\operatorname{Cov}(\alpha f, \beta g)|$ : divide by $|\alpha \beta|$ .
Use the given $\rho(f, g)$ to recover the sign: $\operatorname{Cov}(f, g)$ has the same sign as $\rho$ .
Plug into $\rho = \operatorname{Cov}/(\sigma(f) \sigma(g))$ and solve for $\sigma(f)$ (using $\sigma(g) = \sqrt{V(g)}$ ).

Output. $\sigma(f)$ .

Pitfalls.

Confusing the variance rule ( $V(\alpha f) = \alpha^2 V(f)$ ) with the covariance rule ( $\operatorname{Cov}(\alpha f, \beta g) = \alpha \beta \operatorname{Cov}(f, g)$ ). Covariance is bilinear, not quadratic.
Forgetting the sign of the correlation when taking $|\operatorname{Cov}|$ ; exam problems often give $|\cdot|$ expressly to test this.
Plugging in $V(g)$ where $\sigma(g)$ is needed (must take square root).
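The four steps above can be sketched as a single function. The name `sigma_f_from` and its parameter names are illustrative assumptions; the function requires $\rho \neq 0$, since otherwise $\sigma(f)$ cannot be recovered from the covariance.

```python
import math

def sigma_f_from(abs_cov_scaled, alpha, beta, rho, var_g):
    """Recover sigma(f) from |Cov(alpha f, beta g)|, rho(f, g) and V(g).

    Assumes rho != 0 (otherwise the final division is undefined).
    """
    # Steps 1-2: bilinearity, then divide by |alpha * beta|.
    abs_cov = abs_cov_scaled / abs(alpha * beta)
    # Step 3: Cov(f, g) carries the sign of rho.
    cov = math.copysign(abs_cov, rho)
    # Step 4: rho = Cov / (sigma_f * sigma_g), with sigma_g = sqrt(V(g)).
    sigma_g = math.sqrt(var_g)
    return cov / (rho * sigma_g)

# Data of Example 4.2: |Cov(2f, 5g)| = 90, rho = -0.1, V(g) = 100.
sigma_f = sigma_f_from(90, 2, 5, -0.1, 100)
```

With the data of Example 4.2 this gives $\sigma(f) = 9$ up to floating-point rounding.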

M-P-5 — Compute $E(X)$ , $V(X)$ from a PMF or PDF

Used on: worked examples throughout (§4.4, §4.6).

Input. The PMF $\varphi(y_i) = \mathbb{P}(X = y_i)$ or PDF $\varphi(x)$ , and the random variable $f$ (often $f = \text{identity}$ ).

Steps (discrete).

$E(f) = \sum_i y_i \varphi(y_i)$ .
$E(f^2) = \sum_i y_i^2 \varphi(y_i)$ .
$V(f) = E(f^2) - [E(f)]^2$ (computation formula — easier than the definition).
$\sigma(f) = \sqrt{V(f)}$ .

Steps (continuous).

$E(f) = \int_a^b x \varphi(x)\,dx$ (restricted to the carrier $[a,b]$ if one exists).
$E(f^2) = \int_a^b x^2 \varphi(x)\,dx$ .
$V(f) = E(f^2) - [E(f)]^2$ .

Output. $E(f)$ , $V(f)$ , $\sigma(f)$ .

Pitfalls.

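For discrete: mixing up the two equivalent sums. Either sum over the values $y_i$ with weights $\varphi(y_i)$ , or sum over the states $\omega$ with weights $\mathbb{P}(\{\omega\})$ ; never pair values with state weights. Recall $\varphi(y_i) = \mathbb{P}(f = y_i) = \sum_{\omega : f(\omega) = y_i} \mathbb{P}(\{\omega\})$ .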
For continuous: using the wrong bounds; recall $\varphi(x) = 0$ outside the carrier, so $\int_{-\infty}^{+\infty} = \int_a^b$ whenever a carrier exists.
Using $V(f) = \sum_i y_i^2 \varphi(y_i)$ (forgetting the subtraction of $[E(f)]^2$ ).
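The discrete steps can be sketched as follows, starting from the states and building the PMF first so that values and weights are paired correctly. The helper names `pmf_of` and `moments` are assumptions made for this illustration; the data are those of Example 4.4.

```python
from collections import defaultdict
from fractions import Fraction
import math

def pmf_of(f, P):
    """phi(y) = sum of P({w}) over the states w with f(w) = y."""
    phi = defaultdict(Fraction)        # Fraction() is 0
    for w, p in P.items():
        phi[f(w)] += p
    return dict(phi)

def moments(phi):
    """Return (E, V, sigma) from a PMF given as {value: probability}."""
    mean = sum(y * p for y, p in phi.items())
    second = sum(y * y * p for y, p in phi.items())
    var = second - mean ** 2           # computation formula
    return mean, var, math.sqrt(var)

P = {w: Fraction(1, 6) for w in range(1, 7)}      # fair die
f = lambda w: 10 if w % 2 == 0 else -10           # Example 4.4

phi = pmf_of(f, P)
mean, var, sigma = moments(phi)
```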

M-P-6 — Verify countable additivity for a candidate $\mathbb{P}$

Used on: May 2024 Q10 (verify Dirac is CA).

Input. A candidate probability $\mathbb{P}$ and the hypothesis "is $\mathbb{P}$ countably additive?".

Steps.

Consider an arbitrary decreasing sequence $A_n \downarrow A$ (i.e., $A_1 \supseteq A_2 \supseteq \cdots$ and $A = \bigcap A_n$ ).
Compute $\lim_{n\to\infty} \mathbb{P}(A_n)$ .
Compare to $\mathbb{P}(A)$ .
If they are equal for every such sequence, $\mathbb{P}$ is countably additive. Otherwise, not.

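Alternative (for "simple" probabilities). Invoke Theorem 2002: every simple probability is a finite convex combination of Diracs, each of which is CA (Prop 2006). A finite convex combination of CA probabilities is again CA, because limits commute with finite sums. Hence every simple probability is CA.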

Alternative (for "contradiction"). Assume $\mathbb{P}$ is CA and derive a contradiction — e.g., uniform on $\mathbb{N}$ leads to $1 = 0$ or $1 = +\infty$ (Prop 2009).

Output. "Yes, $\mathbb{P}$ is countably additive" (with proof) or "No, $\mathbb{P}$ is not countably additive" (with counterexample).

Pitfalls.

Using an increasing sequence $A_n \uparrow A$ instead of decreasing, or vice versa — the two formulations are equivalent (lect2) but the Dirac proof is cleanest on a decreasing sequence (proposition 2006).
Forgetting that countable additivity is not automatic — it must be verified for each candidate. Poisson, geometric, and all simple probabilities are CA; uniform on $\mathbb{N}$ is not.
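As a numerical illustration of the monotone-sequence test (not a proof, since only finitely many $n$ are inspected), the sketch below evaluates a Dirac probability on the decreasing sequence $A_n = \{k \in \mathbb{N} : k \ge n\}$ , whose intersection is empty. Events are represented as membership predicates; the helper names `dirac` and `tail_event` and the choice $\omega_0 = 5$ are assumptions made for the example.

```python
def dirac(omega0):
    """Return delta_{omega0}, acting on events given as membership predicates."""
    return lambda event: 1 if event(omega0) else 0

def tail_event(n):
    """A_n = {k in N : k >= n}; these sets decrease to the empty set."""
    return lambda k: k >= n

delta = dirac(5)  # omega0 = 5 is an arbitrary choice
values = [delta(tail_event(n)) for n in range(10)]

# The limit set A is empty, so delta(A) = 0
limit_value = delta(lambda k: False)
print(values, limit_value)
```

The sequence of values is 1 as long as $\omega_0 \in A_n$ and drops to 0 permanently once $n > \omega_0$ , which is the "eventually 0" behaviour used in the proof when $\omega_0 \notin A$ .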

M-P-7 — Check independence of events or random variables

Used on: Example 4.7, lect3 exercises.

Input. Two events $A, B$ (or two random variables $f, g$ ) and a probability $\mathbb{P}$ .

Steps (events).

Compute $\mathbb{P}(A)$ , $\mathbb{P}(B)$ , $\mathbb{P}(A \cap B)$ separately.
Check: $\mathbb{P}(A \cap B) \overset{?}{=} \mathbb{P}(A) \mathbb{P}(B)$ .
If equal, $A$ and $B$ are independent; otherwise, not.

Steps (random variables).

Verify $\mathbb{P}(f \le x, g \le y) = \mathbb{P}(f \le x) \mathbb{P}(g \le y)$ for every $x, y$ (usually equivalent to checking the joint PMF factors: $\mathbb{P}(f = y_i, g = z_j) = \mathbb{P}(f = y_i) \mathbb{P}(g = z_j)$ ).
As a necessary condition, check $\operatorname{Cov}(f, g) = 0$ — but this is not sufficient!

Output. Yes/no with justification.

Pitfalls.

Zero covariance does not imply independence. Counterexample: let $f$ be uniform on $\{-1, 0, 1\}$ and $g = f^2$ . Then $\operatorname{Cov}(f, g) = E(f g) - E(f) E(g) = E(f^3) - E(f) E(f^2) = 0 - 0 \cdot (2/3) = 0$ , yet $g$ is completely determined by $f$ , so they are not independent: $\mathbb{P}(f = 0, g = 0) = 1/3 \ne 1/9 = \mathbb{P}(f = 0)\, \mathbb{P}(g = 0)$ . The converse implication $\operatorname{Cov} = 0 \Rightarrow$ independence holds only in special cases, the standard one being jointly normal random variables.
Mistaking disjointness for independence. Disjoint events $A, B$ (with $\mathbb{P}(A), \mathbb{P}(B) > 0$ ) are always dependent because $\mathbb{P}(A \cap B) = 0 \ne \mathbb{P}(A) \mathbb{P}(B)$ .
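The counterexample with $f$ uniform on $\{-1, 0, 1\}$ and $g = f^2$ can be verified by brute force. The sketch below computes the covariance exactly and then tests whether the joint PMF factors; the helper names `expect` and `joint_factors` are illustrative only.

```python
from fractions import Fraction
from itertools import product

states = [-1, 0, 1]
prob = {w: Fraction(1, 3) for w in states}  # uniform probability on the states

def f(w):
    return w

def g(w):
    return w * w

def expect(h):
    """Expected value of h with respect to prob."""
    return sum(h(w) * prob[w] for w in states)

cov = expect(lambda w: f(w) * g(w)) - expect(f) * expect(g)

def joint_factors():
    """True iff P(f = y, g = z) = P(f = y) P(g = z) for every pair of values."""
    f_vals = {f(w) for w in states}
    g_vals = {g(w) for w in states}
    for y, z in product(f_vals, g_vals):
        p_joint = sum(prob[w] for w in states if f(w) == y and g(w) == z)
        p_f = sum(prob[w] for w in states if f(w) == y)
        p_g = sum(prob[w] for w in states if g(w) == z)
        if p_joint != p_f * p_g:
            return False
    return True

independent = joint_factors()
print(cov, independent)
```

The covariance is exactly zero while the factorization test fails, so the necessary condition in the steps above is met without independence holding.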

§6. Practice Problems with Solutions

Problem 6.1 — lect1 Q1: set function, measure, or probability?

An equity fund contains 50 stocks: $\Omega = \{T_1, T_2, \ldots, T_{50}\}$ . Let $\phi : \Omega \to \mathbb{R}$ send each stock to the difference between its opening and closing price on a given day. Consider $\mu : 2^\Omega \to \mathbb{R}$ with $\mu(A) = \sum_{T_i \in A} \phi(T_i)$ . Which is correct? (A) $\mu$ is a set function but not a measure; (B) $\mu$ is a measure but not a probability; (C) $\mu$ is a probability; (D) none of the preceding.

Solution. Check the axioms one by one.

Grounded: $\mu(\emptyset) = 0$ . ✓
Positivity: $\mu(A)$ can be negative (some stocks fell, so $\phi(T_i) < 0$ ). ✗
Additivity: for disjoint $A, B$ , $\mu(A \cup B) = \sum_{T_i \in A \cup B} \phi(T_i) = \sum_{T_i \in A} \phi(T_i) + \sum_{T_i \in B} \phi(T_i) = \mu(A) + \mu(B)$ . ✓

Since positivity fails, $\mu$ is not a measure (and hence not a probability). It is a set function. Answer: (A).

Remark. If $\phi$ had been defined as the absolute value of the price difference, then $\mu$ would be a measure (positivity restored). But $\mu(\Omega) = \sum_{i=1}^{50} |\phi(T_i)|$ need not equal 1, so it still wouldn't be a probability unless normalized.

Source: lect1, Q1 (TA handout).
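The axiom-by-axiom check in Problem 6.1 can be replayed on a toy version of the fund. The four price differences below are invented for illustration (the problem gives no numbers), and the helper names are hypothetical; the point is only that one negative $\phi(T_i)$ is enough to break positivity while groundedness and additivity survive.

```python
from itertools import combinations

# Hypothetical price differences for a 4-stock toy fund (not from the source)
phi = {"T1": 1.5, "T2": -0.5, "T3": 2.0, "T4": -1.0}

def mu(event):
    """mu(A) = sum of phi over the stocks in A."""
    return sum(phi[t] for t in event)

def all_events(omega):
    """Every subset of omega, i.e. the power set 2^Omega."""
    items = list(omega)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

events = all_events(phi)
grounded = mu(frozenset()) == 0
positive = all(mu(a) >= 0 for a in events)
additive = all(
    abs(mu(a | b) - (mu(a) + mu(b))) < 1e-12
    for a in events for b in events if not (a & b)  # disjoint pairs only
)
print(grounded, positive, additive)
```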

Problem 6.2 — May 2024 MCQ4 (Mode A): Poisson $\mathbb{P}(n \ge 3)$

Consider $\Omega = \mathbb{N}$ and the Poisson probability $\mathbb{P}$ with parameter $\lambda = 2$ . The probability of $E = \{n \in \mathbb{N} : n \ge 3\}$ is: (A) $1 - 5 e^{-2}$ ; (B) $e^{-2}/2$ ; (C) $1 - 3 e^{-2}$ ; (D) none.

Solution. See §4.1. Use the complement: $\mathbb{P}(E) = 1 - \mathbb{P}(n \le 2) = 1 - e^{-2}(1 + 2 + 2^2/2!) = 1 - 5 e^{-2}$ . Answer: (A).

Source: General_24524_ENG_SOL.pdf p.2.
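A quick numerical cross-check of the complement computation in Problem 6.2, using only the standard library. The function names `poisson_pmf` and `poisson_tail` are illustrative.

```python
import math

def poisson_pmf(n, lam):
    """P(X = n) = e^{-lam} * lam^n / n!"""
    return math.exp(-lam) * lam ** n / math.factorial(n)

def poisson_tail(k, lam):
    """P(X >= k) computed as 1 - P(X <= k - 1), a finite sum."""
    return 1.0 - sum(poisson_pmf(n, lam) for n in range(k))

p = poisson_tail(3, 2)
print(p, 1 - 5 * math.exp(-2))
```

Both printed numbers are approximately 0.323, in agreement with option (A).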

Problem 6.3 — May 2024 MCQ5 (Mode A): back-solve $\sigma_{\mathbb{P}}(f)$

Let $f, g$ be random variables on $(\Omega, \mathbb{P})$ with $V_{\mathbb{P}}(g) = 100$ , $\rho_{\mathbb{P}}(f, g) = -0.1$ , and $|\operatorname{Cov}_{\mathbb{P}}(2f, 5g)| = 90$ . Then $\sigma_{\mathbb{P}}(f) = ?$ (A) 9; (B) 3; (C) 10; (D) none.

Solution. See §4.2. Bilinearity: $\operatorname{Cov}(2f, 5g) = 10 \operatorname{Cov}(f, g) \Rightarrow |\operatorname{Cov}(f, g)| = 9$ . Sign from $\rho$ : $\operatorname{Cov}(f, g) = -9$ . $\sigma(g) = 10$ . Solve $-0.1 = -9/(10 \sigma(f)) \Rightarrow \sigma(f) = 9$ . Answer: (A).

Source: General_24524_ENG_SOL.pdf p.2.

Problem 6.4 — May 2024 Q10 (Mode A): Dirac probability and countable additivity ★

(a) Define the Dirac probability over the state space $\Omega$ . (b) Give the definition of countable additivity for a probability $\mathbb{P} : 2^\Omega \to [0, 1]$ . (c) Prove that any Dirac probability is countably additive.

Solution.

(a) (Marinacci example 1993, §2.5.) Fix $\omega_0 \in \Omega$ . The Dirac probability concentrated at $\omega_0$ is $\delta_{\omega_0}(E) = \begin{cases} 1 & \text{if } \omega_0 \in E \\ 0 & \text{if } \omega_0 \notin E \end{cases} \qquad \forall E \subseteq \Omega.$ One checks easily that $\delta_{\omega_0}$ is a probability: non-negative and $\in [0, 1]$ ; $\delta_{\omega_0}(\Omega) = 1$ ; additive because for disjoint $E_1, E_2$ , $\omega_0$ belongs to at most one of them.

(b) (Marinacci definition 2003, §2.7.) $\mathbb{P} : 2^\Omega \to [0, 1]$ is countably additive (or $\sigma$ -additive) if for every countable collection $\{E_n\}_{n=1}^{+\infty}$ of pairwise disjoint events ( $E_i \cap E_j = \emptyset$ for $i \neq j$ ), $\mathbb{P}\!\left(\bigcup_{n=1}^{+\infty} E_n\right) = \sum_{n=1}^{+\infty} \mathbb{P}(E_n).$

(c) (Proposition 2006, §3.8 — full proof.) We use the monotone-sequence characterisation: $\mathbb{P}$ is CA iff for every $A_n \downarrow A$ , $\lim_{n\to\infty} \mathbb{P}(A_n) = \mathbb{P}(A)$ . Take any such decreasing sequence.

Case (I): $\omega_0 \in A = \bigcap_{n=1}^{+\infty} A_n$ . Then $\omega_0 \in A_n$ for every $n$ , so $\delta_{\omega_0}(A_n) = 1$ for every $n$ , and $\delta_{\omega_0}(A) = 1$ . Hence $\lim_{n\to\infty} \delta_{\omega_0}(A_n) = 1 = \delta_{\omega_0}(A)$ .

Case (II): $\omega_0 \notin A$ . Then there exists $\bar n \in \mathbb{N}$ with $\omega_0 \notin A_{\bar n}$ . Since $A_n \subseteq A_{\bar n}$ for every $n \ge \bar n$ , we have $\omega_0 \notin A_n$ for every $n \ge \bar n$ , i.e., $\delta_{\omega_0}(A_n) = 0$ eventually. Also $\delta_{\omega_0}(A) = 0$ . Hence $\lim_{n\to\infty} \delta_{\omega_0}(A_n) = 0 = \delta_{\omega_0}(A)$ .

Since $\lim_n \delta_{\omega_0}(A_n) = \delta_{\omega_0}(A)$ in both cases, $\delta_{\omega_0}$ is countably additive. $\blacksquare$

Source: General_24524_ENG_SOL.pdf p.6 (solution: "See example 1993, definition 2003, proposition 2006 (first part of the proof)").

Problem 6.5 — Probability HW: compute $\lambda$ from a Poisson condition

$X \sim \operatorname{Poisson}(\lambda)$ with $\mathbb{P}(X < 1) = e^{-5}$ . Compute (i) $\lambda$ , (ii) $\mathbb{P}(X > 2)$ , (iii) $V(X + 5)$ , (iv) $E(X^2)$ .

Solution.

(i) $\mathbb{P}(X < 1) = \mathbb{P}(X = 0) = e^{-\lambda} \cdot \lambda^0/0! = e^{-\lambda}$ . Setting $e^{-\lambda} = e^{-5}$ gives $\lambda = 5$ .

(ii) $\mathbb{P}(X > 2) = 1 - \mathbb{P}(X \le 2) = 1 - e^{-5}(1 + 5 + 25/2) = 1 - 18.5 e^{-5}$ .

(iii) $V(X + 5) = V(X) = \lambda = 5$ (constant shift doesn't affect variance; §3.13.2 with $\alpha = 1$ ).

(iv) $V(X) = E(X^2) - [E(X)]^2$ so $E(X^2) = V(X) + [E(X)]^2 = 5 + 25 = 30$ .

Source: lect2 p.5 (Probability HW).
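The answers to Problem 6.5 can be sanity-checked by summing the Poisson series numerically. This is a sketch under one assumption: the infinite sums are truncated at $N = 100$ terms, which is harmless for $\lambda = 5$ because the neglected tail is far below floating-point precision.

```python
import math

lam = 5  # from e^{-lambda} = e^{-5}

def pmf(n):
    """Poisson PMF with parameter lam."""
    return math.exp(-lam) * lam ** n / math.factorial(n)

N = 100  # truncation point (assumption: the tail beyond N is negligible)
p_gt_2 = 1.0 - sum(pmf(n) for n in range(3))          # (ii)
mean = sum(n * pmf(n) for n in range(N))              # E(X)
second_moment = sum(n * n * pmf(n) for n in range(N)) # (iv) E(X^2)
variance = second_moment - mean ** 2                  # (iii) V(X) = V(X + 5)
print(p_gt_2, mean, second_moment, variance)
```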

Problem 6.6 — TA1_prob_41: simple vs. countable vs. neither

Classify each of the following probabilities on $\Omega$ : (a) Roll a fair die ( $\Omega = \{1, \ldots, 6\}$ , $\mathbb{P}(\{i\}) = 1/6$ ). (b) Poisson( $\lambda$ ) on $\Omega = \mathbb{N}$ . (c) Uniform on $\Omega = \mathbb{N}$ with $\mathbb{P}(\{n\}) = k$ for all $n$ . (d) $\Omega = \mathbb{N}$ , $\mathbb{P}(\{0\}) = \mathbb{P}(\{1\}) = \mathbb{P}(\{2\}) = 1/3$ , $\mathbb{P}(\{n\}) = 0$ for $n \ge 3$ .

Solution.

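(a) Simple (finite $\Omega$ , so finite support); countably additive (on a finite $\Omega$ any family of pairwise disjoint events has only finitely many non-empty members, so countable additivity reduces to finite additivity).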

(b) Not simple (every $\mathbb{P}(\{n\}) = e^{-\lambda}\lambda^n/n! > 0$ , so the support is all of $\mathbb{N}$ , which is infinite). Countably additive — in fact this is the key example of a CA but non-simple probability (lect2 p.3).

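(c) Not countably additive (Proposition 2009, §3.9). Also not simple: finite additivity forces $k = 0$ (otherwise finitely many singletons would already have total probability above 1), so every finite event has probability 0 and no finite event can carry probability 1.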

(d) Simple: the finite event $E = \{0, 1, 2\}$ has $\mathbb{P}(E) = 1$ . Countably additive (all simple probabilities are; see remark after §3.8).

Source: lect2 p.1, TA1_prob_41 p.2.

Problem 6.7 — lect7: covariance from a joint experiment

Two dice are rolled independently. Let $f$ = result of first die, $g$ = result of second. Compute $E(f + g)$ , $V(f + g)$ , $\operatorname{Cov}(f, g)$ , $\rho(f, g)$ .

Solution. By independence, $\operatorname{Cov}(f, g) = 0$ and $\rho(f, g) = 0$ . $E(f) = E(g) = 7/2$ ; $E(f + g) = E(f) + E(g) = 7$ . $V(f) = V(g) = 35/12$ ; $V(f + g) = V(f) + V(g) + 2\operatorname{Cov}(f, g) = 35/6 + 0 = 35/6$ .
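The values in Problem 6.7 can be confirmed by enumerating the 36 equally likely outcomes with exact fractions. The helper name `expect` is illustrative.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # (first die, second die)
p = Fraction(1, 36)                              # each pair is equally likely

def expect(h):
    """Expected value of h(first, second) over the 36 outcomes."""
    return sum(h(a, b) * p for a, b in outcomes)

e_f = expect(lambda a, b: a)
e_g = expect(lambda a, b: b)
e_sum = expect(lambda a, b: a + b)
v_f = expect(lambda a, b: a * a) - e_f ** 2
v_sum = expect(lambda a, b: (a + b) ** 2) - e_sum ** 2
cov = expect(lambda a, b: a * b) - e_f * e_g
print(e_sum, v_f, v_sum, cov)
```

Here the covariance is computed from the joint distribution rather than assumed, so the zero is a consequence of the product structure of the 36 outcomes.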

Problem 6.8 — lect6: expected value of affine function

Given $E(f) = 5$ and $E(g) = 6$ , compute (i) $E(2f + 3g)$ ; (ii) $E(2f + 7)$ ; (iii) $E(-3f + g - 4)$ .

Solution. (i) $E(2f + 3g) = 2 E(f) + 3 E(g) = 10 + 18 = 28$ . (ii) $E(2f + 7) = 2 E(f) + 7 = 17$ . (iii) $E(-3f + g - 4) = -3 E(f) + E(g) - 4 = -15 + 6 - 4 = -13$ .

Source: lect6 p.9.

§7. Common Pitfalls

Dirac verification — must check both cases. When proving $\delta_{\omega_0}$ satisfies a property (additivity, CA, …), always split on " $\omega_0 \in \cdot$ / $\omega_0 \notin \cdot$ ". Forgetting case (II) — i.e., assuming $\omega_0$ is always in the union — leads to an incomplete proof (and loses points on May 2024 Q10c).
Poisson arithmetic — $\lambda^n / n!$ , not $n\lambda$ or $\lambda^n / n$ . $\mathbb{P}(X = n) = e^{-\lambda} \cdot \lambda^n / n!$ . Many students confuse $\lambda^n$ and $n^\lambda$ , or drop the $1/n!$ factor, or replace $n!$ with $n$ . Double-check: $\mathbb{P}(X = 0) = e^{-\lambda}$ (not 0), $\mathbb{P}(X = 1) = \lambda e^{-\lambda}$ , $\mathbb{P}(X = 2) = \lambda^2 e^{-\lambda}/2$ .
Sign errors in correlation. When given $|\operatorname{Cov}|$ , use $\rho$ to recover the sign. A negative $\rho$ forces a negative $\operatorname{Cov}$ . Never write $\sigma(f) = \pm 9$ — standard deviation is always non-negative; the ± belongs to the covariance.
$\operatorname{Cov} = 0$ does not imply independence. Independent $\Rightarrow \operatorname{Cov} = 0$ , but the converse fails in general; it holds only in special cases such as jointly normal random variables. Counterexample: $f$ uniform on $\{-1, 0, 1\}$ , $g = f^2$ . $\operatorname{Cov}(f, g) = 0$ but $g$ is determined by $f$ .
$V(\alpha f) = \alpha^2 V(f)$ , not $\alpha V(f)$ . Square the coefficient. Contrast with $\operatorname{Cov}(\alpha f, \beta g) = \alpha \beta \operatorname{Cov}(f, g)$ (bilinearity). Variance scales quadratically; covariance scales bilinearly.
Additive constants don't affect variance or covariance. $V(f + \beta) = V(f)$ ; $\operatorname{Cov}(f + \beta, g + \delta) = \operatorname{Cov}(f, g)$ . Only the coefficient matters. (They do affect the expected value.)
Confusing PMF $\varphi(y)$ and CDF $\Phi(x)$ . PMF = "probability at a single value" ( $\varphi(y) = \mathbb{P}(f = y)$ ); CDF = "probability up to and including $x$ " ( $\Phi(x) = \mathbb{P}(f \le x)$ ). They satisfy $\Phi(x) = \sum_{y \le x} \varphi(y)$ (discrete) or $\Phi(x) = \int_{-\infty}^x \varphi(t)\,dt$ (continuous).
PDF $\ne$ $\mathbb{P}(f = x)$ in the continuous case. For a continuous random variable, $\mathbb{P}(f = x) = 0$ for every individual $x$ ; the density $\varphi(x)$ measures probability per unit length, not probability itself. Only the integral over an interval has probabilistic meaning.
Simple probability $\not\Leftrightarrow$ countably additive. Every simple probability is CA, but not conversely: Poisson is CA but not simple; rolling a fair die is both simple and CA; uniform on $\mathbb{N}$ is neither. Memorize the 2×2 grid, noting that the "simple but not CA" cell is empty.
Forgetting the complement trick for tails. $\mathbb{P}(X \ge k)$ has an infinite tail (for Poisson, etc.); use $1 - \mathbb{P}(X \le k - 1)$ to get a finite sum. For a continuous random variable, $\ge$ and $>$ give the same probability.
Monotonicity $\le$ not $<$ . $A \subseteq B$ gives $\mathbb{P}(A) \le \mathbb{P}(B)$ ; the inequality is weak. Example (lect1): if $M$ on the die outcomes has support $\{1, 2\}$ with $M(\{1\}) = M(\{2\}) = 0.5$ , so that $M(\{3\}) = 0$ , then $M(\{1, 2\}) = 1 = M(\{1, 2, 3\})$ even though $\{1, 2\} \subsetneq \{1, 2, 3\}$ . Do not assume strict inequality.
Bayes' theorem: numerator must use the likelihood, not the posterior. $\mathbb{P}(E \mid A) = \mathbb{P}(A \mid E) \mathbb{P}(E) / \mathbb{P}(A)$ . Students sometimes write $\mathbb{P}(E \mid A) = \mathbb{P}(E \mid A) \mathbb{P}(E) / \mathbb{P}(A)$ , a circular error.
Distinguishing random variables from events. Events are subsets of $\Omega$ ( $A \subseteq \Omega$ ); random variables are functions $f : \Omega \to \mathbb{R}$ . $\mathbb{P}(A)$ is a number; $E(f)$ is a number; $f$ itself is a function.
Carrier $\ne$ support. "Carrier" is the interval $[a, b]$ outside which $\Phi(x) \in \{0, 1\}$ (used for essentially bounded random variables); "support" is the set of outcomes $\omega \in \Omega$ with $\mathbb{P}(\{\omega\}) > 0$ (used for simple probabilities). Don't confuse them.
Chebyshev gives an upper bound, not an exact probability. $\mathbb{P}(|X - \mu| \ge k\sigma) \le 1/k^2$ is a bound — the actual probability may be much smaller. Do not use Chebyshev as an equality.
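To see how loose Chebyshev can be, the sketch below compares the bound with the exact probability for a fair die. The choice $k = 7/5$ is arbitrary, and the comparison is done on squares so that no square root is needed.

```python
from fractions import Fraction

faces = range(1, 7)
p = Fraction(1, 6)                               # fair die
mu = sum(x * p for x in faces)                   # E(X) = 7/2
var = sum(x * x * p for x in faces) - mu ** 2    # V(X) = 35/12

k = Fraction(7, 5)  # arbitrary illustrative choice
# |X - mu| >= k * sigma  is equivalent to  (X - mu)^2 >= k^2 * V(X)
exact = sum(p for x in faces if (x - mu) ** 2 >= k ** 2 * var)
bound = 1 / k ** 2
print(exact, bound)
```

Only the faces 1 and 6 lie that far from the mean, so the exact probability is $1/3$ , while the Chebyshev bound is $25/49 \approx 0.51$ .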

Cross-references to the rest of the study guide

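Linear algebra (§01): the covariance matrix $\Sigma_{f,g} = \begin{pmatrix} V(f) & \operatorname{Cov}(f,g) \\ \operatorname{Cov}(f,g) & V(g) \end{pmatrix}$ is a symmetric $2 \times 2$ matrix. It is positive semi-definite, and its determinant condition $V(f) V(g) - \operatorname{Cov}(f,g)^2 \ge 0$ is exactly the Cauchy–Schwarz bound. The sign of $\rho$ is the sign of the off-diagonal entry $\operatorname{Cov}(f,g)$ ; the eigenvalues do not determine it.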
Integral calculus (§03): the expected value of a continuous random variable is a Stieltjes integral (§3.20); expected values of exponential and normal distributions use integration by parts (§3.24).
Differential calculus (§02): Markov's and Chebyshev's inequalities are bounds that require no differentiability assumptions; the Gaussian PDF uses the exponential function studied in §02.
Mathematical finance (§05): the portfolio variance $V(\mathbf{w}^T \mathbf{r}) = \mathbf{w}^T \Sigma \mathbf{w}$ is a direct application of §3.15 (variance of a sum) with weights.

End of §04 — Probability.
