
Probability


04 — Probability

Comprehensive study guide for Bocconi Math Module 2 (cod. 30063), General Exam. Source materials: Alice Sicconi's Probability Proofs.pdf, Probability.pdf, lect{1..8}_prob (2).pdf, TA1_prob_41.pdf, Probability HW.pdf, General_24524_ENG_SOL.pdf.


§1. Overview & Exam Relevance

Probability is the first block of the second partial. It typically contributes 1–2 MCQs worth 5 pts each plus one open-ended question worth up to 20 pts, i.e., about 25–30 of the exam's 150 pts (roughly 17–20%). In May 2024 it supplied two MCQs (MCQ4, MCQ5) and the flagship 20-pt statement-plus-proof question (Q10), so the marginal value of mastering the definitions cleanly is very high.

Topic scope. The exam tests:

  • the measure-theoretic hierarchy set function → measure → probability: grounded, positive, additive, normalized;
  • Kolmogorov axioms for a probability $\mathbb{P} : 2^\Omega \to [0,1]$ and their consequences (complement, monotonicity, inclusion–exclusion, union bound);
  • Dirac probability $\delta_{\omega_0}$, the canonical "point mass" example (May 2024 Q10a);
  • Countable additivity (σ-additivity) as a property that may or may not hold, and the proof that every Dirac probability is countably additive (May 2024 Q10c);
  • Simple probabilities, support, convex-combination representation, and the equivalence between simple probabilities and convex linear combinations of Dirac probabilities;
  • Conditional probability $\mathbb{P}(A \mid B)$, Bayes' theorem, law of total probability, independence of events and random variables;
  • Random variables $f : \Omega \to \mathbb{R}$ (Sicconi uses the letter $f$ for a random variable; Marinacci uses $X$), expected value $E_{\mathbb{P}}(f)$, variance $V_{\mathbb{P}}(f)$, standard deviation $\sigma_{\mathbb{P}}(f)$, covariance $\operatorname{Cov}_{\mathbb{P}}(f,g)$, linear correlation coefficient $\rho_{\mathbb{P}}(f,g)$;
  • Linearity of expectation, variance and covariance properties (computation formula, affine-function rules, bilinearity), Cauchy–Schwarz bound $|\operatorname{Cov}(f,g)| \le \sigma(f)\sigma(g)$;
  • Named discrete distributions: Bernoulli, Binomial, Geometric, Poisson, uniform discrete; PMFs with mean and variance;
  • Named continuous distributions: uniform on $[a,b]$, (negative) exponential with rate $\alpha$, Gaussian/standard normal; density functions, distribution functions, expected values;
  • Distribution function $\Phi(x) = \mathbb{P}(f \le x)$, density $\varphi(x)$, carrier $[a,b]$, essentially bounded random variables;
  • Markov's and Chebyshev's inequalities (absolute-value tail bounds).

Typical MCQ patterns (May 2024 General Exam).

  • MCQ4 (Mode A) / MCQ3 (Mode B): "Consider $\Omega = \mathbb{N}$ and the Poisson probability $\mathbb{P}$ with parameter $\lambda = 2$. The probability of the event $E = \{n \in \mathbb{N} : n \ge 3\}$ is …". Technique: use the complement, $\mathbb{P}(n \ge 3) = 1 - \mathbb{P}(n \le 2) = 1 - e^{-2}(2^0/0! + 2^1/1! + 2^2/2!) = 1 - e^{-2}(1 + 2 + 2) = 1 - 5 e^{-2} \approx 0.3233$.
  • MCQ5 (Mode A) / MCQ4 (Mode B): "Let $f, g$ be two random variables on $(\Omega, \mathbb{P})$ with $V_{\mathbb{P}}(g) = 100$, $\rho_{\mathbb{P}}(f,g) = -0.1$ and $|\operatorname{Cov}_{\mathbb{P}}(2f, 5g)| = 90$. Then $\sigma_{\mathbb{P}}(f) = ?$". Technique: bilinearity gives $\operatorname{Cov}(2f, 5g) = 10\operatorname{Cov}(f,g)$, so $|\operatorname{Cov}(f,g)| = 9$. The sign matches that of $\rho = -0.1 < 0$, so $\operatorname{Cov}(f,g) = -9$. From $\rho = \operatorname{Cov}/(\sigma(f)\sigma(g))$ with $\sigma(g) = \sqrt{100} = 10$: $-0.1 = -9/(10\,\sigma(f))$, so $\sigma(f) = 9$.
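Both MCQ computations can be double-checked numerically. This is a minimal Python sketch. `poisson_pmf` is a helper written here for illustration, not a library function, and all variable names are our own.

```python
import math

def poisson_pmf(n: int, lam: float) -> float:
    """Poisson point mass P({n}) = e^{-lam} * lam^n / n! on N = {0, 1, 2, ...}."""
    return math.exp(-lam) * lam**n / math.factorial(n)

# MCQ4: P(n >= 3) = 1 - P(n <= 2) for lambda = 2.
lam = 2.0
p_tail = 1.0 - sum(poisson_pmf(n, lam) for n in range(3))
print(round(p_tail, 4))  # 0.3233, i.e. 1 - 5e^{-2}

# MCQ5: Cov(2f, 5g) = 10 Cov(f, g); the sign of Cov(f, g) is the sign of rho.
abs_cov_2f_5g = 90.0
rho = -0.1
sigma_g = math.sqrt(100.0)                             # V(g) = 100
cov_fg = math.copysign(abs_cov_2f_5g / (2 * 5), rho)   # -9
sigma_f = cov_fg / (rho * sigma_g)                     # from rho = Cov / (sigma_f * sigma_g)
print(cov_fg, sigma_f)  # -9.0 9.0
```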

Typical open-ended pattern (May 2024 Q10 Mode A / Q9 Mode B).

  • Part (a): Define the Dirac probability $\delta_{\omega_0}$ over a state space $\Omega$.
  • Part (b): Give the definition of countable additivity for a probability $\mathbb{P} : 2^\Omega \to [0,1]$.
  • Part (c): Prove that any Dirac probability is countably additive. The official solution points to "example 1993, definition 2003, proposition 2006 (first part of the proof)". This is a pure memorization-plus-execution question, and the candidate must reproduce the two-case proof exactly.

Why this topic is high-leverage.

  • Expected-return and variance calculations in finance are direct applications of the $E$/$V$/$\operatorname{Cov}$ machinery proved here.
  • The measure-theoretic framework (grounded/positive/additive) mirrors the Riemann-integral framework of §03: what "mass" is to probability, "area" is to the integral.
  • Q10 is one of the highest-yield 20-point questions: if you memorize proposition 2006 and know how to write definition 2003 cleanly, you can bank ≈20 points in under 12 minutes.

§2. Definitions

2.1 Sample space, state, event

The sample space (or state space) $\Omega$ is the set of all possible outcomes of an experiment.

Elements $\omega \in \Omega$ are states (of the world), i.e., individual outcomes. Subsets $A \subseteq \Omega$ are called events. The pair $(\Omega, \mathbb{P})$ is called a probability space.

Three types of sample space (lect1):

  • Discrete finite: $|\Omega| = n$. Example: rolling a die, $\Omega = \{1, 2, 3, 4, 5, 6\}$.
  • Discrete countable: $\Omega$ is infinite but countable. Example: $\Omega = \mathbb{N} = \{0, 1, 2, \ldots\}$.
  • Continuous: $\Omega$ is uncountable. Example: $\Omega = [0, 2] \subseteq \mathbb{R}$.

2.2 Power set $2^\Omega$

The power set $\mathcal{P}(\Omega) = 2^\Omega$ is the collection of all subsets of $\Omega$. For a finite sample space with $|\Omega| = n$, we have $|2^\Omega| = 2^n$.

In this course every probability is defined on $2^\Omega$: it assigns a real number to each event. The notation $2^\Omega$ (rather than $\Omega$) emphasises that the domain of $\mathbb{P}$ is the set of events, not the set of outcomes.

Example (lect1 p.1). $\Omega = \{1, 2, 3\}$ and $2^\Omega = \{\emptyset, \{1\}, \{2\}, \{3\}, \{1,2\}, \{2,3\}, \{1,3\}, \Omega\}$. There are $2^3 = 8$ events; $\mathbb{P}(\emptyset) = 0$ and $\mathbb{P}(\Omega) = 1$ are hard-coded by the Kolmogorov axioms.
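The events of a small finite sample space can be enumerated mechanically. This sketch represents each event as a Python `frozenset` and uses `itertools`. The function name `power_set` is our own.

```python
from itertools import chain, combinations

def power_set(omega):
    """Return all subsets of the finite set omega (the events), as frozensets."""
    items = sorted(omega)
    subsets = chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))
    return [frozenset(s) for s in subsets]

events = power_set({1, 2, 3})
print(len(events))                 # 8 = 2^3
print([sorted(e) for e in events])
# [[], [1], [2], [3], [1, 2], [1, 3], [2, 3], [1, 2, 3]]
```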

2.3 Set function, measure, probability (the hierarchy)

(Source: lect1_prob (2).pdf pp.1–3.)

Set function $M : 2^\Omega \to \mathbb{R}$: the least restrictive object; it simply assigns a real number to every event.

Measure $M : 2^\Omega \to [0, +\infty)$: a set function satisfying the three measure axioms:

  1. Grounded: $M(\emptyset) = 0$.
  2. Positive (non-negative): $M(A) \ge 0$ for every $A \subseteq \Omega$.
  3. (Finitely) additive: for every $A, B \subseteq \Omega$ with $A \cap B = \emptyset$, $M(A \cup B) = M(A) + M(B)$.

Probability $\mathbb{P} : 2^\Omega \to [0, 1]$: a measure that additionally satisfies the normalisation axiom:

  4. Normalized: $\mathbb{P}(\Omega) = 1$.

The four properties 1–4 are the Kolmogorov axioms. Every probability is a measure, and every measure is a set function. The reverse inclusions fail in general (cf. Example 2.14 below).

2.4 Probability — Kolmogorov axioms

A function $\mathbb{P} : 2^\Omega \to [0, 1]$ is a probability (or probability measure) on $(\Omega, 2^\Omega)$ if:

(K1) Non-negativity: $\mathbb{P}(A) \ge 0$ for every event $A \subseteq \Omega$.
(K2) Normalization: $\mathbb{P}(\Omega) = 1$.
(K3) (Finite) additivity: if $A, B \subseteq \Omega$ and $A \cap B = \emptyset$, then $\mathbb{P}(A \cup B) = \mathbb{P}(A) + \mathbb{P}(B)$.

The grounded property $\mathbb{P}(\emptyset) = 0$ need not be assumed separately because it follows from K3. Take $A = B = \emptyset$: then $\mathbb{P}(\emptyset) = 2\mathbb{P}(\emptyset)$, so $\mathbb{P}(\emptyset) = 0$.

2.5 Dirac probability $\delta_{\omega_0}$

Fix $\omega_0 \in \Omega$. The Dirac probability concentrated at $\omega_0$ is the function $\delta_{\omega_0} : 2^\Omega \to [0, 1]$ defined by
$$\boxed{\;\delta_{\omega_0}(E) \;=\; \begin{cases} 1 & \text{if } \omega_0 \in E \\ 0 & \text{if } \omega_0 \notin E \end{cases} \qquad \forall\, E \subseteq \Omega.\;}$$

This is Marinacci's example 1993. Intuition: $\delta_{\omega_0}$ describes a "sure outcome". It is as if we already know that the result of the experiment is $\omega_0$, so events containing $\omega_0$ are certain and all others are impossible.

Example (lect1 p.3). Roll a die ($\Omega = \{1, 2, 3, 4, 5, 6\}$), but suppose you are somehow told that the outcome is $4$. Consider $A = \{2, 4, 6\}$ (even outcomes). Then $\delta_4(A) = 1$ because $4 \in A$. On singletons, $\delta_4(\{4\}) = 1$, while $\delta_4(\{1\}) = \delta_4(\{2\}) = \delta_4(\{3\}) = \delta_4(\{5\}) = \delta_4(\{6\}) = 0$. Each singleton has probability 0 or 1, and only the singleton containing $4$ gets probability 1.
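The die example can be reproduced in a few lines. This sketch models an event as a Python set and $\delta_{\omega_0}$ as a closure. The name `dirac` is ours.

```python
def dirac(omega0):
    """Return the Dirac probability delta_{omega0} as a function on events (sets)."""
    def delta(event) -> int:
        return 1 if omega0 in event else 0
    return delta

delta4 = dirac(4)
print(delta4({2, 4, 6}))                        # 1: the even outcomes contain 4
print([delta4({k}) for k in range(1, 7)])       # [0, 0, 0, 1, 0, 0]
print(delta4(set()), delta4({1, 2, 3, 4, 5, 6}))  # 0 1
```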

2.6 Simple probability

A probability $\mathbb{P} : 2^\Omega \to [0, 1]$ is called simple if there exists a finite event $E \subseteq \Omega$ with $\mathbb{P}(E) = 1$. Intuitively, a simple probability has only finitely many "realistically possible" outcomes, even if $\Omega$ itself is infinite.

The support of a simple probability is the set of outcomes with strictly positive probability:
$$\operatorname{supp} \mathbb{P} \;=\; \{\omega \in \Omega : \mathbb{P}(\{\omega\}) > 0\}.$$

Every simple probability can be written (Theorem 2002, stated below) as a convex linear combination of Dirac probabilities:
$$\mathbb{P}(A) = \sum_{\omega \in \operatorname{supp} \mathbb{P}} \mathbb{P}(\{\omega\}) \cdot \delta_{\omega}(A) \qquad \forall\, A \subseteq \Omega.$$

2.7 Countable additivity (σ-additivity)

(Marinacci definition 2003.) A probability $\mathbb{P} : 2^\Omega \to [0, 1]$ is countably additive (also called σ-additive) if, for every countable collection $\{E_n\}_{n=1}^{+\infty}$ of pairwise disjoint events (i.e., $E_i \cap E_j = \emptyset$ for all $i \neq j$),
$$\boxed{\;\mathbb{P}\!\left(\bigcup_{n=1}^{+\infty} E_n\right) \;=\; \sum_{n=1}^{+\infty} \mathbb{P}(E_n).\;}$$

This is the countable version of the finite-additivity axiom (K3). It is not implied by K1–K3 and must be checked case by case. (Counterexample: on $\Omega = \mathbb{N}$ one can have a finitely additive "uniform" probability that gives every singleton the same mass. Proposition 2009 below shows that no such probability is countably additive.)

Equivalent monotone-sequence formulation (lect2): $\mathbb{P}$ is countably additive $\iff$ for every increasing sequence $A_n \uparrow A$ (i.e., $A_1 \subseteq A_2 \subseteq \cdots$ and $A = \bigcup_n A_n$), $\mathbb{P}(A) = \lim_{n\to\infty} \mathbb{P}(A_n)$. Equivalently, for every decreasing sequence $A_n \downarrow A$ (i.e., $A_1 \supseteq A_2 \supseteq \cdots$ and $A = \bigcap_n A_n$), $\mathbb{P}(A) = \lim_{n\to\infty} \mathbb{P}(A_n)$.

2.8 Conditional probability

Let $\mathbb{P}$ be a probability on $\Omega$ and let $B \subseteq \Omega$ be an event with $\mathbb{P}(B) > 0$. The conditional probability of $A$ given $B$ is
$$\mathbb{P}(A \mid B) \;=\; \frac{\mathbb{P}(A \cap B)}{\mathbb{P}(B)}.$$
For fixed $B$, the function $A \mapsto \mathbb{P}(A \mid B)$ is itself a probability on $\Omega$, and it gives probability 1 to $B$. Intuition: once you know $B$ has happened, rescale probabilities so that $B$ becomes the new "certain" event.
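The rescaling can be made concrete on a die. This sketch assumes a fair die, so each face has mass 1/6. It uses exact `fractions.Fraction` arithmetic, and the helper names `prob` and `cond_prob` are ours.

```python
from fractions import Fraction

# Assumption: fair die, stored as {state: P({state})}.
P = {w: Fraction(1, 6) for w in range(1, 7)}

def prob(event):
    """P(A) as the sum of point masses (finite additivity)."""
    return sum((P[w] for w in event if w in P), Fraction(0))

def cond_prob(a, b):
    """P(A | B) = P(A ∩ B) / P(B); only defined when P(B) > 0."""
    pb = prob(b)
    if pb == 0:
        raise ValueError("P(B) must be strictly positive")
    return prob(set(a) & set(b)) / pb

even, high = {2, 4, 6}, {4, 5, 6}
print(cond_prob(even, high))  # 2/3: two of the three high faces are even
print(cond_prob(high, high))  # 1: B becomes the "certain" event
```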

2.9 Independence

Independent events. Two events $A, B \subseteq \Omega$ are independent (under $\mathbb{P}$) if
$$\mathbb{P}(A \cap B) \;=\; \mathbb{P}(A) \cdot \mathbb{P}(B).$$
Equivalently (when $\mathbb{P}(B) > 0$), $\mathbb{P}(A \mid B) = \mathbb{P}(A)$: knowing that $B$ occurred does not change the probability of $A$.
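The product rule can be checked directly on a fair die, which is an assumption made for illustration. The sketch below shows one independent pair and one dependent pair. `independent` is a hypothetical helper, not a library function.

```python
from fractions import Fraction

P = {w: Fraction(1, 6) for w in range(1, 7)}  # assumption: fair die

def prob(event):
    return sum((P[w] for w in event), Fraction(0))

def independent(a, b) -> bool:
    """Check the defining identity P(A ∩ B) = P(A) * P(B)."""
    return prob(a & b) == prob(a) * prob(b)

even = {2, 4, 6}
print(independent(even, {1, 2}))     # True:  1/6 == 1/2 * 1/3
print(independent(even, {4, 5, 6}))  # False: 1/3 != 1/2 * 1/2
```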

Independent random variables. Random variables $f, g : \Omega \to \mathbb{R}$ are independent if the events $\{f \le x\}$ and $\{g \le y\}$ are independent for every $x, y \in \mathbb{R}$:
$$\mathbb{P}(f \le x, g \le y) \;=\; \mathbb{P}(f \le x) \cdot \mathbb{P}(g \le y).$$
Independence of $f$ and $g$ implies $E[f g] = E[f] \cdot E[g]$, hence $\operatorname{Cov}(f, g) = 0$. Caveat: the converse is false in general (see §7, "zero correlation does not imply independence").

2.10 Random variable

A random variable is any function $f : \Omega \to \mathbb{R}$. (Sicconi's convention uses $f$; Marinacci sometimes uses $X$. This guide mixes the two: lowercase $f, g, h$ for random variables as in the lectures, and uppercase $X, Y, Z$ when following the textbook verbatim. They are interchangeable.)

Intuition (lect6 p.1). A random variable is a bet that assigns a real-valued payoff to each outcome of the experiment. Example: roll a die, winning 10 euros on an even outcome and losing 10 euros on an odd one:
$$f(\omega) = \begin{cases} 10 & \omega \in \{2, 4, 6\} \\ -10 & \omega \in \{1, 3, 5\} \end{cases}$$
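The bet is simply a function from states to payoffs. This sketch assumes a fair die, which the example does not specify. It recovers the event $\{f = 10\}$ and the expected payoff, using the discrete formula of §2.12.

```python
from fractions import Fraction

# Assumption: a fair die, so every face has probability 1/6.
P = {w: Fraction(1, 6) for w in range(1, 7)}

def f(w: int) -> int:
    """The bet: win 10 euros on an even face, lose 10 euros on an odd face."""
    return 10 if w % 2 == 0 else -10

win_event = {w for w in P if f(w) == 10}         # the event {f = 10}
expected = sum(f(w) * p for w, p in P.items())  # E_P(f) = sum of f(w) P({w})
print(win_event, expected)  # {2, 4, 6} 0
```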

2.11 Probability mass function (PMF), probability density function (PDF), cumulative distribution function (CDF)

Let $f$ be a random variable on $(\Omega, \mathbb{P})$.

CDF (distribution function) $\Phi : \mathbb{R} \to [0, 1]$:
$$\Phi(x) \;=\; \mathbb{P}(\{\omega \in \Omega : f(\omega) \le x\}) \;=\; \mathbb{P}(f \le x) \qquad \forall\, x \in \mathbb{R}.$$

Simple (discrete) density function $\varphi : \mathbb{R} \to [0, 1]$, when $f$ takes finitely many distinct values $y_1, \ldots, y_n$:
$$\varphi(y_i) \;=\; \mathbb{P}(f = y_i), \qquad \varphi(x) = 0 \text{ otherwise.}$$
This is Marinacci's simple/finite density function (lect6 p.7); in standard terminology it is the probability mass function (PMF). One always has $\sum_{i=1}^n \varphi(y_i) = 1$ and $\Phi(x) = \sum_{y_i \le x} \varphi(y_i)$.
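The passage from a random variable to its simple density and CDF can be sketched as follows. The sketch assumes a fair die and reuses the bet of §2.10. `density` and `cdf` are illustrative helper names.

```python
from collections import defaultdict
from fractions import Fraction

P = {w: Fraction(1, 6) for w in range(1, 7)}   # assumption: fair die

def f(w: int) -> int:
    return 10 if w % 2 == 0 else -10           # the bet of §2.10

def density(f, P):
    """Simple density phi(y) = P(f = y), over the distinct values y of f."""
    phi = defaultdict(Fraction)
    for w, p in P.items():
        phi[f(w)] += p
    return dict(phi)

def cdf(phi, x):
    """Phi(x) = sum of phi(y_i) over the values y_i <= x."""
    return sum((p for y, p in phi.items() if y <= x), Fraction(0))

phi = density(f, P)
print(phi)                                       # {-10: Fraction(1, 2), 10: Fraction(1, 2)}
print(cdf(phi, -20), cdf(phi, 0), cdf(phi, 10))  # 0 1/2 1
```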

Integrable (continuous) density function $\varphi : \mathbb{R} \to [0, +\infty)$:
$$\Phi(x) \;=\; \int_{-\infty}^{x} \varphi(t)\,dt, \qquad \int_{-\infty}^{+\infty} \varphi(t)\,dt = 1.$$
This is the PDF. If the density $\varphi$ is continuous, then $\Phi'(x) = \varphi(x)$ (by the Barrow–Torricelli theorem; this is Marinacci's prop 2042).

Carrier $[a, b]$ of $\Phi$: an interval such that $\Phi(x) = 0$ for all $x \le a$ and $\Phi(x) = 1$ for all $x \ge b$. When $\Phi$ has a density $\varphi$, this means $\varphi$ can be taken to vanish outside $[a, b]$. A random variable that admits a carrier is called essentially bounded.

2.12 Expected value $E_{\mathbb{P}}(f)$

Discrete (simple) case (Sicconi lect6 p.8):
$$E_{\mathbb{P}}(f) \;=\; \sum_{\omega \in \operatorname{supp} \mathbb{P}} f(\omega) \cdot \mathbb{P}(\{\omega\}) \;=\; \sum_{i=1}^n y_i \cdot \varphi(y_i),$$
where the last sum runs over the distinct values $y_i \in \operatorname{Im} f$.

Stieltjes-integral formulation (Marinacci prop 2043): if $[a,b]$ is a carrier for $\Phi$,
$$E_{\mathbb{P}}(f) \;=\; \int_a^b x\, d\Phi(x).$$
When $\Phi$ has a continuous density $\varphi$, this simplifies to $E_{\mathbb{P}}(f) = \int_a^b x\, \varphi(x)\, dx$ (via $d\Phi = \Phi'(x)\,dx = \varphi(x)\,dx$).
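One way to see that the Stieltjes formula covers both cases is to approximate $\int_a^b x\,d\Phi(x)$ by a Riemann–Stieltjes sum. The sketch below tags each piece of a uniform partition at its right endpoint. It assumes a uniform distribution on $[0,2]$ and a fair die as test cases. The die uses the carrier $[0, 6]$ rather than $[1, 6]$ so that $\Phi(a) = 0$ and the jump at 1 is not lost. `stieltjes_mean` is our own name.

```python
import math

def stieltjes_mean(Phi, a, b, n=100_000):
    """Riemann-Stieltjes sum for E(f) = ∫_a^b x dPhi(x).

    [a, b] is split into n equal pieces; each piece is tagged at its right
    endpoint x_i, giving sum_i x_i * (Phi(x_i) - Phi(x_{i-1})).
    """
    total, prev = 0.0, Phi(a)
    for i in range(1, n + 1):
        x = a + (b - a) * i / n
        cur = Phi(x)
        total += x * (cur - prev)
        prev = cur
    return total

def uniform_cdf(x):
    """CDF of the uniform distribution with carrier [0, 2]."""
    return min(max(x / 2.0, 0.0), 1.0)

def die_cdf(x):
    """CDF of a fair die, floor(x)/6 clipped to [0, 1]; [0, 6] is a carrier."""
    return min(max(math.floor(x), 0), 6) / 6.0

print(round(stieltjes_mean(uniform_cdf, 0.0, 2.0), 3))     # 1.0
print(round(stieltjes_mean(die_cdf, 0.0, 6.0, n=600), 6))  # 3.5
```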

2.13 Variance, standard deviation, covariance, correlation

Let $f, g$ be random variables on $(\Omega, \mathbb{P})$ with finite expected values.

Variance (the sum form holds for simple $\mathbb{P}$):
$$V_{\mathbb{P}}(f) \;=\; E_{\mathbb{P}}\!\left[(f - E_{\mathbb{P}}(f))^2\right] \;=\; \sum_{\omega \in \operatorname{supp} \mathbb{P}} \bigl(f(\omega) - E_{\mathbb{P}}(f)\bigr)^2 \cdot \mathbb{P}(\{\omega\}) \;\ge\; 0.$$
Computation formula (Theorem 2024): $V_{\mathbb{P}}(f) = E_{\mathbb{P}}(f^2) - [E_{\mathbb{P}}(f)]^2$.

Standard deviation: $\sigma_{\mathbb{P}}(f) = \sqrt{V_{\mathbb{P}}(f)} \ge 0$.

Covariance:
$$\operatorname{Cov}_{\mathbb{P}}(f, g) \;=\; E_{\mathbb{P}}\!\left[(f - E_{\mathbb{P}}(f))(g - E_{\mathbb{P}}(g))\right].$$
Computation formula (Theorem 2026): $\operatorname{Cov}_{\mathbb{P}}(f, g) = E_{\mathbb{P}}(f g) - E_{\mathbb{P}}(f) E_{\mathbb{P}}(g)$. Note that $\operatorname{Cov}(f, f) = V(f)$.

Linear correlation coefficient (defined when $\sigma(f), \sigma(g) > 0$):
$$\rho_{\mathbb{P}}(f, g) \;=\; \frac{\operatorname{Cov}_{\mathbb{P}}(f, g)}{\sigma_{\mathbb{P}}(f) \cdot \sigma_{\mathbb{P}}(g)}.$$
The Cauchy–Schwarz bound (Theorem 2028, §3.13 below) gives $|\operatorname{Cov}(f, g)| \le \sigma(f) \sigma(g)$, i.e., $|\rho(f, g)| \le 1$.
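The definitions and the two computation formulas can be cross-checked with exact arithmetic. This sketch assumes a fair die, with $f$ the face value and $g$ the bet of §2.10. `E`, `V` and `cov` are illustrative helpers.

```python
import math
from fractions import Fraction

P = {w: Fraction(1, 6) for w in range(1, 7)}   # assumption: fair die

def E(h):
    """Expected value of the random variable h under the simple probability P."""
    return sum(h(w) * p for w, p in P.items())

def V(h):
    m = E(h)
    return E(lambda w: (h(w) - m) ** 2)         # definition E[(h - E h)^2]

def cov(h, k):
    mh, mk = E(h), E(k)
    return E(lambda w: (h(w) - mh) * (k(w) - mk))

def f(w):
    return w                                    # face value

def g(w):
    return 10 if w % 2 == 0 else -10            # the bet of §2.10

# Computation formulas (Theorems 2024 and 2026) agree with the definitions.
assert V(f) == E(lambda w: f(w) ** 2) - E(f) ** 2
assert cov(f, g) == E(lambda w: f(w) * g(w)) - E(f) * E(g)

rho = float(cov(f, g)) / math.sqrt(V(f) * V(g))
print(V(f), cov(f, g), round(rho, 4))  # 35/12 5 0.2928
```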

2.14 A set function that is not a measure (lect1 Q1)

Let $\Omega = \{T_1, \ldots, T_{50}\}$ be 50 stocks, and let $\phi(T_i)$ be the difference between the opening and closing price of stock $T_i$. Define
$$\mu(A) = \sum_{T_i \in A} \phi(T_i) \qquad \forall\, A \subseteq \Omega.$$
Then $\mu$ is a set function (it assigns a real number to every event), and it is grounded ($\mu(\emptyset) = 0$) and additive. However, $\mu(A)$ can be negative because price changes can have either sign, so the positivity axiom (axiom 2 of §2.3; K1 for probabilities) fails. Hence $\mu$ is a set function but not a measure, and a fortiori not a probability. This separates the first two levels of the hierarchy.
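A scaled-down version of the example shows additivity holding while positivity fails. The three stocks and their price changes below are hypothetical values chosen for illustration, not market data.

```python
from itertools import chain, combinations

# Hypothetical daily price changes for three stocks (not real data).
phi = {"T1": 2.5, "T2": -4.0, "T3": 1.0}

def mu(event):
    """mu(A) = sum of phi(T_i) over the stocks T_i in A."""
    return sum(phi[t] for t in event)

events = [set(s) for s in chain.from_iterable(combinations(phi, r) for r in range(4))]
assert mu(set()) == 0                               # grounded
assert mu({"T1", "T2"}) == mu({"T1"}) + mu({"T2"})  # additive on disjoint events
negative = [sorted(e) for e in events if mu(e) < 0]
print(negative)  # [['T2'], ['T1', 'T2'], ['T2', 'T3'], ['T1', 'T2', 'T3']]
```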


§3. Theorems, Propositions & Proofs

Each entry lists the theorem name, its Marinacci number (where known), the source, a clean statement, and a full proof (verbatim from Probability Proofs.pdf where available).

3.1 Monotonicity property of a measure (page 1987)

Statement. Let $M : 2^\Omega \to [0, +\infty)$ be a measure. Then $M$ is monotone:
$$M(A) \le M(B) \qquad \forall\, A, B \subseteq \Omega \text{ such that } A \subseteq B.$$

Source: Probability Proofs.pdf p.1987 (handwritten).

Proof. Consider $A, B \subseteq \Omega$ with $A \subseteq B$, and define $C = B \setminus A$. Then
$$A \cup C = B, \qquad A \cap C = \emptyset.$$
That is, $A$ and the "leftover" $C$ partition $B$. Since every measure is finitely additive and positive,
$$M(B) = M(A \cup C) = M(A) + M(C) \;\ge\; M(A),$$
using $M(C) \ge 0$. $\blacksquare$


3.2 Relation between $M(A \cup B)$ and $M(A \cap B)$ (page 1989)

Statement. For every measure $M : 2^\Omega \to [0, +\infty)$ and all $A, B \subseteq \Omega$,
$$M(A \cup B) + M(A \cap B) \;=\; M(A) + M(B).$$

Source: Probability Proofs.pdf p.1989.

Proof. Split $A$ and $B$ using their intersection and relative complements:
$$A = (A \setminus B) \cup (A \cap B), \qquad (A \setminus B) \cap (A \cap B) = \emptyset,$$
$$B = (B \setminus A) \cup (A \cap B), \qquad (B \setminus A) \cap (A \cap B) = \emptyset.$$
Since $M$ is (finitely) additive,
$$M(A) = M(A \setminus B) + M(A \cap B), \qquad M(B) = M(B \setminus A) + M(A \cap B).$$
Moreover, $A \cup B = (A \setminus B) \cup (A \cap B) \cup (B \setminus A)$ with the three pieces pairwise disjoint, so by finite additivity
$$M(A \cup B) = M(A \setminus B) + M(A \cap B) + M(B \setminus A).$$
Adding $M(A \cap B)$ to both sides:
$$M(A \cup B) + M(A \cap B) = \bigl[M(A \setminus B) + M(A \cap B)\bigr] + \bigl[M(B \setminus A) + M(A \cap B)\bigr] = M(A) + M(B). \qquad \blacksquare$$

Corollary (inclusion–exclusion for a probability). For $\mathbb{P} : 2^\Omega \to [0, 1]$ and all $A, B \subseteq \Omega$,
$$\mathbb{P}(A \cup B) \;=\; \mathbb{P}(A) + \mathbb{P}(B) - \mathbb{P}(A \cap B).$$


3.3 Probability of the complement (page 1883)

Statement. Let $\mathbb{P} : 2^\Omega \to [0, 1]$ be a probability. Then for every $A \subseteq \Omega$,
$$\mathbb{P}(A^c) \;=\; 1 - \mathbb{P}(A).$$

Source: Probability Proofs.pdf p.1883.

Proof. Since $A \cap A^c = \emptyset$, the additivity property (K3) gives
$$\mathbb{P}(A \cup A^c) = \mathbb{P}(A) + \mathbb{P}(A^c).$$
But $A \cup A^c = \Omega$ and $\mathbb{P}(\Omega) = 1$ (K2, normalisation), so
$$1 = \mathbb{P}(A) + \mathbb{P}(A^c) \qquad \Longrightarrow \qquad \mathbb{P}(A^c) = 1 - \mathbb{P}(A). \qquad \blacksquare$$

Consequence. $\mathbb{P}(\emptyset) = \mathbb{P}(\Omega^c) = 1 - \mathbb{P}(\Omega) = 1 - 1 = 0$.

Union bound (Boole's inequality). For any events $A, B$, $\mathbb{P}(A \cup B) \le \mathbb{P}(A) + \mathbb{P}(B)$: drop the non-negative term $\mathbb{P}(A \cap B)$ from the corollary of §3.2. By induction, $\mathbb{P}(\bigcup_{i=1}^n A_i) \le \sum_{i=1}^n \mathbb{P}(A_i)$ for any finite family of events.
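A brute-force check over every event of a small space can catch sign or set-operation slips in the identities of §3.1–§3.3. The point masses below define a hypothetical simple probability chosen for illustration. A finite check illustrates the results but does not replace the proofs.

```python
from fractions import Fraction
from itertools import chain, combinations

# Hypothetical point masses on Omega = {a, b, c, d}: non-negative, summing to 1.
masses = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6), "d": Fraction(0)}
omega = frozenset(masses)

def P(event):
    """Probability of an event as the sum of its point masses."""
    return sum((masses[w] for w in event), Fraction(0))

events = [frozenset(s) for s in chain.from_iterable(
    combinations(sorted(omega), r) for r in range(len(omega) + 1))]

for A in events:
    assert P(omega - A) == 1 - P(A)                # §3.3: complement
    for B in events:
        if A <= B:
            assert P(A) <= P(B)                    # §3.1: monotonicity
        assert P(A | B) + P(A & B) == P(A) + P(B)  # §3.2
        assert P(A | B) <= P(A) + P(B)             # union bound
print(f"checked {len(events)} events and {len(events) ** 2} pairs")
# checked 16 events and 256 pairs
```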


3.4 Property of a simple probability #1 (page 1999)

Statement. Let $\mathbb{P} : 2^\Omega \to [0, 1]$ be a simple probability. If $E \subseteq \Omega$ is a finite event with $\mathbb{P}(E) = 1$, then $\mathbb{P}(\{\omega\}) = 0$ for every $\omega \notin E$.

Source: Probability Proofs.pdf p.1999.

Proof. Let $\omega \notin E$; then $\omega \in E^c$. By §3.3, $\mathbb{P}(E^c) = 1 - \mathbb{P}(E) = 1 - 1 = 0$. By monotonicity (§3.1) applied to $\{\omega\} \subseteq E^c$,
$$0 \le \mathbb{P}(\{\omega\}) \le \mathbb{P}(E^c) = 0,$$
forcing $\mathbb{P}(\{\omega\}) = 0$. $\blacksquare$


3.5 Property of a simple probability #2 (page 2000)

Statement. Let $\mathbb{P} : 2^\Omega \to [0, 1]$ be a simple probability. Then:
(1) $\operatorname{supp} \mathbb{P}$ is a finite event with $\mathbb{P}(\operatorname{supp} \mathbb{P}) = 1$;
(2) for every finite event $A \subseteq \Omega$ with $\mathbb{P}(A) = 1$, $\operatorname{supp} \mathbb{P} \subseteq A$.

Source: Probability Proofs.pdf p.2000.

Proof of (1). Since $\mathbb{P}$ is simple, there exists a finite event $E$ with $\mathbb{P}(E) = 1$. By §3.4, $\mathbb{P}(\{\omega\}) = 0$ for every $\omega \notin E$. Hence every state with $\mathbb{P}(\{\omega\}) > 0$ lies in $E$, i.e., $\operatorname{supp} \mathbb{P} \subseteq E$. Since $E$ is finite, so is $\operatorname{supp} \mathbb{P}$.

Consider the disjoint decomposition $E = \operatorname{supp} \mathbb{P} \cup (E \setminus \operatorname{supp} \mathbb{P})$. By additivity,
$$\mathbb{P}(E) = \mathbb{P}(\operatorname{supp} \mathbb{P}) + \mathbb{P}(E \setminus \operatorname{supp} \mathbb{P}).$$
For every $\omega \in E \setminus \operatorname{supp} \mathbb{P}$ we have $\omega \notin \operatorname{supp} \mathbb{P}$, so $\mathbb{P}(\{\omega\}) = 0$. By finite additivity over the finite set $E \setminus \operatorname{supp} \mathbb{P}$, $\mathbb{P}(E \setminus \operatorname{supp} \mathbb{P}) = 0$. Hence
$$\mathbb{P}(\operatorname{supp} \mathbb{P}) = \mathbb{P}(E) = 1.$$

Proof of (2), by contradiction. Suppose there exists $\omega \in \operatorname{supp} \mathbb{P}$ with $\omega \notin A$. By definition of the support, $\mathbb{P}(\{\omega\}) > 0$. Set $B = A \cup \{\omega\}$. Since $A \cap \{\omega\} = \emptyset$, additivity gives
$$\mathbb{P}(B) = \mathbb{P}(A) + \mathbb{P}(\{\omega\}) = 1 + \mathbb{P}(\{\omega\}) > 1.$$
This contradicts $\mathbb{P}(B) \le 1$. Hence $\operatorname{supp} \mathbb{P} \subseteq A$. $\blacksquare$


3.6 Property of a simple probability #3 (page 2001)

Statement. Let $\mathbb{P} : 2^\Omega \to [0, 1]$ be a simple probability. Then for every $A \subseteq \Omega$,
$$\mathbb{P}(A) \;=\; \sum_{\omega \in A \cap \operatorname{supp} \mathbb{P}} \mathbb{P}(\{\omega\}).$$

Source: Probability Proofs.pdf p.2001.

Proof. For any $A \subseteq \Omega$, decompose
$$A = (A \cap \operatorname{supp} \mathbb{P}) \cup (A \cap (\operatorname{supp} \mathbb{P})^c), \qquad (A \cap \operatorname{supp} \mathbb{P}) \cap (A \cap (\operatorname{supp} \mathbb{P})^c) = \emptyset.$$
By additivity,
$$\mathbb{P}(A) = \mathbb{P}(A \cap \operatorname{supp} \mathbb{P}) + \mathbb{P}(A \cap (\operatorname{supp} \mathbb{P})^c).$$
We have $A \cap (\operatorname{supp} \mathbb{P})^c \subseteq (\operatorname{supp} \mathbb{P})^c$ and $\mathbb{P}((\operatorname{supp} \mathbb{P})^c) = 1 - 1 = 0$ (by §3.3 and §3.5(1)). Monotonicity therefore gives $\mathbb{P}(A \cap (\operatorname{supp} \mathbb{P})^c) = 0$. Hence
$$\mathbb{P}(A) = \mathbb{P}(A \cap \operatorname{supp} \mathbb{P}) = \mathbb{P}\!\left(\bigcup_{\omega \in A \cap \operatorname{supp} \mathbb{P}} \{\omega\}\right) = \sum_{\omega \in A \cap \operatorname{supp} \mathbb{P}} \mathbb{P}(\{\omega\}),$$
using finite additivity on the finite union (the support is finite by §3.5(1)). $\blacksquare$


3.7 Theorem 2002 — Simple probabilities are convex linear combinations of Dirac probabilities

Statement. Let $\mathbb{P} : 2^\Omega \to [0, 1]$ be a simple probability. Then for every $A \subseteq \Omega$,
$$\mathbb{P}(A) \;=\; \sum_{\omega \in \operatorname{supp} \mathbb{P}} \mathbb{P}(\{\omega\}) \cdot \delta_{\omega}(A).$$

Source: Probability Proofs.pdf p.2002.

Proof. By §3.6, $\mathbb{P}(A) = \sum_{\omega \in A \cap \operatorname{supp} \mathbb{P}} \mathbb{P}(\{\omega\})$. Rewrite the restriction to $A$ with an indicator, which is exactly a Dirac probability:
$$\sum_{\omega \in A \cap \operatorname{supp} \mathbb{P}} \mathbb{P}(\{\omega\}) = \sum_{\omega \in \operatorname{supp} \mathbb{P}} \mathbb{P}(\{\omega\}) \cdot \mathbf{1}_{[\omega \in A]} = \sum_{\omega \in \operatorname{supp} \mathbb{P}} \mathbb{P}(\{\omega\}) \cdot \delta_{\omega}(A).$$
The indicator $\mathbf{1}_{[\omega \in A]}$ equals $\delta_\omega(A)$: both are $1$ if $\omega \in A$ and $0$ otherwise. $\blacksquare$

Why this matters. The theorem says every simple probability $\mathbb{P}$ is a convex combination of point masses. The weights $\mathbb{P}(\{\omega\})$ are non-negative and sum to 1 (§3.5(1)), and the "atoms" $\delta_\omega$ are probabilities. This is the structural fact behind proposition 2006 below: to prove that a simple $\mathbb{P}$ is countably additive, it suffices to prove that each $\delta_\omega$ is.
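The chain §3.5 → §3.6 → Theorem 2002 can be checked exhaustively on a small example. This sketch assumes a hypothetical simple probability on four states, one of which has zero mass and so falls outside the support. It verifies the Dirac-mixture identity on every event. `P` and `delta` are illustrative helpers.

```python
from fractions import Fraction
from itertools import chain, combinations

# Hypothetical simple probability on Omega = {a, b, c, d}; state "d" has zero mass.
masses = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6), "d": Fraction(0)}

def P(event):
    """P(A) as the sum of point masses (finite additivity)."""
    return sum((masses[w] for w in event), Fraction(0))

def delta(w0, event):
    """Dirac probability delta_{w0} evaluated at an event."""
    return 1 if w0 in event else 0

support = {w for w, p in masses.items() if p > 0}  # supp P = {w : P({w}) > 0}
assert P(support) == 1                             # §3.5(1)

all_events = chain.from_iterable(combinations(masses, r) for r in range(len(masses) + 1))
for A in map(set, all_events):
    # Theorem 2002: P(A) = sum over the support of P({w}) * delta_w(A)
    assert P(A) == sum((P({w}) * delta(w, A) for w in support), Fraction(0))
print(sorted(support))  # ['a', 'b', 'c']
```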


3.8 Proposition 2006 — Every Dirac probability is countably additive ★ May 2024 Q10c ★

Statement. Let $\omega_0 \in \Omega$ and let $\delta_{\omega_0} : 2^\Omega \to [0, 1]$ be the Dirac probability concentrated at $\omega_0$. Then $\delta_{\omega_0}$ is countably additive.

Source: Probability Proofs.pdf p.2006 (proposition 2006, "first part of the proof" as cited in May 2024 Q10).

Proof (via the monotone-sequence characterisation of countable additivity). We use the equivalence stated in §2.7: $\mathbb{P}$ is countably additive iff, for every decreasing sequence $A_n \downarrow A$ (i.e., $A_1 \supseteq A_2 \supseteq \cdots$ and $A = \bigcap_{n=1}^{+\infty} A_n$), $\lim_{n\to\infty} \mathbb{P}(A_n) = \mathbb{P}(A)$.

Consider an arbitrary decreasing sequence $\{A_n\}_{n=1}^{+\infty}$ of events with $A_n \downarrow A = \bigcap_{n=1}^{+\infty} A_n$. We split into two cases according to whether $\omega_0$ belongs to the limit set $A$.

Case (I): $\omega_0 \in A = \bigcap_{n=1}^{+\infty} A_n$. Then $\omega_0 \in A_n$ for every $n \ge 1$, hence $\delta_{\omega_0}(A_n) = 1$ for every $n$. Also $\delta_{\omega_0}(A) = 1$ (since $\omega_0 \in A$). Therefore
$$\lim_{n \to +\infty} \delta_{\omega_0}(A_n) = \lim_{n \to +\infty} 1 = 1 = \delta_{\omega_0}(A). \qquad \checkmark$$

Case (II): $\omega_0 \notin A = \bigcap_{n=1}^{+\infty} A_n$. Since $\omega_0$ is not in the intersection, $\omega_0 \notin A_{\bar n}$ for some $\bar n \in \mathbb{N}$. Because the sequence is decreasing ($A_{\bar n} \supseteq A_{\bar n + 1} \supseteq \cdots$), $A_n \subseteq A_{\bar n}$ for every $n \ge \bar n$, so $\omega_0 \notin A_n$. Hence $\delta_{\omega_0}(A_n) = 0$ for all $n \ge \bar n$. Also $\delta_{\omega_0}(A) = 0$ (since $\omega_0 \notin A$). Since the sequence $\delta_{\omega_0}(A_n)$ is eventually constant at $0$,
$$\lim_{n \to +\infty} \delta_{\omega_0}(A_n) = 0 = \delta_{\omega_0}(A). \qquad \checkmark$$

Since $\lim_{n\to\infty} \delta_{\omega_0}(A_n) = \delta_{\omega_0}(A)$ in both cases, the monotone-sequence criterion shows that $\delta_{\omega_0}$ is countably additive. $\blacksquare$

Remark (extension to simple probabilities: "part 2" of the proof). Let $\mathbb{P}$ be a simple probability. By Theorem 2002 (§3.7), $\mathbb{P}(A) = \sum_{\omega \in \operatorname{supp} \mathbb{P}} \mathbb{P}(\{\omega\}) \cdot \delta_\omega(A)$. For any decreasing sequence $A_n \downarrow A$,
$$\lim_{n\to\infty} \mathbb{P}(A_n) = \lim_{n\to\infty} \sum_{\omega \in \operatorname{supp} \mathbb{P}} \mathbb{P}(\{\omega\}) \cdot \delta_\omega(A_n) = \sum_{\omega \in \operatorname{supp} \mathbb{P}} \mathbb{P}(\{\omega\}) \cdot \lim_{n\to\infty}\delta_\omega(A_n) = \sum_{\omega \in \operatorname{supp} \mathbb{P}} \mathbb{P}(\{\omega\}) \cdot \delta_\omega(A) = \mathbb{P}(A),$$
where exchanging $\lim$ and $\sum$ is legitimate because $\operatorname{supp} \mathbb{P}$ is finite. Hence every simple probability is countably additive. (May 2024 Q10(c) asks only for the shorter "first part", i.e., the Dirac case.)


3.9 Proposition 2009: the uniform probability on $\mathbb{N}$ is not countably additive

Statement. There is no countably additive probability $\mathbb{P}$ on $\Omega = \mathbb{N}$ under which every singleton has the same probability (a "uniform probability on $\mathbb{N}$").

Source: Probability Proofs.pdf p.2009.

Proof (by contradiction). Suppose such a uniform countably additive $\mathbb{P}$ exists, and set $\mathbb{P}(\{n\}) = k$ for every $n \in \mathbb{N}$, with $k \ge 0$. Since $\mathbb{N} = \bigcup_{n \in \mathbb{N}} \{n\}$ is a countable disjoint union and $\mathbb{P}$ is countably additive,
$$1 = \mathbb{P}(\mathbb{N}) = \mathbb{P}\!\left(\bigcup_{n \in \mathbb{N}} \{n\}\right) = \sum_{n \in \mathbb{N}} \mathbb{P}(\{n\}) = \sum_{n \in \mathbb{N}} k = \begin{cases} 0 & k = 0 \\ +\infty & k > 0.\end{cases}$$
In both cases we reach a contradiction ($1 = 0$ or $1 = +\infty$). Therefore no countably additive uniform probability on $\mathbb{N}$ exists. $\blacksquare$

Consequence. Simplicity and countable additivity are not equivalent. Every simple probability is countably additive (§3.8), but the Poisson and Geometric distributions on $\Omega = \mathbb{N}$ are countably additive without being simple (lect2).
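The Poisson case can be illustrated numerically. Every singleton has positive mass, so no finite event has probability 1, yet the masses of the disjoint singletons add up to $\mathbb{P}(\mathbb{N}) = 1$. This sketch assumes $\lambda = 2$ as in the MCQ. It only inspects finitely many terms, so it illustrates the claim rather than proving it.

```python
import math

def poisson_pmf(n: int, lam: float) -> float:
    """Poisson point mass P({n}) = e^{-lam} lam^n / n! on N = {0, 1, 2, ...}."""
    return math.exp(-lam) * lam**n / math.factorial(n)

lam = 2.0
# Every singleton checked has strictly positive mass (not a simple probability).
assert all(poisson_pmf(n, lam) > 0 for n in range(50))
# Partial sums over {0, ..., N-1} approach P(N) = 1, as countable additivity requires.
for N in (5, 10, 20):
    tail = 1 - sum(poisson_pmf(n, lam) for n in range(N))
    print(N, f"{tail:.2e}")
```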


3.10 Page 2014: characterization of random variables equal $\mathbb{P}$-a.e.

Two random variables $f, g : \Omega \to \mathbb{R}$ are equal $\mathbb{P}$-almost everywhere (equal $\mathbb{P}$-a.e.) if $\mathbb{P}(\{\omega \in \Omega : f(\omega) = g(\omega)\}) = 1$.

Statement. Let $\mathbb{P}$ be a simple probability. Then $f, g$ are equal $\mathbb{P}$-a.e. if and only if $f(\omega) = g(\omega)$ for every $\omega \in \operatorname{supp} \mathbb{P}$.

Source: Probability Proofs.pdf p.2014.

Proof. ($\Leftarrow$) Suppose $f(\omega) = g(\omega)$ for every $\omega \in \operatorname{supp} \mathbb{P}$. Then $\operatorname{supp} \mathbb{P} \subseteq \{\omega : f(\omega) = g(\omega)\}$. Since $\mathbb{P}(\operatorname{supp} \mathbb{P}) = 1$, monotonicity (§3.1) gives $1 = \mathbb{P}(\operatorname{supp} \mathbb{P}) \le \mathbb{P}(\{\omega : f(\omega) = g(\omega)\}) \le 1$, so $\mathbb{P}(\{\omega : f(\omega) = g(\omega)\}) = 1$.

($\Rightarrow$) Suppose $\mathbb{P}(\{\omega : f(\omega) = g(\omega)\}) = 1$. Apply §3.5(2) with $A = \{\omega : f(\omega) = g(\omega)\}$; its proof does not use finiteness of $A$. This gives $\operatorname{supp} \mathbb{P} \subseteq A$, i.e., $f(\omega) = g(\omega)$ for every $\omega \in \operatorname{supp} \mathbb{P}$. $\blacksquare$


3.11 Page 2017 — Random variables equal $\mathbb{P}$-a.e. have the same expected value

Statement. Let $\mathbb{P}$ be a simple probability and $f, g : \Omega \to \mathbb{R}$ random variables equal $\mathbb{P}$-a.e. Then $E_{\mathbb{P}}(f) = E_{\mathbb{P}}(g)$.

Source: Probability Proofs.pdf p.2017.

Proof. By definition $E_{\mathbb{P}}(f) = \sum_{\omega \in \operatorname{supp} \mathbb{P}} f(\omega) \cdot \mathbb{P}(\{\omega\})$, and similarly for $g$. By §3.10, $f(\omega) = g(\omega)$ for every $\omega \in \operatorname{supp} \mathbb{P}$, so the two sums coincide term by term:

$$E_{\mathbb{P}}(f) = \sum_{\omega \in \operatorname{supp} \mathbb{P}} f(\omega)\cdot \mathbb{P}(\{\omega\}) = \sum_{\omega \in \operatorname{supp} \mathbb{P}} g(\omega)\cdot \mathbb{P}(\{\omega\}) = E_{\mathbb{P}}(g). \qquad \blacksquare$$

Caveat (lect6 p.9). The converse is false: $E_\mathbb{P}(f) = E_\mathbb{P}(g)$ does not imply $f = g$ $\mathbb{P}$-a.e. Example: on a fair die, let $f(\omega) = +10$ for even $\omega$ and $-10$ for odd $\omega$, and let $g(\omega) = -10$ for even $\omega$ and $+10$ for odd $\omega$. Both have expected value $0$, but they disagree on every outcome.


3.12 Page 2018 — Expected value properties (linearity, monotonicity, extension)

Statement. Let $\mathbb{P}$ be a simple probability and $f, g : \Omega \to \mathbb{R}$ random variables.

(1) Linearity. For all $\alpha, \beta \in \mathbb{R}$, $E_{\mathbb{P}}(\alpha f + \beta g) = \alpha E_{\mathbb{P}}(f) + \beta E_{\mathbb{P}}(g)$.

(2) Monotonicity. If $f(\omega) \ge g(\omega)$ for every $\omega \in \Omega$, then $E_{\mathbb{P}}(f) \ge E_{\mathbb{P}}(g)$.

(3) Extension to finite sets. For every finite $A \subseteq \Omega$ with $A \supseteq \operatorname{supp} \mathbb{P}$,

$$E_{\mathbb{P}}(f) = \sum_{\omega \in A} f(\omega) \cdot \mathbb{P}(\{\omega\}).$$

Source: Probability Proofs.pdf p.2018.

Proof of (1) — Linearity.

$$E_{\mathbb{P}}(\alpha f + \beta g) = \sum_{\omega \in \operatorname{supp} \mathbb{P}} (\alpha f(\omega) + \beta g(\omega)) \cdot \mathbb{P}(\{\omega\}) = \alpha \sum_{\omega \in \operatorname{supp} \mathbb{P}} f(\omega) \mathbb{P}(\{\omega\}) + \beta \sum_{\omega \in \operatorname{supp} \mathbb{P}} g(\omega) \mathbb{P}(\{\omega\}) = \alpha E_{\mathbb{P}}(f) + \beta E_{\mathbb{P}}(g).$$

Proof of (2) — Monotonicity. Assume $f(\omega) \ge g(\omega)$ for every $\omega$. Multiplying by $\mathbb{P}(\{\omega\}) \ge 0$ preserves the inequality: $f(\omega)\mathbb{P}(\{\omega\}) \ge g(\omega)\mathbb{P}(\{\omega\})$. Summing over $\omega \in \operatorname{supp} \mathbb{P}$ gives $E_{\mathbb{P}}(f) \ge E_{\mathbb{P}}(g)$.

Proof of (3) — Extension. Every $\omega \in A \setminus \operatorname{supp} \mathbb{P}$ has $\mathbb{P}(\{\omega\}) = 0$, so it contributes zero to the sum:

$$\sum_{\omega \in A} f(\omega) \mathbb{P}(\{\omega\}) = \sum_{\omega \in \operatorname{supp} \mathbb{P}} f(\omega) \mathbb{P}(\{\omega\}) + \underbrace{\sum_{\omega \in A \setminus \operatorname{supp} \mathbb{P}} f(\omega) \cdot 0}_{= 0} = E_{\mathbb{P}}(f). \qquad \blacksquare$$

Corollary — Expected value of an affine function. $E_{\mathbb{P}}(\alpha f + \beta) = \alpha E_{\mathbb{P}}(f) + \beta$ for all $\alpha, \beta \in \mathbb{R}$. Proof: apply (1) with $g \equiv 1$, noting that $E_{\mathbb{P}}(1) = \sum_{\omega \in \operatorname{supp} \mathbb{P}} \mathbb{P}(\{\omega\}) = 1$.
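A minimal Python sketch that checks (1), the corollary and (3) on a fair die. It assumes a simple probability is stored as a dictionary `{outcome: probability}` with exact `Fraction` weights. The helper `expect` and the variables `f`, `g` are illustrative names, not taken from the source.

```python
from fractions import Fraction

# Fair die as a simple probability: {outcome: P({outcome})} (illustrative choice).
P = {w: Fraction(1, 6) for w in range(1, 7)}


def expect(f, P):
    """E_P(f): sum of f(w) * P({w}) over the support of P."""
    return sum(f(w) * p for w, p in P.items() if p > 0)


def f(w):
    return w          # the face shown


def g(w):
    return w % 2      # 1 if the face is odd, 0 if even


alpha, beta = Fraction(3), Fraction(-2)

# (1) Linearity: E(alpha f + beta g) = alpha E(f) + beta E(g)
lhs = expect(lambda w: alpha * f(w) + beta * g(w), P)
rhs = alpha * expect(f, P) + beta * expect(g, P)
print(lhs, rhs)  # 19/2 19/2

# Corollary: E(alpha f + beta) = alpha E(f) + beta, because E(1) = 1
affine = expect(lambda w: alpha * f(w) + beta, P)
print(affine)    # 17/2

# (3) Extension: summing over a larger finite set A that adds a zero-probability
# outcome (7 here, hypothetical) leaves the expected value unchanged.
P_ext = {**P, 7: Fraction(0)}
extended = sum(f(w) * p for w, p in P_ext.items())
print(extended == expect(f, P))  # True
```

Exact fractions are used so that the identities are checked with `==` rather than up to rounding.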


3.13 Page 2024 — Variance properties (computation formula, affine functions)

Statement. Let $\mathbb{P}$ be a simple probability and $f : \Omega \to \mathbb{R}$ a random variable. For all $\alpha, \beta \in \mathbb{R}$:

(1) Computation formula. $V_{\mathbb{P}}(f) = E_{\mathbb{P}}(f^2) - [E_{\mathbb{P}}(f)]^2$.

(2) Affine functions. $V_{\mathbb{P}}(\alpha f + \beta) = \alpha^2 V_{\mathbb{P}}(f)$.

Source: Probability Proofs.pdf p.2024.

Proof of (1). Expand $(f - E_{\mathbb{P}}(f))^2 = f^2 - 2 f \, E_{\mathbb{P}}(f) + [E_{\mathbb{P}}(f)]^2$. By linearity of expectation (§3.12.1),

$$V_{\mathbb{P}}(f) = E_{\mathbb{P}}\bigl[(f - E_{\mathbb{P}}(f))^2\bigr] = E_{\mathbb{P}}(f^2) - 2 E_{\mathbb{P}}(f) \cdot E_{\mathbb{P}}(f) + [E_{\mathbb{P}}(f)]^2 = E_{\mathbb{P}}(f^2) - [E_{\mathbb{P}}(f)]^2.$$

Two facts are used here. First, $E_{\mathbb{P}}(f) \in \mathbb{R}$ is a constant, so linearity gives $E_{\mathbb{P}}\bigl(2 f \, E_{\mathbb{P}}(f)\bigr) = 2 [E_{\mathbb{P}}(f)]^2$. Second, the expectation of a constant $c$ is $E_{\mathbb{P}}(c) = c \, E_{\mathbb{P}}(1) = c$.

Proof of (2). By definition and linearity (§3.12),

$$V_{\mathbb{P}}(\alpha f + \beta) = E_{\mathbb{P}}\bigl[(\alpha f + \beta - E_{\mathbb{P}}(\alpha f + \beta))^2\bigr] = E_{\mathbb{P}}\bigl[(\alpha f + \beta - \alpha E_{\mathbb{P}}(f) - \beta)^2\bigr] = E_{\mathbb{P}}\bigl[\alpha^2 (f - E_{\mathbb{P}}(f))^2\bigr] = \alpha^2 V_{\mathbb{P}}(f). \qquad \blacksquare$$

Observation. The constant $\beta$ disappears because shifting a random variable by a constant shifts its mean by the same amount and leaves its spread unchanged. The coefficient $\alpha$ is squared because variance is defined through a square, so it carries the units of $f^2$.
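A short Python check of both parts of §3.13, assuming the face of a fair die as the random variable. The helper names `expect` and `variance` are illustrative. The affine case is computed straight from the definition, without using rule (2), so the comparison is a genuine check.

```python
from fractions import Fraction

# Fair die as a simple probability (illustrative choice).
P = {w: Fraction(1, 6) for w in range(1, 7)}


def expect(h):
    return sum(h(w) * p for w, p in P.items())


def variance(h):
    """Definition: V_P(h) = E_P[(h - E_P(h))^2]."""
    m = expect(h)
    return expect(lambda w: (h(w) - m) ** 2)


def f(w):
    return w


v_def = variance(f)
v_formula = expect(lambda w: f(w) ** 2) - expect(f) ** 2  # computation formula (1)
v_affine = variance(lambda w: -3 * f(w) + 6)               # affine case, alpha = -3, beta = 6
print(v_def, v_formula, v_affine)  # 35/12 35/12 105/4
```

As rule (2) predicts, $105/4 = (-3)^2 \cdot 35/12$, and the shift $\beta = 6$ plays no role.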


3.14 Page 2026 — Covariance properties (computation formula, bilinearity) ★ May 2024 MCQ5 ★

Statement. Let $\mathbb{P}$ be a simple probability and $f, g : \Omega \to \mathbb{R}$ random variables. For all $\alpha, \beta, \gamma, \delta \in \mathbb{R}$:

(1) Computation formula. $\operatorname{Cov}_{\mathbb{P}}(f, g) = E_{\mathbb{P}}(f g) - E_{\mathbb{P}}(f) \, E_{\mathbb{P}}(g)$.

(2) Bilinearity for affine functions. $\operatorname{Cov}_{\mathbb{P}}(\alpha f + \beta, \gamma g + \delta) = \alpha \gamma \, \operatorname{Cov}_{\mathbb{P}}(f, g)$.

Source: Probability Proofs.pdf p.2026.

Proof of (1). Expand

$$(f - E_{\mathbb{P}}(f))(g - E_{\mathbb{P}}(g)) = f g - f \, E_{\mathbb{P}}(g) - g \, E_{\mathbb{P}}(f) + E_{\mathbb{P}}(f) \, E_{\mathbb{P}}(g).$$

Apply expectation and use linearity (§3.12.1):

$$\operatorname{Cov}_{\mathbb{P}}(f, g) = E_{\mathbb{P}}(f g) - E_{\mathbb{P}}(f) \, E_{\mathbb{P}}(g) - E_{\mathbb{P}}(g) \, E_{\mathbb{P}}(f) + E_{\mathbb{P}}(f) E_{\mathbb{P}}(g) = E_{\mathbb{P}}(f g) - E_{\mathbb{P}}(f) \, E_{\mathbb{P}}(g).$$

Proof of (2). First note that for every $\omega$,

$$\bigl(\alpha f(\omega) + \beta - E_{\mathbb{P}}(\alpha f + \beta)\bigr)\bigl(\gamma g(\omega) + \delta - E_{\mathbb{P}}(\gamma g + \delta)\bigr) = \bigl(\alpha f(\omega) - \alpha E_{\mathbb{P}}(f)\bigr)\bigl(\gamma g(\omega) - \gamma E_{\mathbb{P}}(g)\bigr) = \alpha \gamma (f(\omega) - E_{\mathbb{P}}(f))(g(\omega) - E_{\mathbb{P}}(g)).$$

The additive constants $\beta, \delta$ cancel because $E_{\mathbb{P}}(\alpha f + \beta) = \alpha E_{\mathbb{P}}(f) + \beta$, and likewise $E_{\mathbb{P}}(\gamma g + \delta) = \gamma E_{\mathbb{P}}(g) + \delta$. Taking expectation and factoring $\alpha\gamma$ out by linearity,

$$\operatorname{Cov}_{\mathbb{P}}(\alpha f + \beta, \gamma g + \delta) = \alpha \gamma \, E_{\mathbb{P}}\bigl[(f - E_{\mathbb{P}}(f))(g - E_{\mathbb{P}}(g))\bigr] = \alpha \gamma \, \operatorname{Cov}_{\mathbb{P}}(f, g). \qquad \blacksquare$$

Special cases worth memorising.

  • $\operatorname{Cov}(f, f) = V(f)$ (set $g = f$ in the definition of covariance).
  • $\operatorname{Cov}(\alpha f, \gamma g) = \alpha \gamma \operatorname{Cov}(f, g)$ (take $\beta = \delta = 0$ in (2)); this is the May 2024 MCQ5 identity.
  • Additive constants do not affect covariance: $\operatorname{Cov}(f + \beta, g + \delta) = \operatorname{Cov}(f, g)$.
  • Symmetry: $\operatorname{Cov}(f, g) = \operatorname{Cov}(g, f)$ (both equal $E(fg) - E(f)E(g)$).
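The identities in §3.14 and the special cases listed above can be checked exactly on a small example. This is a sketch under assumptions: a fair die, $f(\omega) = \omega$, $g(\omega) = \omega^2$, and arbitrary coefficients, all chosen for illustration only. Covariance is computed from its definition, so each identity is actually tested rather than assumed.

```python
from fractions import Fraction

# Fair die as a simple probability (illustrative choice).
P = {w: Fraction(1, 6) for w in range(1, 7)}


def expect(h):
    return sum(h(w) * p for w, p in P.items())


def cov(f, g):
    """Definition: Cov_P(f, g) = E_P[(f - E_P f)(g - E_P g)]."""
    mf, mg = expect(f), expect(g)
    return expect(lambda w: (f(w) - mf) * (g(w) - mg))


def f(w):
    return w


def g(w):
    return w * w


a, b, c, d = 2, 7, 5, -4  # arbitrary coefficients for the affine maps
base = cov(f, g)

checks = {
    "computation formula": base == expect(lambda w: f(w) * g(w)) - expect(f) * expect(g),
    "bilinearity": cov(lambda w: a * f(w) + b, lambda w: c * g(w) + d) == a * c * base,
    "Cov(f, f) = V(f)": cov(f, f) == expect(lambda w: f(w) ** 2) - expect(f) ** 2,
    "symmetry": cov(f, g) == cov(g, f),
}
print(checks)  # every value is True
```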

3.15 Page 2027 — Variance of a sum of random variables

Statement. Let $\mathbb{P}$ be a simple probability and $f, g : \Omega \to \mathbb{R}$ random variables. Then

$$V_{\mathbb{P}}(f + g) \;=\; V_{\mathbb{P}}(f) + V_{\mathbb{P}}(g) + 2 \operatorname{Cov}_{\mathbb{P}}(f, g).$$

Source: Probability Proofs.pdf p.2027.

Proof. By definition, and using linearity of expectation $E_{\mathbb{P}}(f+g) = E_{\mathbb{P}}(f) + E_{\mathbb{P}}(g)$,

$$V_{\mathbb{P}}(f + g) = E_{\mathbb{P}}\bigl[(f + g - E_{\mathbb{P}}(f + g))^2\bigr] = E_{\mathbb{P}}\bigl[(f - E_{\mathbb{P}}(f) + g - E_{\mathbb{P}}(g))^2\bigr].$$

Expanding the square:

$$(f - E_{\mathbb{P}}(f) + g - E_{\mathbb{P}}(g))^2 = (f - E_{\mathbb{P}}(f))^2 + (g - E_{\mathbb{P}}(g))^2 + 2(f - E_{\mathbb{P}}(f))(g - E_{\mathbb{P}}(g)).$$

Apply expectation and linearity:

$$V_{\mathbb{P}}(f + g) = V_{\mathbb{P}}(f) + V_{\mathbb{P}}(g) + 2 \operatorname{Cov}_{\mathbb{P}}(f, g). \qquad \blacksquare$$

Corollary (independent $f, g$). If $f, g$ are independent, then $\operatorname{Cov}(f, g) = 0$, so $V(f + g) = V(f) + V(g)$.

General formula for an affine combination (lect7): $V(\alpha f + \beta g + \text{const}) = \alpha^2 V(f) + \beta^2 V(g) + 2 \alpha \beta \operatorname{Cov}(f, g)$.
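A minimal exact check of the sum rule and the affine-combination formula. It assumes a fair die, with $g$ the indicator of $\{1, 2, 3\}$, chosen so that $\operatorname{Cov}(f, g) \ne 0$ and the cross term actually matters. Function names and coefficients are illustrative.

```python
from fractions import Fraction

# Fair die as a simple probability (illustrative choice).
P = {w: Fraction(1, 6) for w in range(1, 7)}


def expect(h):
    return sum(h(w) * p for w, p in P.items())


def var(h):
    m = expect(h)
    return expect(lambda w: (h(w) - m) ** 2)


def cov(f, g):
    mf, mg = expect(f), expect(g)
    return expect(lambda w: (f(w) - mf) * (g(w) - mg))


def f(w):
    return w                      # face shown


def g(w):
    return 1 if w <= 3 else 0     # indicator of {1, 2, 3}; deliberately correlated with f


# V(f + g) = V(f) + V(g) + 2 Cov(f, g)
sum_rule = var(lambda w: f(w) + g(w)) == var(f) + var(g) + 2 * cov(f, g)

# V(alpha f + beta g + const) = alpha^2 V(f) + beta^2 V(g) + 2 alpha beta Cov(f, g)
alpha, beta, const = 2, -3, 1
combo = var(lambda w: alpha * f(w) + beta * g(w) + const)
combo_rule = combo == alpha**2 * var(f) + beta**2 * var(g) + 2 * alpha * beta * cov(f, g)
print(sum_rule, combo_rule, cov(f, g))  # True True -3/4
```

Dropping the $2\operatorname{Cov}$ term would be wrong here, since $f$ and $g$ are not independent.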


3.16 Page 2028 — Covariance: boundedness property (Cauchy–Schwarz)

Statement. Let $\mathbb{P}$ be a simple probability and $f, g : \Omega \to \mathbb{R}$ random variables. Then

$$|\operatorname{Cov}_{\mathbb{P}}(f, g)| \;\le\; \sigma_{\mathbb{P}}(f) \cdot \sigma_{\mathbb{P}}(g).$$

Source: Probability Proofs.pdf p.2028.

Proof. Write $\operatorname{supp} \mathbb{P} = \{\omega_1, \ldots, \omega_n\}$.

Part (I) — centred case $E_{\mathbb{P}}(f) = E_{\mathbb{P}}(g) = 0$. Define $x_i = f(\omega_i) \sqrt{\mathbb{P}(\{\omega_i\})}$ and $y_i = g(\omega_i) \sqrt{\mathbb{P}(\{\omega_i\})}$ for $i = 1, \ldots, n$. These are the coordinates of two vectors $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$. Since both means vanish, the computation formula gives $\operatorname{Cov}_{\mathbb{P}}(f,g) = E_{\mathbb{P}}(fg) - E_{\mathbb{P}}(f)E_{\mathbb{P}}(g) = E_{\mathbb{P}}(fg)$, so

$$|\operatorname{Cov}_{\mathbb{P}}(f, g)| = \left|\sum_{i=1}^n f(\omega_i) g(\omega_i) \mathbb{P}(\{\omega_i\})\right| = \left|\sum_{i=1}^n x_i y_i\right| = |\mathbf{x} \cdot \mathbf{y}|.$$

By the Cauchy–Schwarz inequality in $\mathbb{R}^n$, $|\mathbf{x} \cdot \mathbf{y}| \le \|\mathbf{x}\| \cdot \|\mathbf{y}\|$. Again because $E_{\mathbb{P}}(f) = 0$,

$$\|\mathbf{x}\| = \sqrt{\sum_{i=1}^n x_i^2} = \sqrt{\sum_{i=1}^n f^2(\omega_i) \mathbb{P}(\{\omega_i\})} = \sqrt{E_{\mathbb{P}}(f^2)} = \sqrt{V_{\mathbb{P}}(f)} = \sigma_{\mathbb{P}}(f),$$

and similarly $\|\mathbf{y}\| = \sigma_{\mathbb{P}}(g)$. Thus $|\operatorname{Cov}_{\mathbb{P}}(f, g)| \le \sigma_{\mathbb{P}}(f) \, \sigma_{\mathbb{P}}(g)$.

Part (II) — general case. Define the centred variables $\tilde f = f - E_{\mathbb{P}}(f)$ and $\tilde g = g - E_{\mathbb{P}}(g)$. Then $E_{\mathbb{P}}(\tilde f) = E_{\mathbb{P}}(\tilde g) = 0$, so Part (I) applies:

$$|\operatorname{Cov}_{\mathbb{P}}(\tilde f, \tilde g)| \le \sigma_{\mathbb{P}}(\tilde f) \cdot \sigma_{\mathbb{P}}(\tilde g).$$

By the affine-function rules for variance and covariance (§3.13.2, §3.14.2),

$$\sigma_{\mathbb{P}}(\tilde f) = \sigma_{\mathbb{P}}(f - E_{\mathbb{P}}(f)) = \sigma_{\mathbb{P}}(f), \qquad \sigma_{\mathbb{P}}(\tilde g) = \sigma_{\mathbb{P}}(g), \qquad \operatorname{Cov}_{\mathbb{P}}(\tilde f, \tilde g) = \operatorname{Cov}_{\mathbb{P}}(f, g).$$

Therefore $|\operatorname{Cov}_{\mathbb{P}}(f, g)| \le \sigma_{\mathbb{P}}(f) \, \sigma_{\mathbb{P}}(g)$. $\blacksquare$

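Consequence. When $\sigma_{\mathbb{P}}(f), \sigma_{\mathbb{P}}(g) > 0$ (so that the correlation coefficient is defined),

$$|\rho_{\mathbb{P}}(f, g)| = \frac{|\operatorname{Cov}_{\mathbb{P}}(f, g)|}{\sigma_{\mathbb{P}}(f) \, \sigma_{\mathbb{P}}(g)} \le 1.$$

Equality holds iff $g = af + b$ on $\operatorname{supp} \mathbb{P}$ for some $a \ne 0$ and $b \in \mathbb{R}$ (equivalently, $f$ is an affine function of $g$).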
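A randomized sanity check of the bound and of the equality case. It is a sketch under stated assumptions: random simple probabilities on 2 to 8 points, random values of $f$ and $g$, and a fixed seed for reproducibility. The helper names are hypothetical. Floating-point arithmetic is used, so the comparison includes a small tolerance.

```python
import math
import random

random.seed(0)  # reproducible illustration


def moments(xs, ys, ps):
    """Return Cov_P(f, g), sigma_P(f), sigma_P(g), where xs[i] = f(w_i),
    ys[i] = g(w_i) and ps[i] = P({w_i}) on the support."""
    ef = sum(x * p for x, p in zip(xs, ps))
    eg = sum(y * p for y, p in zip(ys, ps))
    cov = sum((x - ef) * (y - eg) * p for x, y, p in zip(xs, ys, ps))
    sf = math.sqrt(sum((x - ef) ** 2 * p for x, p in zip(xs, ps)))
    sg = math.sqrt(sum((y - eg) ** 2 * p for y, p in zip(ys, ps)))
    return cov, sf, sg


def random_support(n):
    """A random simple probability on n points with strictly positive weights."""
    weights = [random.random() + 0.01 for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]


bound_holds = True
for _ in range(1000):
    n = random.randint(2, 8)
    ps = random_support(n)
    xs = [random.uniform(-10, 10) for _ in range(n)]
    ys = [random.uniform(-10, 10) for _ in range(n)]
    cov, sf, sg = moments(xs, ys, ps)
    bound_holds &= abs(cov) <= sf * sg + 1e-9   # small tolerance for rounding

# Equality case: g = a f + b with a != 0 gives |rho| = 1 (here a < 0, so rho = -1).
ps = random_support(5)
xs = [random.uniform(-10, 10) for _ in range(5)]
ys = [-2.5 * x + 4 for x in xs]
cov, sf, sg = moments(xs, ys, ps)
rho = cov / (sf * sg)
print(bound_holds, rho)  # True, and rho is -1 up to rounding
```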


3.17 Distribution function is increasing (page 2032)

Statement. Let $f : \Omega \to \mathbb{R}$ be a random variable and $\Phi(x) = \mathbb{P}(f \le x)$ its distribution function. Then $\Phi : \mathbb{R} \to [0, 1]$ is (weakly) increasing.

Source: Probability Proofs.pdf p.2032.

Proof. For $x \le y$, $\{f \le x\} \subseteq \{f \le y\}$. By monotonicity of $\mathbb{P}$ (§3.1 applied to the probability measure),

$$\Phi(x) = \mathbb{P}(f \le x) \le \mathbb{P}(f \le y) = \Phi(y). \qquad \blacksquare$$


3.18 Distribution function for an essentially bounded random variable is eventually constant (page 2037)

Statement. If $f : \Omega \to \mathbb{R}$ is essentially bounded (i.e., there exist $m, M \in \mathbb{R}$ with $\mathbb{P}(m \le f \le M) = 1$), then there exist scalars $a, b$ such that $\Phi(x) = 0$ for $x \le a$ and $\Phi(x) = 1$ for $x \ge b$.

Source: Probability Proofs.pdf p.2037.

Proof. Pick $m, M \in \mathbb{R}$ with $\mathbb{P}(m \le f \le M) = 1$.

For $x < m$: $\{f \le x\} \cap \{m \le f \le M\} = \emptyset$. Since the two sets are disjoint, $\{f \le x\} \subseteq \{m \le f \le M\}^c$, which has probability $1 - 1 = 0$. By monotonicity, $\Phi(x) = \mathbb{P}(f \le x) = 0$.

For $x \ge M$: $\{m \le f \le M\} \subseteq \{f \le x\}$, so by monotonicity $\Phi(x) = \mathbb{P}(f \le x) \ge \mathbb{P}(m \le f \le M) = 1$. Combined with $\Phi(x) \le 1$, we get $\Phi(x) = 1$.

Choosing any $a < m$ and $b = M$ yields the conclusion. $\blacksquare$

Terminology. Any interval $[a, b]$ such that $\Phi(x) = 0$ for $x \le a$ and $\Phi(x) = 1$ for $x \ge b$ is called a carrier of the distribution function.
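A short sketch of §3.17 and §3.18 for one concrete random variable, assuming $f$ is the face of a fair die (so $m = 1$, $M = 6$). The code tabulates $\Phi$ on a grid of quarter-steps and checks two things: $\Phi$ is increasing, and $[1/2, 6]$ is a carrier. The grid and names are illustrative choices.

```python
from fractions import Fraction

# f = face of a fair die, so P(f = k) = 1/6 for k = 1..6 (illustrative choice).
pmf = {k: Fraction(1, 6) for k in range(1, 7)}


def Phi(x):
    """Distribution function Phi(x) = P(f <= x)."""
    return sum((p for v, p in pmf.items() if v <= x), Fraction(0))


grid = [Fraction(k, 4) for k in range(-8, 41)]  # x from -2 to 10 in steps of 1/4
values = [Phi(x) for x in grid]

increasing = all(u <= v for u, v in zip(values, values[1:]))   # §3.17
m, M = 1, 6                                                     # P(m <= f <= M) = 1
a, b = Fraction(1, 2), M                                        # any a < m, and b = M
carrier = (all(Phi(x) == 0 for x in grid if x <= a)
           and all(Phi(x) == 1 for x in grid if x >= b))
print(increasing, carrier, Phi(Fraction(7, 2)))  # True True 1/2
```

Taking $a = m = 1$ would fail here, because $\Phi(1) = 1/6 \ne 0$. This is why the proof requires $a < m$.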


3.19 Properties of the distribution function for a continuous density (pages 2041, 2042)

Statement (2041 — density vanishes outside the carrier). Let $\Phi(x) = \int_{-\infty}^x \varphi(t)\,dt$ be a distribution function with integrable density $\varphi$ and carrier $[a, b]$. If $\varphi$ is continuous outside $[a, b]$, then $\varphi(x) = 0$ for every $x \notin [a, b]$.

Proof. Since $[a, b]$ is a carrier, $\int_{-\infty}^{+\infty} \varphi(x)\,dx = \int_a^b \varphi(x)\,dx = 1$. Fix $z_1, z_2 > b$ with $z_1 < z_2$. By the definition of the carrier, $\int_{z_1}^{z_2} \varphi(x)\,dx = \Phi(z_2) - \Phi(z_1) = 1 - 1 = 0$. Since $\varphi$ is continuous and non-negative on $[z_1, z_2]$, a vanishing integral forces $\varphi(x) = 0$ on $[z_1, z_2]$. This holds for all such $z_1, z_2$, hence $\varphi(x) = 0$ on $(b, +\infty)$. The symmetric argument on $(-\infty, a)$ completes the proof; there $\Phi(z_2) - \Phi(z_1) = 0 - 0$ for $z_1 < z_2 < a$. $\blacksquare$

Statement (2042 — Barrow–Torricelli link). Let $\Phi$ be a distribution function with carrier $[a, b]$. Then $\Phi$ has a unique continuous density $\varphi$ on $[a, b]$ iff $\Phi$ is continuously differentiable on $[a, b]$, in which case $\Phi'(x) = \varphi(x)$.

Proof. Apply the Barrow–Torricelli theorem (the fundamental theorem of calculus, prop 2030 from Integral Calculus) on $[a, b]$, taking $g = \Phi$ and $\gamma = \varphi$ in that proposition's notation. $\blacksquare$

(These technical lemmas rarely appear on their own on the exam, but they underlie the continuous distributions listed in §3.24.)


3.20 Expected value of a random variable w.r.t. a simple probability as a Stieltjes integral (page 2043)

Statement. Let $\mathbb{P}$ be a simple probability and $f : \Omega \to \mathbb{R}$ a random variable with distribution function $\Phi$ having carrier $[a, b]$. Then

$$E_{\mathbb{P}}(f) \;=\; \int_a^b x\, d\Phi(x).$$

Source: Probability Proofs.pdf p.2043.

Proof. Write $\operatorname{supp} \mathbb{P} = \{\omega_1, \ldots, \omega_n\}$ and set $x_i = f(\omega_i)$. Assume (WLOG) that the $x_i$ are distinct and ordered $x_1 < x_2 < \cdots < x_n$; states sharing a value can be merged by adding their probabilities. Since $[a, b]$ is a carrier, $a < x_1$ and $b \ge x_n$. Then

$$E_{\mathbb{P}}(f) = \sum_{i=1}^n f(\omega_i) \mathbb{P}(\{\omega_i\}) = \sum_{i=1}^n x_i \, \mathbb{P}(f = x_i).$$

By countable additivity of $\mathbb{P}$, the jump of $\Phi$ at $x_i$ is $\Phi(x_i) - \lim_{x \to x_i^-} \Phi(x) = \mathbb{P}(f = x_i)$. Hence

$$E_{\mathbb{P}}(f) = \sum_{i=1}^n x_i \left[\Phi(x_i) - \lim_{x \to x_i^-} \Phi(x)\right].$$

$\Phi$ is increasing and right-continuous on $[a, b]$ (§3.17 and standard CDF properties). The theorem on the Stieltjes integral with a step-function integrator therefore gives

$$E_{\mathbb{P}}(f) = \int_a^b x\, d\Phi(x). \qquad \blacksquare$$
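A small numerical illustration of the proof. The four-state probability space and the values of $f$ are hypothetical. Two states share the value $2$, so the code also exercises the "merge equal values" step. The direct sum over the support is compared with $\sum_i x_i \cdot (\text{jump of } \Phi \text{ at } x_i)$. For a step CDF the left limit is computed as $\mathbb{P}(f < x)$.

```python
from fractions import Fraction

# Hypothetical simple probability on four states and a random variable f on them.
P = {"w1": Fraction(1, 8), "w2": Fraction(3, 8), "w3": Fraction(1, 4), "w4": Fraction(1, 4)}
f = {"w1": 2, "w2": -1, "w3": 2, "w4": 5}   # w1 and w3 share the value 2


def Phi(x):
    """Phi(x) = P(f <= x)."""
    return sum((p for w, p in P.items() if f[w] <= x), Fraction(0))


def Phi_left(x):
    """Left limit of Phi at x; for a step CDF it equals P(f < x)."""
    return sum((p for w, p in P.items() if f[w] < x), Fraction(0))


direct = sum(f[w] * p for w, p in P.items())                        # sum over the support
jumps = {x: Phi(x) - Phi_left(x) for x in sorted(set(f.values()))}  # P(f = x_i)
stieltjes = sum(x * jump for x, jump in jumps.items())              # sum of x_i * jump at x_i
print(direct, stieltjes)  # 13/8 13/8
```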


3.21 Bayes' theorem and the law of total probability

Bayes' theorem (stated in lect3). Let $A, B \subseteq \Omega$ with $\mathbb{P}(A), \mathbb{P}(B) > 0$. Then

$$\mathbb{P}(A \mid B) \;=\; \frac{\mathbb{P}(B \mid A) \cdot \mathbb{P}(A)}{\mathbb{P}(B)}.$$

Proof. By the definition of conditional probability,

$$\mathbb{P}(A \mid B) = \frac{\mathbb{P}(A \cap B)}{\mathbb{P}(B)} \qquad \text{and} \qquad \mathbb{P}(B \mid A) = \frac{\mathbb{P}(A \cap B)}{\mathbb{P}(A)}.$$

Multiply the second equation by $\mathbb{P}(A)$ to get $\mathbb{P}(A \cap B) = \mathbb{P}(B \mid A) \mathbb{P}(A)$, and substitute into the first. $\blacksquare$

Law of total probability. Let $\{E_i\}_{i=1}^n$ be a partition of $\Omega$ (pairwise disjoint, union equal to $\Omega$) with $\mathbb{P}(E_i) > 0$ for every $i$. Then for every $A \subseteq \Omega$,

$$\mathbb{P}(A) \;=\; \sum_{i=1}^n \mathbb{P}(A \mid E_i) \cdot \mathbb{P}(E_i).$$

Proof. $A = A \cap \Omega = A \cap \bigcup_{i=1}^n E_i = \bigcup_{i=1}^n (A \cap E_i)$, a disjoint union. By finite additivity and the definition of conditional probability,

$$\mathbb{P}(A) = \sum_{i=1}^n \mathbb{P}(A \cap E_i) = \sum_{i=1}^n \mathbb{P}(A \mid E_i) \mathbb{P}(E_i). \qquad \blacksquare$$

Combined Bayes / total-probability formula. If $\{E_i\}$ is a partition as above and $\mathbb{P}(A) > 0$, then

$$\mathbb{P}(E_k \mid A) = \frac{\mathbb{P}(A \mid E_k) \mathbb{P}(E_k)}{\sum_{i=1}^n \mathbb{P}(A \mid E_i) \mathbb{P}(E_i)}.$$
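The combined formula translates directly into a few lines of code. The sketch below is a hypothetical helper, not from the source. It takes the priors $\mathbb{P}(E_k)$ and likelihoods $\mathbb{P}(A \mid E_k)$, computes $\mathbb{P}(A)$ by the law of total probability, and returns all posteriors. The three-urn numbers in the usage are invented for illustration.

```python
from fractions import Fraction


def posterior(priors, likelihoods):
    """Return [P(E_k | A) for each k] for a finite partition {E_k}.

    priors[k]      = P(E_k) > 0
    likelihoods[k] = P(A | E_k)
    The denominator P(A) comes from the law of total probability.
    """
    p_a = sum(lk * pk for lk, pk in zip(likelihoods, priors))
    if p_a == 0:
        raise ValueError("P(A) must be positive")
    return [lk * pk / p_a for lk, pk in zip(likelihoods, priors)]


# Hypothetical three-urn example: pick urn k with probability priors[k],
# then draw a red ball with probability likelihoods[k].
priors = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
likelihoods = [Fraction(1, 4), Fraction(1, 2), Fraction(1)]
post = posterior(priors, likelihoods)
print(post)  # [Fraction(3, 11), Fraction(4, 11), Fraction(4, 11)]
```

The posteriors always sum to $1$, since they share the same denominator $\mathbb{P}(A)$.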


3.22 Markov's and Chebyshev's inequalities

Markov's inequality (lect5). For any random variable $X \ge 0$ and any $a > 0$,

$$\mathbb{P}(X \ge a) \;\le\; \frac{E(X)}{a}.$$

Proof. Since $X \ge 0$, we have $X \ge a \cdot \mathbf{1}_{\{X \ge a\}}$ pointwise. If $X(\omega) \ge a$, then $X(\omega) \ge a \cdot 1$; if $X(\omega) < a$, the right-hand side is $0 \le X(\omega)$. Taking expectations (monotonicity, §3.12.2):

$$E(X) \ge a \cdot E(\mathbf{1}_{\{X \ge a\}}) = a \cdot \mathbb{P}(X \ge a).$$

Dividing by $a > 0$ gives the claim. $\blacksquare$

Chebyshev's inequality. Let $X$ be a random variable with finite mean $\mu = E(X)$ and variance $\sigma^2 = V(X)$, where $\sigma > 0$. For every $k > 0$,

$$\mathbb{P}(|X - \mu| \ge k \sigma) \;\le\; \frac{1}{k^2}.$$

Equivalently, $\mathbb{P}(|X - \mu| \ge a) \le \sigma^2 / a^2$ for every $a > 0$.

Proof. Since $|X - \mu| \ge k\sigma$ if and only if $(X - \mu)^2 \ge (k\sigma)^2$, apply Markov's inequality to $Y = (X - \mu)^2 \ge 0$ with threshold $a = (k \sigma)^2 > 0$:

$$\mathbb{P}(|X - \mu| \ge k\sigma) = \mathbb{P}(Y \ge (k\sigma)^2) \le \frac{E(Y)}{(k\sigma)^2} = \frac{V(X)}{k^2 \sigma^2} = \frac{1}{k^2}. \qquad \blacksquare$$

Interpretation. Chebyshev says that at least $1 - 1/k^2$ of the probability mass lies within $k$ standard deviations of the mean. E.g., $k = 2$ gives $\mathbb{P}(|X - \mu| < 2\sigma) \ge 3/4$.
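A quick exact comparison of the true tail with the Chebyshev bound, assuming $X$ is the face of a fair die and using a few rational $k$ chosen for illustration. Squaring both sides of $|X - \mu| \ge k\sigma$, exactly as in the proof, avoids square roots, so everything stays in exact fractions.

```python
from fractions import Fraction

# X = face of a fair die (illustrative choice): mu = 7/2, sigma^2 = 35/12.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}
mu = sum(x * p for x, p in pmf.items())
var = sum((x - mu) ** 2 * p for x, p in pmf.items())

results = []
for k in (Fraction(1), Fraction(3, 2), Fraction(2)):
    # |X - mu| >= k*sigma  <=>  (X - mu)^2 >= k^2 * sigma^2 (both sides are non-negative),
    # so the tail probability can be computed exactly, without square roots.
    tail = sum((p for x, p in pmf.items() if (x - mu) ** 2 >= k ** 2 * var), Fraction(0))
    bound = 1 / k ** 2
    results.append((k, tail, bound))
    print(k, tail, bound, tail <= bound)
```

For this distribution the bound is far from tight: at $k = 1$ the true tail is $1/3$, against a bound of $1$.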


3.23 Named discrete distributions (lect2, lect6)

For each distribution we list $\Omega$, the PMF $p_n$, the expected value $E(f)$ (where $f$ is the identity random variable, $f(n) = n$), and the variance $V(f)$.

| Distribution | $\Omega$ | PMF $p_n$ | $E(f)$ | $V(f)$ | Notes |
|---|---|---|---|---|---|
| Bernoulli($p$) | $\{0, 1\}$ | $p_1 = p$, $p_0 = 1-p$ | $p$ | $p(1-p)$ | Single 0/1 trial |
| Binomial($n, p$) | $\{0, 1, \ldots, n\}$ | $p_k = \binom{n}{k} p^k (1-p)^{n-k}$ | $np$ | $np(1-p)$ | Sum of $n$ i.i.d. Bernoulli($p$) |
| Geometric($q$) | $\mathbb{N} = \{0, 1, 2, \ldots\}$ | $p_n = q^n (1-q)$, $0 < q < 1$ | $q/(1-q)$ | $q/(1-q)^2$ | Countable; not simple, countably additive |
| Poisson($\lambda$) | $\mathbb{N}$ | $p_n = e^{-\lambda} \lambda^n / n!$, $\lambda > 0$ | $\lambda$ | $\lambda$ | Countable; not simple, countably additive |
| Uniform discrete($n$) | $\{1, \ldots, n\}$ | $p_i = 1/n$ | $(n+1)/2$ | $(n+1)(n-1)/12$ | Fair die: $\Omega = \{1,\ldots,6\}$, $E = 3.5$, $V = 35/12$ |
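The mean and variance columns of the table can be checked numerically. This is a sketch under assumptions. The parameter values are arbitrary illustrations. The infinite supports (Geometric, Poisson) are truncated after enough terms that the neglected mass is negligible in double precision. The Poisson PMF is built with the recursion $p_{n+1} = p_n \lambda / (n+1)$ to avoid large factorials.

```python
import math


def mean_var(pmf):
    """Mean and variance from a list of (value, probability) pairs."""
    mean = sum(n * p for n, p in pmf)
    var = sum((n - mean) ** 2 * p for n, p in pmf)
    return mean, var


def poisson_pmf(lam, terms):
    # Iterative p_{n+1} = p_n * lam / (n + 1) avoids huge factorials.
    p, out = math.exp(-lam), []
    for n in range(terms):
        out.append((n, p))
        p *= lam / (n + 1)
    return out


# Parameter values below are arbitrary illustrations; infinite supports are truncated.
n_bin, p_bin, q, lam, n_uni = 10, 0.3, 0.4, 2.5, 6
cases = {
    "Bernoulli": ([(0, 1 - p_bin), (1, p_bin)], (p_bin, p_bin * (1 - p_bin))),
    "Binomial": ([(k, math.comb(n_bin, k) * p_bin**k * (1 - p_bin) ** (n_bin - k))
                  for k in range(n_bin + 1)],
                 (n_bin * p_bin, n_bin * p_bin * (1 - p_bin))),
    "Geometric": ([(n, q**n * (1 - q)) for n in range(400)], (q / (1 - q), q / (1 - q) ** 2)),
    "Poisson": (poisson_pmf(lam, 200), (lam, lam)),
    "Uniform": ([(i, 1 / n_uni) for i in range(1, n_uni + 1)],
                ((n_uni + 1) / 2, (n_uni + 1) * (n_uni - 1) / 12)),
}
for name, (pmf, (mean_th, var_th)) in cases.items():
    mean, var = mean_var(pmf)
    print(f"{name:10s} mean {mean:.6f} vs {mean_th:.6f}   var {var:.6f} vs {var_th:.6f}")
```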

Key identity (Poisson). $P(X = n) = e^{-\lambda} \lambda^n / n!$. In particular the probabilities sum to $\sum_{n=0}^{+\infty} e^{-\lambda} \lambda^n / n! = e^{-\lambda} \cdot e^\lambda = 1$ (using the series expansion of $e^\lambda$).

Worked Poisson (lect2 p.4): for $\lambda = 2$,

$$P(X = 0) = e^{-2}, \qquad P(X = 1) = 2 e^{-2}, \qquad P(X = 2) = 2 e^{-2}, \qquad P(X = 3) = \frac{4}{3} e^{-2}, \ldots$$

This distribution is used in the May 2024 MCQ4 (see §4.1).


3.24 Named continuous distributions (lect7)

For each distribution we list the PDF $\varphi(x)$, the CDF $\Phi(x)$, the carrier (if any), the expected value $E(f)$ and the variance $V(f)$ (for $f = \text{identity}$).

| Distribution | PDF $\varphi(x)$ | CDF $\Phi(x)$ | Carrier | $E(f)$ | $V(f)$ |
|---|---|---|---|---|---|
| Uniform$[a,b]$ | $\dfrac{1}{b-a}\cdot \mathbf{1}_{[a,b]}(x)$ | $\begin{cases}0 & x < a \\ \frac{x-a}{b-a} & a \le x \le b \\ 1 & x > b\end{cases}$ | $[a, b]$ | $(a+b)/2$ | $(b-a)^2/12$ |
| (Negative) exponential($\alpha$), $\alpha > 0$ | $\begin{cases}0 & x < 0 \\ \alpha e^{-\alpha x} & x \ge 0\end{cases}$ | $\begin{cases}0 & x < 0 \\ 1 - e^{-\alpha x} & x \ge 0\end{cases}$ | none (unbounded) | $1/\alpha$ | $1/\alpha^2$ |
| Standard normal / Gaussian | $\dfrac{1}{\sqrt{2\pi}} e^{-x^2/2}$ | no closed form ($\Phi(0) = 1/2$) | none (unbounded) | $0$ | $1$ |

Proof that $E(f) = 1/\alpha$ for the exponential (lect7, reproduced from the source).

$$E_{\mathbb{P}}(f) = \int_{-\infty}^{+\infty} x \, d\Phi(x) = \int_0^{+\infty} x \cdot \alpha e^{-\alpha x} \, dx.$$

Integration by parts with $u = \alpha x$, $u' = \alpha$, $v' = e^{-\alpha x}$, $v = -\frac{1}{\alpha} e^{-\alpha x}$:

$$\int \alpha x \cdot e^{-\alpha x}\, dx = -x e^{-\alpha x} + \int e^{-\alpha x}\, dx = -x e^{-\alpha x} - \frac{1}{\alpha} e^{-\alpha x} + C = -e^{-\alpha x}\left(x + \frac{1}{\alpha}\right) + C.$$

Evaluating from $0$ to $+\infty$ and using the hierarchy of infinities $\lim_{x\to+\infty} x / e^{\alpha x} = 0$:

$$E_{\mathbb{P}}(f) = \lim_{K\to+\infty} \left[-e^{-\alpha K}\left(K + \frac{1}{\alpha}\right) + e^0 \cdot \frac{1}{\alpha}\right] = 0 + \frac{1}{\alpha} = \frac{1}{\alpha}. \qquad \blacksquare$$
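A numerical cross-check of the integration by parts, written as a sketch. The integral is approximated with a plain midpoint rule, truncated at $50/\alpha$ so the neglected tail is of order $e^{-50}$. The primitive $-e^{-\alpha x}(x + 1/\alpha)$ is evaluated at a large cutoff and at $0$. The step count, cutoffs and sample values of $\alpha$ are arbitrary choices.

```python
import math


def exponential_mean_numeric(alpha, upper_factor=50.0, steps=200_000):
    """Midpoint-rule approximation of the integral of x * alpha * exp(-alpha x) over [0, +inf),
    truncated at upper_factor / alpha (the neglected tail is of order exp(-upper_factor))."""
    upper = upper_factor / alpha
    h = upper / steps
    total = sum((i + 0.5) * h * alpha * math.exp(-alpha * (i + 0.5) * h) for i in range(steps))
    return total * h


def antiderivative(x, alpha):
    """The primitive found by parts: -exp(-alpha x) (x + 1/alpha)."""
    return -math.exp(-alpha * x) * (x + 1 / alpha)


for alpha in (0.5, 2.0):
    numeric = exponential_mean_numeric(alpha)
    closed = antiderivative(60 / alpha, alpha) - antiderivative(0.0, alpha)  # F(K) - F(0), K large
    print(alpha, numeric, closed, 1 / alpha)
```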

Proof that $E(f) = 0$ for the standard normal. The PDF $\varphi(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$ is symmetric about $0$, so $E(f) = \int_{-\infty}^{+\infty} x \varphi(x)\, dx = 0$ (the integrand is odd and absolutely integrable). Alternatively, by direct computation:

$$E_{\mathbb{P}}(f) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} x e^{-x^2/2}\, dx = \frac{1}{\sqrt{2\pi}}\left[\lim_{K\to+\infty}\bigl(-e^{-x^2/2}\bigr)\Big|_{-K}^0 + \lim_{K\to+\infty}\bigl(-e^{-x^2/2}\bigr)\Big|_0^K\right] = \frac{1}{\sqrt{2\pi}}[-1 + 1] = 0. \quad \blacksquare$$


§4. Worked Examples

Example 4.1 — Poisson $\mathbb{P}(n \ge 3)$ for $\lambda = 2$ ★ May 2024 MCQ4 ★

Problem. Consider $\Omega = \mathbb{N}$ and the Poisson probability $\mathbb{P}$ with parameter $\lambda = 2$. Compute $\mathbb{P}(E)$ for $E = \{n \in \mathbb{N} : n \ge 3\}$.

Solution. The Poisson PMF with $\lambda = 2$ is $p_n = e^{-2} \cdot 2^n / n!$. Rather than summing an infinite tail, use the complement rule:

$$\mathbb{P}(n \ge 3) = 1 - \mathbb{P}(n \le 2) = 1 - \sum_{n=0}^{2} e^{-2} \frac{2^n}{n!} = 1 - e^{-2}\left(\frac{2^0}{0!} + \frac{2^1}{1!} + \frac{2^2}{2!}\right) = 1 - e^{-2}(1 + 2 + 2) = \boxed{1 - 5 e^{-2}}.$$
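A short numerical check of the boxed answer. It computes the complement, the closed form, and a truncated direct tail sum, and also the value of the common trap $\mathbb{P}(n > 3)$ for contrast. The truncation at $n < 100$ is an arbitrary choice; the remaining mass is far below double precision.

```python
import math

LAM = 2.0


def poisson_pmf(n, lam=LAM):
    """P(X = n) = e^{-lam} lam^n / n!"""
    return math.exp(-lam) * lam**n / math.factorial(n)


via_complement = 1 - sum(poisson_pmf(n) for n in range(3))   # 1 - P(X <= 2)
closed_form = 1 - 5 * math.exp(-2)
direct_tail = sum(poisson_pmf(n) for n in range(3, 100))     # truncated tail sum
trap = 1 - sum(poisson_pmf(n) for n in range(4))              # P(X > 3): the common trap
print(via_complement, closed_form, direct_tail, trap)
```

The first three numbers agree, at about $0.323$. The trap value is visibly smaller, because it drops the term $\mathbb{P}(X = 3) = \frac{4}{3}e^{-2}$.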

Common trap. Students sometimes compute $\mathbb{P}(n > 3) = 1 - \mathbb{P}(n \le 3)$ instead. Since the problem asks for $n \ge 3$ (inclusive), the complement must use $\mathbb{P}(n \le 2)$. A second trap is forgetting that $0! = 1$.

Source: General_24524_ENG_SOL.pdf Mode A MCQ4.


Example 4.2 — Back-solve $\sigma_{\mathbb{P}}(f)$ from covariance and correlation ★ May 2024 MCQ5 ★

Problem. Let $f, g$ be two random variables on $(\Omega, \mathbb{P})$ with $V_{\mathbb{P}}(g) = 100$, $\rho_{\mathbb{P}}(f, g) = -0.1$, and $|\operatorname{Cov}_{\mathbb{P}}(2f, 5g)| = 90$. Compute $\sigma_{\mathbb{P}}(f)$.

Solution. First, by covariance bilinearity (§3.14.2):

$$\operatorname{Cov}_{\mathbb{P}}(2f, 5g) = 2 \cdot 5 \cdot \operatorname{Cov}_{\mathbb{P}}(f, g) = 10 \operatorname{Cov}_{\mathbb{P}}(f, g),$$

so $|\operatorname{Cov}_{\mathbb{P}}(f, g)| = |\operatorname{Cov}_{\mathbb{P}}(2f, 5g)|/10 = 90/10 = 9$.

Second, $\sigma_{\mathbb{P}}(g) = \sqrt{V_{\mathbb{P}}(g)} = \sqrt{100} = 10$.

Third, recall the definition of the correlation coefficient:

$$\rho_{\mathbb{P}}(f, g) = \frac{\operatorname{Cov}_{\mathbb{P}}(f, g)}{\sigma_{\mathbb{P}}(f) \sigma_{\mathbb{P}}(g)} = -0.1.$$

Since $\sigma_{\mathbb{P}}(f), \sigma_{\mathbb{P}}(g) > 0$, $\operatorname{Cov}_{\mathbb{P}}(f, g)$ has the same sign as $\rho < 0$, so $\operatorname{Cov}_{\mathbb{P}}(f, g) = -9$. Substituting:

$$-0.1 = \frac{-9}{\sigma_{\mathbb{P}}(f) \cdot 10} \qquad \Longrightarrow \qquad \sigma_{\mathbb{P}}(f) = \frac{-9}{-0.1 \cdot 10} = \boxed{9}.$$
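The three steps of the solution can be written as a tiny script. The data are those of Example 4.2. The only modelling choice is `math.copysign`, which attaches the sign of $\rho$ to $|\operatorname{Cov}(f, g)|$, mirroring the sign argument in step three.

```python
import math

# Data of Example 4.2.
var_g = 100.0
rho = -0.1
abs_cov_2f_5g = 90.0
alpha, gamma = 2, 5                            # Cov(2f, 5g) = alpha * gamma * Cov(f, g)

abs_cov = abs_cov_2f_5g / abs(alpha * gamma)   # |Cov(f, g)| = 9
cov = math.copysign(abs_cov, rho)              # Cov(f, g) has the sign of rho: -9
sigma_g = math.sqrt(var_g)                     # 10
sigma_f = cov / (rho * sigma_g)                # from rho = Cov / (sigma_f * sigma_g)
print(sigma_f)  # 9.0
```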

Common trap. Mixing up the two scaling rules. Variance squares the coefficient: $V(2f) = 4 V(f)$, not $2V(f)$. Covariance multiplies the two coefficients: $\operatorname{Cov}(2f, 5g) = 10 \operatorname{Cov}(f, g)$. Another trap is dropping the sign. Pairing $\operatorname{Cov}(f, g) = +9$ with $\rho = -0.1$ gives $\sigma(f) = -9$, which is impossible because a standard deviation is non-negative.

Source: General_24524_ENG_SOL.pdf Mode A MCQ5.


Example 4.3 — Bayes' theorem (classic disease test)

Problem. A diagnostic test is 99% sensitive (true-positive rate) and 99% specific (true-negative rate). The disease affects 1 in 1000 people. A randomly chosen person tests positive. What is the probability that they actually have the disease?

Solution. Let $D$ = "has disease" and $T^+$ = "tests positive". Given: $\mathbb{P}(D) = 0.001$ and $\mathbb{P}(T^+ \mid D) = 0.99$; from the specificity, $\mathbb{P}(T^+ \mid D^c) = 1 - 0.99 = 0.01$. Compute $\mathbb{P}(D \mid T^+)$ with Bayes' theorem, expanding the denominator $\mathbb{P}(T^+)$ by the law of total probability over the partition $\{D, D^c\}$:

$$\mathbb{P}(D \mid T^+) = \frac{\mathbb{P}(T^+ \mid D) \mathbb{P}(D)}{\mathbb{P}(T^+)} = \frac{0.99 \cdot 0.001}{0.99 \cdot 0.001 + 0.01 \cdot 0.999} = \frac{0.00099}{0.00099 + 0.00999} = \frac{0.00099}{0.01098} \approx 0.0902.$$

Observation. Despite the test being "99% accurate", the posterior probability of having the disease given a positive test is only about $9\%$. Because the disease is rare, false positives outnumber true positives. This is the classic base-rate fallacy.
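Redoing the disease-test arithmetic in exact fractions shows where $0.0902$ comes from: the posterior is exactly $11/122$. The variable names are illustrative. The inputs are the prevalence, sensitivity and specificity stated in the problem.

```python
from fractions import Fraction

prevalence = Fraction(1, 1000)        # P(D)
sensitivity = Fraction(99, 100)       # P(T+ | D)
specificity = Fraction(99, 100)       # P(T- | D^c)
false_positive = 1 - specificity      # P(T+ | D^c)

# Law of total probability for the denominator, then Bayes.
p_positive = sensitivity * prevalence + false_positive * (1 - prevalence)
posterior = sensitivity * prevalence / p_positive
print(p_positive, posterior, float(posterior))  # p_positive = 549/50000, posterior = 11/122 (about 0.0902)
```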


Example 4.4 — $E$ and $V$ of a custom discrete random variable

Problem. Roll a fair die. Define $f(\omega) = +10$ if $\omega$ is even and $f(\omega) = -10$ if $\omega$ is odd. Compute $E(f)$, $V(f)$, $\sigma(f)$.

Solution.

$$E_{\mathbb{P}}(f) = 10 \cdot \mathbb{P}(\{2, 4, 6\}) - 10 \cdot \mathbb{P}(\{1, 3, 5\}) = 10 \cdot \frac{1}{2} - 10 \cdot \frac{1}{2} = 0.$$

$$V_{\mathbb{P}}(f) = E_{\mathbb{P}}(f^2) - [E_{\mathbb{P}}(f)]^2 = 10^2 \cdot \frac{1}{2} + (-10)^2 \cdot \frac{1}{2} - 0^2 = 100.$$

$$\sigma_{\mathbb{P}}(f) = \sqrt{100} = 10.$$

Extension. Compute $V_{\mathbb{P}}(6 - 3f)$ and $\sigma_{\mathbb{P}}(6 - 3f)$ using the affine-function rule:

$$V(6 - 3f) = (-3)^2 V(f) = 9 \cdot 100 = 900, \qquad \sigma(6 - 3f) = |-3| \cdot \sigma(f) = 3 \cdot 10 = 30.$$
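A brute-force check of Example 4.4 on the six outcomes of a fair die. The helper names are illustrative. The variance of $6 - 3f$ is computed from the definition rather than from the affine rule, so the agreement with $900$ is a genuine confirmation.

```python
import math
from fractions import Fraction

# Fair die; f = +10 on even faces, -10 on odd faces (Example 4.4).
P = {w: Fraction(1, 6) for w in range(1, 7)}


def f(w):
    return 10 if w % 2 == 0 else -10


def expect(h):
    return sum(h(w) * p for w, p in P.items())


def var(h):
    m = expect(h)
    return expect(lambda w: (h(w) - m) ** 2)


E_f = expect(f)
V_f = expect(lambda w: f(w) ** 2) - E_f ** 2   # computation formula
sigma_f = math.sqrt(float(V_f))
V_aff = var(lambda w: 6 - 3 * f(w))            # computed directly, no affine rule used
print(E_f, V_f, sigma_f, V_aff, math.sqrt(float(V_aff)))  # 0 100 10.0 900 30.0
```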

Source: lect6 p.8.


Example 4.5 — Dirac additivity proof (the May 2024 Q10 "template")

Problem. Verify directly (no monotone-sequence trick) that $\delta_{\omega_0}(E_1 \cup E_2) = \delta_{\omega_0}(E_1) + \delta_{\omega_0}(E_2)$ for any disjoint events $E_1, E_2$, i.e., check that $\delta_{\omega_0}$ satisfies Kolmogorov's axiom K3.

Solution — by cases.

Case 1: $\omega_0 \notin E_1 \cup E_2$. Then $\omega_0 \notin E_1$ and $\omega_0 \notin E_2$, so $\delta_{\omega_0}(E_1) = \delta_{\omega_0}(E_2) = 0$ and $\delta_{\omega_0}(E_1 \cup E_2) = 0 = 0 + 0$. $\checkmark$

Case 2: $\omega_0 \in E_1 \cup E_2$. Since $E_1 \cap E_2 = \emptyset$, $\omega_0$ belongs to exactly one of $E_1, E_2$; say $\omega_0 \in E_1$ (the other case is symmetric). Then $\delta_{\omega_0}(E_1) = 1$, $\delta_{\omega_0}(E_2) = 0$, and $\delta_{\omega_0}(E_1 \cup E_2) = 1 = 1 + 0$. $\checkmark$

In both cases K3 holds. The K1 and K2 axioms (positivity and normalisation) are immediate from the definition: $\delta_{\omega_0}(E) \in \{0, 1\} \subseteq [0, 1]$ for every $E$, and $\delta_{\omega_0}(\Omega) = 1$ since $\omega_0 \in \Omega$.
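On a finite sample space, the case analysis above can be confirmed by brute force over the whole power set. This is a sketch on a hypothetical four-point $\Omega$, with illustrative helper names. It checks values in $[0, 1]$, normalisation and additivity on every disjoint pair. A non-additive set function (1 on every non-empty event) is included as a contrast that fails K3.

```python
from itertools import combinations

Omega = ("a", "b", "c", "d")   # hypothetical finite sample space


def dirac(omega0):
    """delta_{omega0}(E) = 1 if omega0 in E, else 0."""
    return lambda E: 1 if omega0 in E else 0


def all_events(space):
    """Every subset of space, as frozensets (the power set 2^Omega)."""
    return [frozenset(c) for r in range(len(space) + 1) for c in combinations(space, r)]


def is_probability(P, space):
    events = all_events(space)
    k1 = all(0 <= P(E) <= 1 for E in events)             # values in [0, 1]
    k2 = P(frozenset(space)) == 1                         # normalisation
    k3 = all(P(E1 | E2) == P(E1) + P(E2)                  # additivity on disjoint pairs
             for E1 in events for E2 in events if not (E1 & E2))
    return k1 and k2 and k3


print(all(is_probability(dirac(w), Omega) for w in Omega))  # True
print(is_probability(lambda E: 1 if E else 0, Omega))       # False: {a} and {b} break K3
```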

Extension to countable additivity. This is exactly the two-case structure of Proposition 2006 (§3.8), adapted to a decreasing sequence. Memorise the argument: on the exam, the two-case skeleton "Case I: $\omega_0 \in A$ / Case II: $\omega_0 \notin A$" is the entire proof.


Example 4.6 — Covariance from a joint PMF table

Problem (lect7 p.4, modified). Roll a fair die. Define

$$f(\omega) = \begin{cases} -1 & \omega \in \{1, 2, 3\} \\ +1 & \omega \in \{4, 5, 6\} \end{cases}, \qquad g(\omega) = \begin{cases} -2 & \omega \in \{1, 2\} \\ 0 & \omega \in \{3, 4\} \\ +2 & \omega \in \{5, 6\} \end{cases}.$$

Compute $\operatorname{Cov}(f, g)$ and $\rho(f, g)$.

Solution. $E(f) = (-1) \cdot \frac{1}{2} + 1 \cdot \frac{1}{2} = 0$ and $E(g) = (-2) \cdot \frac{1}{3} + 0 \cdot \frac{1}{3} + 2 \cdot \frac{1}{3} = 0$. For $E(fg)$, evaluate $fg$ at each of the 6 outcomes:

| $\omega$ | $f(\omega)$ | $g(\omega)$ | $f(\omega) g(\omega)$ | $\mathbb{P}(\{\omega\})$ |
|---|---|---|---|---|
| 1 | $-1$ | $-2$ | $2$ | $1/6$ |
| 2 | $-1$ | $-2$ | $2$ | $1/6$ |
| 3 | $-1$ | $0$ | $0$ | $1/6$ |
| 4 | $+1$ | $0$ | $0$ | $1/6$ |
| 5 | $+1$ | $+2$ | $2$ | $1/6$ |
| 6 | $+1$ | $+2$ | $2$ | $1/6$ |

$E(fg) = (2 + 2 + 0 + 0 + 2 + 2) \cdot \frac{1}{6} = \frac{8}{6} = \frac{4}{3}$, so $\operatorname{Cov}(f, g) = E(fg) - E(f) E(g) = \frac{4}{3} - 0 = \frac{4}{3}$.

$V(f) = E(f^2) - 0 = 1$ and $V(g) = (-2)^2/3 + 0 + (+2)^2/3 = 8/3$, so $\sigma(f) = 1$ and $\sigma(g) = \sqrt{8/3} = 2\sqrt{2/3}$. Then

$$\rho(f, g) = \frac{4/3}{1 \cdot 2\sqrt{2/3}} = \frac{2}{3\sqrt{2/3}} = \frac{2}{\sqrt{6}} = \frac{\sqrt{6}}{3} \approx 0.816.$$

This is a strong positive linear relationship, as expected: both $f$ and $g$ are non-decreasing in $\omega$.
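The table computation can be reproduced mechanically. A minimal sketch, assuming the fair-die setup of Example 4.6, with illustrative helper names. It uses exact fractions for the moments and a float only for the final square root in $\rho$.

```python
import math
from fractions import Fraction

P = {w: Fraction(1, 6) for w in range(1, 7)}   # fair die


def f(w):
    return -1 if w <= 3 else 1


def g(w):
    return -2 if w <= 2 else (0 if w <= 4 else 2)


def expect(h):
    return sum(h(w) * p for w, p in P.items())


cov = expect(lambda w: f(w) * g(w)) - expect(f) * expect(g)
var_f = expect(lambda w: f(w) ** 2) - expect(f) ** 2
var_g = expect(lambda w: g(w) ** 2) - expect(g) ** 2
rho = float(cov) / math.sqrt(float(var_f * var_g))
print(cov, var_f, var_g, rho)  # 4/3 1 8/3, then rho close to sqrt(6)/3
```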


Example 4.7 — Independence check

Problem. Roll a fair die. Let $A = \{\text{even}\} = \{2, 4, 6\}$ and $B = \{\le 3\} = \{1, 2, 3\}$. Are $A$ and $B$ independent?

Solution. $\mathbb{P}(A) = 1/2$, $\mathbb{P}(B) = 1/2$, $\mathbb{P}(A \cap B) = \mathbb{P}(\{2\}) = 1/6$. Check: $\mathbb{P}(A)\mathbb{P}(B) = 1/4 \neq 1/6 = \mathbb{P}(A \cap B)$. NOT independent.

Verify via conditional probability: $\mathbb{P}(A \mid B) = \mathbb{P}(A \cap B)/\mathbb{P}(B) = (1/6)/(1/2) = 1/3 \neq 1/2 = \mathbb{P}(A)$. Knowing that the outcome is $\le 3$ lowers the chance of an even outcome from 50% to about 33%.

Contrast. Keep $A = \{\text{even}\}$ and let $B' = \{1, 2\}$. Then $\mathbb{P}(A \cap B') = \mathbb{P}(\{2\}) = 1/6$ and $\mathbb{P}(A)\mathbb{P}(B') = (1/2)(1/3) = 1/6$. Independent.
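The same check can be scripted. This is a minimal sketch assuming the uniform probability on a fair die; `prob` and `independent` are hypothetical helper names.

```python
from fractions import Fraction

OMEGA = frozenset(range(1, 7))  # fair die, uniform probability


def prob(event):
    """Uniform probability on the die: |E ∩ Ω| / 6."""
    return Fraction(len(event & OMEGA), len(OMEGA))


def independent(a, b):
    """A and B are independent iff P(A ∩ B) = P(A) P(B)."""
    return prob(a & b) == prob(a) * prob(b)


A = frozenset({2, 4, 6})     # even
B = frozenset({1, 2, 3})     # at most 3
B_PRIME = frozenset({1, 2})

print(independent(A, B))        # False: 1/6 != 1/4
print(independent(A, B_PRIME))  # True: 1/6 == 1/6
print(prob(A & B) / prob(B))    # P(A | B) = 1/3
```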


§5. Solution Methods

Each method is a named algorithm (input → steps → output → pitfalls). The "Used on" line under each method points to the May 2024 exam problems (or other exercises) where the method is applied.


M-P-1 — Compute a probability via Kolmogorov rules

Used on: general MCQs, TA exercises.

Input. A probability $\mathbb{P}$ and an event $A$ (possibly built from unions, intersections, complements, or conditional formulas).

Steps.

  1. Rewrite AA in a canonical form using set algebra (complements, finite unions/intersections).
  2. Apply the Kolmogorov tools one at a time:
    • Complement: $\mathbb{P}(A^c) = 1 - \mathbb{P}(A)$.
    • Inclusion–exclusion: $\mathbb{P}(A \cup B) = \mathbb{P}(A) + \mathbb{P}(B) - \mathbb{P}(A \cap B)$.
    • Finite additivity (disjoint): $\mathbb{P}(A_1 \cup \cdots \cup A_n) = \sum_{i=1}^n \mathbb{P}(A_i)$ if the $A_i$ are pairwise disjoint.
    • Monotonicity: if $A \subseteq B$, then $\mathbb{P}(A) \le \mathbb{P}(B)$.
    • Union bound: $\mathbb{P}(\bigcup_i A_i) \le \sum_i \mathbb{P}(A_i)$ (always; no disjointness needed).
  3. Simplify.
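A small sanity check of the step-2 tools, sketched in Python with exact fractions on a fair die (the events are those of Example 4.7; the die model is an illustrative assumption):

```python
from fractions import Fraction

OMEGA = frozenset(range(1, 7))  # fair die, uniform probability


def prob(event):
    return Fraction(len(event), len(OMEGA))


A = frozenset({2, 4, 6})  # even
B = frozenset({1, 2, 3})  # at most 3

# Complement: P(A^c) = 1 - P(A)
assert prob(OMEGA - A) == 1 - prob(A)
# Inclusion-exclusion (A and B overlap in {2})
assert prob(A | B) == prob(A) + prob(B) - prob(A & B)
# Monotonicity: {2} is a subset of A
assert prob(frozenset({2})) <= prob(A)
# Union bound: strict here, because P(A ∩ B) > 0
assert prob(A | B) < prob(A) + prob(B)

print(prob(A | B))  # 5/6
```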

Output. A numerical value.

Pitfalls.

  • Forgetting to subtract $\mathbb{P}(A \cap B)$ when applying inclusion–exclusion to non-disjoint events.
  • Using finite additivity on non-disjoint events (wrong answer).
  • Neglecting the normalization $\mathbb{P}(\Omega) = 1$ when computing $\mathbb{P}(A^c)$ (forgetting the "1").

M-P-2 — Apply Bayes' theorem

Used on: disease-test, gambler's problems.

Input. Prior probabilities $\mathbb{P}(E_i)$ for a partition $\{E_i\}$, and likelihoods $\mathbb{P}(A \mid E_i)$ for an observed event $A$.

Steps.

  1. Identify the partition $\{E_i\}$ and compute $\mathbb{P}(A) = \sum_i \mathbb{P}(A \mid E_i) \mathbb{P}(E_i)$ via the law of total probability.
  2. Apply Bayes: $\mathbb{P}(E_k \mid A) = \mathbb{P}(A \mid E_k) \mathbb{P}(E_k) / \mathbb{P}(A)$.
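The two steps translate directly into code. The sketch below assumes a two-set partition (ill / healthy) with made-up illustrative numbers (1% prevalence, 99% sensitivity, 5% false-positive rate); none of these figures come from the course materials, and `bayes_posterior` is a hypothetical helper.

```python
from fractions import Fraction


def bayes_posterior(priors, likelihoods, k):
    """Posterior P(E_k | A) for a partition {E_i}.

    priors[i]      = P(E_i)      (must sum to 1)
    likelihoods[i] = P(A | E_i)
    """
    assert sum(priors) == 1, "the E_i must form a partition"
    # Step 1: law of total probability
    p_a = sum(lik * pri for lik, pri in zip(likelihoods, priors))
    # Step 2: Bayes' formula
    return likelihoods[k] * priors[k] / p_a


# Hypothetical disease test: E_0 = ill, E_1 = healthy, A = positive test.
priors = [Fraction(1, 100), Fraction(99, 100)]
likelihoods = [Fraction(99, 100), Fraction(5, 100)]
print(bayes_posterior(priors, likelihoods, 0))  # 1/6
```

Note how far the posterior $1/6$ is from the likelihood $0.99$: this is the "confusion of the inverse" pitfall in numbers.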

Output. The posterior probability $\mathbb{P}(E_k \mid A)$.

Pitfalls.

  • Confusing $\mathbb{P}(E \mid A)$ with $\mathbb{P}(A \mid E)$ (the "confusion of the inverse").
  • Forgetting the prior factor $\mathbb{P}(E)$: students sometimes conclude $\mathbb{P}(E \mid A) = \mathbb{P}(A \mid E)$.

M-P-3 — $\mathbb{P}(X \ge k)$ for discrete: use $1 - \mathbb{P}(X \le k-1)$

Used on: May 2024 MCQ4 (Poisson tail).

Input. A discrete random variable $X$ and a threshold $k$.

Steps.

  1. Rewrite $\{X \ge k\} = \{X \le k-1\}^c$ (for $X$ taking values in $\mathbb{N}$).
  2. Compute $\mathbb{P}(X \le k-1) = \sum_{i=0}^{k-1} \mathbb{P}(X = i)$, a finite sum.
  3. Subtract from 1: $\mathbb{P}(X \ge k) = 1 - \mathbb{P}(X \le k-1)$.
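A sketch of the three steps for a Poisson variable in Python, assuming the PMF $e^{-\lambda}\lambda^i/i!$; `poisson_pmf` and `tail_at_least` are hypothetical helper names. It reproduces the May 2024 MCQ4 value $1 - 5e^{-2}$.

```python
import math


def poisson_pmf(i, lam):
    """P(X = i) = e^{-lam} lam^i / i!  (note 0! = 1 and lam^0 = 1)."""
    return math.exp(-lam) * lam ** i / math.factorial(i)


def tail_at_least(k, lam):
    """P(X >= k) = 1 - P(X <= k - 1): a finite sum instead of an infinite tail."""
    return 1 - sum(poisson_pmf(i, lam) for i in range(k))


# May 2024 MCQ4: lambda = 2, k = 3, so both printed values should agree
print(tail_at_least(3, 2.0), 1 - 5 * math.exp(-2))
```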

Output. A numerical expression.

Pitfalls.

  • Using the strict-inequality complement $\{X > k\} = \{X \le k\}^c$ instead of $\{X \ge k\} = \{X \le k-1\}^c$. For a continuous random variable the distinction doesn't matter (since $\mathbb{P}(X = k) = 0$); for a discrete one, it does.
  • Forgetting that $0! = 1$ in the Poisson PMF, or mis-evaluating $\lambda^0$ (it equals 1, so $\mathbb{P}(X = 0) = e^{-\lambda}$).

M-P-4 — Back-solve for $\sigma / V$ using Cov/correlation relationships

Used on: May 2024 MCQ5 (the canonical "given $\rho$ and $|\operatorname{Cov}(\alpha f, \beta g)|$, find $\sigma(f)$").

Input. Given values of $V(g)$, $\rho(f, g)$, and $|\operatorname{Cov}(\alpha f, \beta g)|$ (or similar). Find an unknown $\sigma(f)$ or $V(f)$.

Steps.

  1. Unwrap the covariance using bilinearity: $\operatorname{Cov}(\alpha f, \beta g) = \alpha \beta \operatorname{Cov}(f, g)$.
  2. Extract $|\operatorname{Cov}(f, g)|$ from the given $|\operatorname{Cov}(\alpha f, \beta g)|$: divide by $|\alpha \beta|$.
  3. Use the given $\rho(f, g)$ to recover the sign: $\operatorname{Cov}(f, g)$ has the same sign as $\rho$.
  4. Plug into $\rho = \operatorname{Cov}(f, g)/(\sigma(f) \sigma(g))$ and solve for $\sigma(f)$ (using $\sigma(g) = \sqrt{V(g)}$).
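The four steps can be sketched as a single Python function. The name `back_solve_sigma_f` is hypothetical, and the sketch assumes $\rho \ne 0$ (if $\rho = 0$ then $\operatorname{Cov}(f, g) = 0$ and $\sigma(f)$ cannot be recovered this way).

```python
import math


def back_solve_sigma_f(var_g, rho, abs_cov_scaled, alpha, beta):
    """Recover sigma(f) from V(g), rho(f, g) and |Cov(alpha f, beta g)|."""
    if rho == 0:
        raise ValueError("rho = 0 leaves sigma(f) undetermined")
    # Steps 1-2: bilinearity, |Cov(f, g)| = |Cov(alpha f, beta g)| / |alpha beta|
    abs_cov = abs_cov_scaled / abs(alpha * beta)
    # Step 3: Cov(f, g) carries the sign of rho
    cov = math.copysign(abs_cov, rho)
    # Step 4: rho = Cov / (sigma_f sigma_g), with sigma_g = sqrt(V(g))
    sigma_g = math.sqrt(var_g)
    return cov / (rho * sigma_g)


# May 2024 MCQ5: V(g) = 100, rho = -0.1, |Cov(2f, 5g)| = 90
print(back_solve_sigma_f(100, -0.1, 90, 2, 5))  # 9.0
```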

Output. $\sigma(f)$.

Pitfalls.

  • Confusing the variance rule ($V(\alpha f) = \alpha^2 V(f)$) with the covariance rule ($\operatorname{Cov}(\alpha f, \beta g) = \alpha \beta \operatorname{Cov}(f, g)$). Covariance is bilinear, not quadratic.
  • Forgetting the sign of the correlation when taking $|\operatorname{Cov}|$; exam problems often give $|\cdot|$ expressly to test this.
  • Plugging in $V(g)$ where $\sigma(g)$ is needed (you must take the square root).

M-P-5 — Compute $E(X)$, $V(X)$ from a PMF or PDF

Used on: worked examples throughout (§4.4, §4.6, §4.7).

Input. The PMF $\varphi(y_i) = \mathbb{P}(X = y_i)$ or the PDF $\varphi(x)$, and the random variable $f$ (often $f$ is the identity, $f(x) = x$, which is the case written out in the steps below).

Steps (discrete).

  1. $E(f) = \sum_i y_i \varphi(y_i)$.
  2. $E(f^2) = \sum_i y_i^2 \varphi(y_i)$.
  3. $V(f) = E(f^2) - [E(f)]^2$ (computation formula, easier than the definition).
  4. $\sigma(f) = \sqrt{V(f)}$.

Steps (continuous).

  1. $E(f) = \int_a^b x \varphi(x)\,dx$ (restricted to the carrier $[a,b]$ if one exists).
  2. $E(f^2) = \int_a^b x^2 \varphi(x)\,dx$.
  3. $V(f) = E(f^2) - [E(f)]^2$.
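A minimal Python sketch of both recipes. The discrete part follows steps 1 to 4 exactly, with fractions; the continuous part swaps in a midpoint-rule numerical integral over the carrier as an approximation, which is an assumption of this sketch rather than a course method. Function names are illustrative.

```python
from fractions import Fraction
import math


def moments_from_pmf(pmf):
    """E(f), V(f), sigma(f) for a PMF given as {value y_i: phi(y_i)}."""
    assert sum(pmf.values()) == 1, "phi must sum to 1"
    mean = sum(y * p for y, p in pmf.items())         # step 1
    second = sum(y ** 2 * p for y, p in pmf.items())  # step 2
    var = second - mean ** 2                          # step 3: subtract [E(f)]^2
    return mean, var, math.sqrt(float(var))           # step 4


def moments_from_pdf(pdf, a, b, n=100_000):
    """Approximate E(f), V(f) for a PDF with carrier [a, b] (midpoint rule)."""
    h = (b - a) / n
    xs = [a + (i + 0.5) * h for i in range(n)]
    mean = h * sum(x * pdf(x) for x in xs)
    second = h * sum(x * x * pdf(x) for x in xs)
    return mean, second - mean ** 2


die = {y: Fraction(1, 6) for y in range(1, 7)}
print(moments_from_pmf(die)[:2])                  # (Fraction(7, 2), Fraction(35, 12))
print(moments_from_pdf(lambda x: 1.0, 0.0, 1.0))  # approximately (0.5, 1/12)
```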

Output. $E(f)$, $V(f)$, $\sigma(f)$.

Pitfalls.

  • For discrete: mixing the two ways of summing. Either sum over the distinct values $y_i$ weighted by $\varphi(y_i)$, or over the outcomes $\omega$ weighted by $\mathbb{P}(\{\omega\})$; don't weight a sum over values by outcome probabilities. Recall $\varphi(y_i) = \mathbb{P}(f = y_i) = \sum_{\omega : f(\omega) = y_i} \mathbb{P}(\{\omega\})$.
  • For continuous: using the wrong bounds; recall $\varphi(x) = 0$ outside the carrier, so $\int_{-\infty}^{+\infty} = \int_a^b$ whenever a carrier exists.
  • Using $V(f) = \sum_i y_i^2 \varphi(y_i)$ (forgetting to subtract $[E(f)]^2$).

M-P-6 — Verify countable additivity for a candidate $\mathbb{P}$

Used on: May 2024 Q10 (verify Dirac is CA).

Input. A candidate probability $\mathbb{P}$ and the question "is $\mathbb{P}$ countably additive?".

Steps.

  1. Consider an arbitrary decreasing sequence $A_n \downarrow A$ (i.e., $A_1 \supseteq A_2 \supseteq \cdots$ and $A = \bigcap_n A_n$).
  2. Compute $\lim_{n\to\infty} \mathbb{P}(A_n)$.
  3. Compare it to $\mathbb{P}(A)$.
  4. If they are equal for every such sequence, $\mathbb{P}$ is countably additive. Otherwise, it is not.

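Alternative (for "simple" probabilities). Invoke Theorem 2002: every simple probability is a convex combination $\sum_{i=1}^k \alpha_i \delta_{\omega_i}$ of finitely many Diracs, each of which is CA (Prop 2006). Since a limit passes through a finite sum, $\lim_n \sum_i \alpha_i \delta_{\omega_i}(A_n) = \sum_i \alpha_i \delta_{\omega_i}(A)$ for every $A_n \downarrow A$. Hence every simple probability is CA.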

Alternative (by contradiction). Assume $\mathbb{P}$ is CA and derive a contradiction; e.g., uniform on $\mathbb{N}$ leads to $1 = 0$ or $1 = +\infty$ (Prop 2009).

Output. "Yes, $\mathbb{P}$ is countably additive" (with proof) or "No, $\mathbb{P}$ is not countably additive" (with counterexample).

Pitfalls.

  • Mixing up the two monotone characterisations. Increasing sequences ($A_n \uparrow A$) and decreasing sequences ($A_n \downarrow A$) give equivalent formulations (lect2), but pick one and use it consistently; the Dirac proof (Proposition 2006) is cleanest on a decreasing sequence.
  • Forgetting that countable additivity is not automatic: it must be verified for each candidate. Poisson, geometric, and all simple probabilities are CA; uniform on $\mathbb{N}$ is not.

M-P-7 — Check independence of events or random variables

Used on: Example 4.7, lect3 exercises.

Input. Two events $A, B$ (or two random variables $f, g$) and a probability $\mathbb{P}$.

Steps (events).

  1. Compute $\mathbb{P}(A)$, $\mathbb{P}(B)$, $\mathbb{P}(A \cap B)$ separately.
  2. Check: $\mathbb{P}(A \cap B) \overset{?}{=} \mathbb{P}(A) \mathbb{P}(B)$.
  3. If equal, $A$ and $B$ are independent; otherwise, they are not.

Steps (random variables).

  1. Verify $\mathbb{P}(f \le x, g \le y) = \mathbb{P}(f \le x) \mathbb{P}(g \le y)$ for every $x, y$. For discrete random variables this is equivalent to checking that the joint PMF factors: $\mathbb{P}(f = y_i, g = z_j) = \mathbb{P}(f = y_i) \mathbb{P}(g = z_j)$ for all $i, j$.
  2. As a necessary condition, check $\operatorname{Cov}(f, g) = 0$; but this is not sufficient!

Output. Yes/no with justification.

Pitfalls.

  • Zero covariance does not imply independence. Counterexample: let $f$ be uniform on $\{-1, 0, 1\}$ and $g = f^2$. Then $\operatorname{Cov}(f, g) = E(fg) - E(f) E(g) = E(f^3) - E(f) E(f^2) = 0 - 0 \cdot (2/3) = 0$, yet $g$ is completely determined by $f$, so they are clearly not independent (e.g., $\mathbb{P}(f = 0, g = 1) = 0 \ne \frac{1}{3} \cdot \frac{2}{3} = \mathbb{P}(f = 0)\mathbb{P}(g = 1)$). The implication $\operatorname{Cov} = 0 \Rightarrow$ independence does hold for jointly normal random variables, but not in general.
  • Mistaking disjointness for independence. Disjoint events $A, B$ (with $\mathbb{P}(A), \mathbb{P}(B) > 0$) are always dependent, because $\mathbb{P}(A \cap B) = 0 \ne \mathbb{P}(A) \mathbb{P}(B)$.
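The zero-covariance counterexample can be checked mechanically. The sketch below builds the joint PMF of $(f, g)$ for $f$ uniform on $\{-1, 0, 1\}$ and $g = f^2$, confirms $\operatorname{Cov}(f, g) = 0$, and then runs the joint-PMF factorization test from the random-variable steps, which fails. Helper names are illustrative.

```python
from fractions import Fraction
from itertools import product

# f uniform on {-1, 0, 1}, g = f^2 (the counterexample above).
outcomes = [-1, 0, 1]
p = Fraction(1, 3)
joint = {}
for w in outcomes:
    key = (w, w * w)  # (f(omega), g(omega))
    joint[key] = joint.get(key, 0) + p


def marginal(axis):
    """Marginal PMF of f (axis 0) or g (axis 1) from the joint PMF."""
    m = {}
    for key, q in joint.items():
        m[key[axis]] = m.get(key[axis], 0) + q
    return m


pf, pg = marginal(0), marginal(1)


def expect(h):
    return sum(h(y, z) * q for (y, z), q in joint.items())


cov = expect(lambda y, z: y * z) - expect(lambda y, z: y) * expect(lambda y, z: z)
factors = all(joint.get((y, z), 0) == pf[y] * pg[z] for y, z in product(pf, pg))
print(cov, factors)  # 0 False
```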

§6. Practice Problems with Solutions

Problem 6.1 — lect1 Q1: set function, measure, or probability?

An equity fund contains 50 stocks: $\Omega = \{T_1, T_2, \ldots, T_{50}\}$. Let $\phi : \Omega \to \mathbb{R}$ send each stock to the difference between its opening and closing price on a given day. Consider $\mu : 2^\Omega \to \mathbb{R}$ with $\mu(A) = \sum_{T_i \in A} \phi(T_i)$. Which is correct? (A) $\mu$ is a set function but not a measure; (B) $\mu$ is a measure but not a probability; (C) $\mu$ is a probability; (D) none of the preceding.

Solution. Check the axioms one by one.

  • Grounded: $\mu(\emptyset) = 0$ (empty sum). ✓
  • Positivity: $\mu(A)$ can be negative (e.g., $\mu(\{T_i\}) = \phi(T_i) < 0$ for any stock whose price difference is negative). ✗
  • Additivity: for disjoint $A, B$, $\mu(A \cup B) = \sum_{T_i \in A \cup B} \phi(T_i) = \sum_{T_i \in A} \phi(T_i) + \sum_{T_i \in B} \phi(T_i) = \mu(A) + \mu(B)$. ✓

Since positivity fails, $\mu$ is not a measure (and hence not a probability). It is a set function. Answer: (A).

Remark. If $\phi$ had been defined as the absolute value of the price difference, then $\mu$ would be a measure (positivity restored). But $\mu(\Omega) = \sum_{i=1}^{50} |\phi(T_i)|$ need not equal 1, so it still wouldn't be a probability unless normalized.

Source: lect1, Q1 (TA handout).


Problem 6.2 — May 2024 MCQ4 (Mode A): Poisson $\mathbb{P}(n \ge 3)$

Consider $\Omega = \mathbb{N}$ and the Poisson probability $\mathbb{P}$ with parameter $\lambda = 2$. The probability of $E = \{n \in \mathbb{N} : n \ge 3\}$ is: (A) $1 - 5 e^{-2}$; (B) $e^{-2}/2$; (C) $1 - 3 e^{-2}$; (D) none.

Solution. See §4.1. Use the complement:
$$\mathbb{P}(E) = 1 - \mathbb{P}(n \le 2) = 1 - e^{-2}(1 + 2 + 2) = 1 - 5 e^{-2}. \qquad \textbf{Answer: (A)}.$$

Source: General_24524_ENG_SOL.pdf p.2.


Problem 6.3 — May 2024 MCQ5 (Mode A): back-solve $\sigma_{\mathbb{P}}(f)$

Let $f, g$ be random variables on $(\Omega, \mathbb{P})$ with $V_{\mathbb{P}}(g) = 100$, $\rho_{\mathbb{P}}(f, g) = -0.1$, and $|\operatorname{Cov}_{\mathbb{P}}(2f, 5g)| = 90$. Then $\sigma_{\mathbb{P}}(f) = ?$ (A) 9; (B) 3; (C) 10; (D) none.

Solution. See §4.2. Bilinearity: $\operatorname{Cov}(2f, 5g) = 10 \operatorname{Cov}(f, g) \Rightarrow |\operatorname{Cov}(f, g)| = 9$. Sign from $\rho$: $\operatorname{Cov}(f, g) = -9$. $\sigma(g) = 10$. Solve $-0.1 = -9/(10 \sigma(f)) \Rightarrow \sigma(f) = 9$. Answer: (A).

Source: General_24524_ENG_SOL.pdf p.2.


Problem 6.4 — May 2024 Q10 (Mode A): Dirac probability and countable additivity ★

(a) Define the Dirac probability over the state space $\Omega$. (b) Give the definition of countable additivity for a probability $\mathbb{P} : 2^\Omega \to [0, 1]$. (c) Prove that any Dirac probability is countably additive.

Solution.

(a) (Marinacci example 1993, §2.5.) Fix $\omega_0 \in \Omega$. The Dirac probability concentrated at $\omega_0$ is
$$\delta_{\omega_0}(E) = \begin{cases} 1 & \text{if } \omega_0 \in E \\ 0 & \text{if } \omega_0 \notin E \end{cases} \qquad \forall E \subseteq \Omega.$$
One checks easily that $\delta_{\omega_0}$ is a probability: it is non-negative with values in $[0, 1]$; $\delta_{\omega_0}(\Omega) = 1$; and it is additive because, for disjoint $E_1, E_2$, $\omega_0$ belongs to at most one of them.

(b) (Marinacci definition 2003, §2.7.) $\mathbb{P} : 2^\Omega \to [0, 1]$ is countably additive (or $\sigma$-additive) if for every countable collection $\{E_n\}_{n=1}^{+\infty}$ of pairwise disjoint events ($E_i \cap E_j = \emptyset$ for $i \neq j$),
$$\mathbb{P}\!\left(\bigcup_{n=1}^{+\infty} E_n\right) = \sum_{n=1}^{+\infty} \mathbb{P}(E_n).$$

(c) (Proposition 2006, §3.8; full proof.) We use the monotone-sequence characterisation: $\mathbb{P}$ is CA iff for every $A_n \downarrow A$, $\lim_{n\to\infty} \mathbb{P}(A_n) = \mathbb{P}(A)$. Take any such decreasing sequence.

Case (I): $\omega_0 \in A = \bigcap_{n=1}^{+\infty} A_n$. Then $\omega_0 \in A_n$ for every $n$, so $\delta_{\omega_0}(A_n) = 1$ for every $n$, and $\delta_{\omega_0}(A) = 1$. Hence $\lim_{n\to\infty} \delta_{\omega_0}(A_n) = 1 = \delta_{\omega_0}(A)$.

Case (II): $\omega_0 \notin A$. Then there exists $\bar n \in \mathbb{N}$ with $\omega_0 \notin A_{\bar n}$. Since $A_n \subseteq A_{\bar n}$ for every $n \ge \bar n$, we have $\omega_0 \notin A_n$ for every $n \ge \bar n$, i.e., $\delta_{\omega_0}(A_n) = 0$ eventually. Also $\delta_{\omega_0}(A) = 0$. Hence $\lim_{n\to\infty} \delta_{\omega_0}(A_n) = 0 = \delta_{\omega_0}(A)$.

Since $\lim_n \delta_{\omega_0}(A_n) = \delta_{\omega_0}(A)$ in both cases, $\delta_{\omega_0}$ is countably additive. $\blacksquare$

Source: General_24524_ENG_SOL.pdf p.6 (solution: "See example 1993, definition 2003, proposition 2006 (first part of the proof)").


Problem 6.5 — Probability HW: compute $\lambda$ from a Poisson condition

$X \sim \operatorname{Poisson}(\lambda)$ with $\mathbb{P}(X < 1) = e^{-5}$. Compute (i) $\lambda$, (ii) $\mathbb{P}(X > 2)$, (iii) $V(X + 5)$, (iv) $E(X^2)$.

Solution.

(i) $\mathbb{P}(X < 1) = \mathbb{P}(X = 0) = e^{-\lambda} \cdot \lambda^0/0! = e^{-\lambda}$. Setting $e^{-\lambda} = e^{-5}$ gives $\lambda = 5$.

(ii) $\mathbb{P}(X > 2) = 1 - \mathbb{P}(X \le 2) = 1 - e^{-5}(1 + 5 + 25/2) = 1 - 18.5 e^{-5}$.

(iii) $V(X + 5) = V(X) = \lambda = 5$ (a constant shift doesn't affect variance; §3.13.2 with $\alpha = 1$).

(iv) $V(X) = E(X^2) - [E(X)]^2$, so $E(X^2) = V(X) + [E(X)]^2 = 5 + 25 = 30$ (using $E(X) = V(X) = \lambda = 5$ for a Poisson variable).
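A numerical cross-check of (i) to (iv), sketched in Python under the assumption that truncating the Poisson sums at 100 terms is accurate enough for $\lambda = 5$ (the omitted tail is astronomically small). `pmf` is an illustrative helper name.

```python
import math

LAM = 5.0  # from part (i)


def pmf(i):
    """Poisson(LAM) probability P(X = i)."""
    return math.exp(-LAM) * LAM ** i / math.factorial(i)


# Truncate the infinite sums; for LAM = 5 the mass beyond i = 100 is negligible.
N = 100
mean = sum(i * pmf(i) for i in range(N))         # E(X)
second = sum(i * i * pmf(i) for i in range(N))   # E(X^2)
p_gt_2 = 1 - sum(pmf(i) for i in range(3))       # 1 - P(X <= 2)

print(round(mean, 6), round(second - mean ** 2, 6), round(second, 6))
print(p_gt_2, 1 - 18.5 * math.exp(-5))
```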

Source: lect2 p.5 (Probability HW).


Problem 6.6 — TA1_prob_41: simple vs. countable vs. neither

Classify each of the following probabilities on $\Omega$: (a) Roll a fair die ($\Omega = \{1, \ldots, 6\}$, $\mathbb{P}(\{i\}) = 1/6$). (b) Poisson($\lambda$) on $\Omega = \mathbb{N}$. (c) Uniform on $\Omega = \mathbb{N}$ with $\mathbb{P}(\{n\}) = k$ for all $n$. (d) $\Omega = \mathbb{N}$, $\mathbb{P}(\{0\}) = \mathbb{P}(\{1\}) = \mathbb{P}(\{2\}) = 1/3$, $\mathbb{P}(\{n\}) = 0$ for $n \ge 3$.

Solution.

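(a) Simple (finite $\Omega$, so finite support); countably additive (on a finite $\Omega$, a countable family of pairwise disjoint events has only finitely many non-empty members, so countable additivity reduces to finite additivity).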

(b) Not simple (every $\mathbb{P}(\{n\}) = e^{-\lambda}\lambda^n/n! > 0$, so the support is all of $\mathbb{N}$, which is infinite). Countably additive; in fact this is the key example of a CA but non-simple probability (lect2 p.3).

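(c) Not countably additive (Proposition 2009, §3.9). Also not simple: finite additivity forces $k = 0$ (otherwise $\mathbb{P}(\{0, \ldots, N\}) = (N+1)k > 1$ for large $N$), so every finite event has probability 0 and no finite set carries probability 1.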

(d) Simple: the finite event $E = \{0, 1, 2\}$ has $\mathbb{P}(E) = 1$. Countably additive (all simple probabilities are; see the remark after §3.8).

Source: lect2 p.1, TA1_prob_41 p.2.


Problem 6.7 — lect7: covariance from a joint experiment

Two dice are rolled independently. Let $f$ = result of the first die and $g$ = result of the second. Compute $E(f + g)$, $V(f + g)$, $\operatorname{Cov}(f, g)$, $\rho(f, g)$.

Solution. By independence, $\operatorname{Cov}(f, g) = 0$ and $\rho(f, g) = 0$. $E(f) = E(g) = 7/2$, so $E(f + g) = E(f) + E(g) = 7$. $V(f) = V(g) = 35/12$, so $V(f + g) = V(f) + V(g) + 2\operatorname{Cov}(f, g) = 35/6 + 0 = 35/6$.
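Because $\Omega$ has only 36 equally likely outcomes, the answers can be confirmed by brute-force enumeration. A minimal sketch, assuming the product (independent) model for the two dice; `E` is an illustrative helper name.

```python
from fractions import Fraction
from itertools import product

# Two independent fair dice: 36 equally likely outcomes (i, j).
outcomes = list(product(range(1, 7), repeat=2))
p = Fraction(1, 36)


def E(h):
    """Expected value of h(first die, second die)."""
    return sum(h(i, j) * p for i, j in outcomes)


mean_sum = E(lambda i, j: i + j)
var_sum = E(lambda i, j: (i + j) ** 2) - mean_sum ** 2
cov = E(lambda i, j: i * j) - E(lambda i, j: i) * E(lambda i, j: j)
print(mean_sum, var_sum, cov)  # 7 35/6 0
```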


Problem 6.8 — lect6: expected value of affine function

Given $E(f) = 5$ and $E(g) = 6$, compute (i) $E(2f + 3g)$; (ii) $E(2f + 7)$; (iii) $E(-3f + g - 4)$.

Solution. (i) $E(2f + 3g) = 2 E(f) + 3 E(g) = 10 + 18 = 28$. (ii) $E(2f + 7) = 2 E(f) + 7 = 17$. (iii) $E(-3f + g - 4) = -3 E(f) + E(g) - 4 = -15 + 6 - 4 = -13$.

Source: lect6 p.9.


§7. Common Pitfalls

  1. Dirac verification must check both cases. When proving $\delta_{\omega_0}$ satisfies a property (additivity, CA, …), always split on "$\omega_0 \in \cdot$ / $\omega_0 \notin \cdot$". Forgetting case (II), i.e., assuming $\omega_0$ always lies in the set under consideration, leads to an incomplete proof (and loses points on May 2024 Q10c).

  2. Poisson arithmetic: $\lambda^n / n!$, not $n\lambda$ or $\lambda^n / n$. $\mathbb{P}(X = n) = e^{-\lambda} \cdot \lambda^n / n!$. Many students confuse $\lambda^n$ and $n^\lambda$, drop the $1/n!$ factor, or replace $n!$ with $n$. Double-check: $\mathbb{P}(X = 0) = e^{-\lambda}$ (not 0), $\mathbb{P}(X = 1) = \lambda e^{-\lambda}$, $\mathbb{P}(X = 2) = \lambda^2 e^{-\lambda}/2$.

  3. Sign errors in correlation. When given $|\operatorname{Cov}|$, use $\rho$ to recover the sign. A negative $\rho$ forces a negative $\operatorname{Cov}$. Never write $\sigma(f) = \pm 9$: standard deviation is always non-negative; the ± belongs to the covariance.

  4. $\operatorname{Cov} = 0$ does not imply independence. Independence $\Rightarrow \operatorname{Cov} = 0$, but the converse fails in general (it does hold, e.g., for jointly normal random variables). Counterexample: $f$ uniform on $\{-1, 0, 1\}$, $g = f^2$. $\operatorname{Cov}(f, g) = 0$ but $g$ is determined by $f$.

  5. $V(\alpha f) = \alpha^2 V(f)$, not $\alpha V(f)$. Square the coefficient. Contrast with $\operatorname{Cov}(\alpha f, \beta g) = \alpha \beta \operatorname{Cov}(f, g)$ (bilinearity). Variance scales quadratically; covariance scales bilinearly.

  6. Additive constants don't affect variance or covariance. $V(f + \beta) = V(f)$; $\operatorname{Cov}(f + \beta, g + \delta) = \operatorname{Cov}(f, g)$. Only the coefficients matter. (They do affect the expected value.)

  7. Confusing PMF $\varphi(y)$ and CDF $\Phi(x)$. PMF = "probability at a single value" ($\varphi(y) = \mathbb{P}(f = y)$); CDF = "probability up to and including $x$" ($\Phi(x) = \mathbb{P}(f \le x)$). They satisfy $\Phi(x) = \sum_{y \le x} \varphi(y)$ (discrete) or $\Phi(x) = \int_{-\infty}^x \varphi(t)\,dt$ (continuous).

  8. PDF $\ne \mathbb{P}(f = x)$ in the continuous case. For a continuous random variable, $\mathbb{P}(f = x) = 0$ for every individual $x$; the density $\varphi(x)$ measures probability per unit length, not probability itself. Only the integral over an interval has probabilistic meaning.

  9. Simple probability $\not\Leftrightarrow$ countably additive. Every simple probability is CA, but not conversely. Uniform on $\mathbb{N}$ is neither simple nor CA; Poisson is CA but not simple; rolling a fair die is both simple and CA. Memorize the 2×2 grid (its "simple but not CA" cell is empty).

  10. Forgetting the complement trick for tails. $\mathbb{P}(X \ge k)$ is an infinite tail sum (for Poisson, etc.); use $1 - \mathbb{P}(X \le k - 1)$ to get a finite sum. For a continuous random variable, $\ge$ and $>$ give the same probability.

  11. Monotonicity gives $\le$, not $<$. $A \subseteq B$ gives $\mathbb{P}(A) \le \mathbb{P}(B)$; the inequality is weak. Example (lect1): for a probability $M$ on the faces of a die with support $\{1, 2\}$ and $M(\{1\}) = M(\{2\}) = 0.5$, we have $\{1\} \subsetneq \{1, 3\}$ yet $M(\{1\}) = 0.5 = M(\{1, 3\})$, because $M(\{3\}) = 0$. Do not assume strict inequality.

  12. Bayes' theorem: the numerator must use the likelihood, not the posterior. $\mathbb{P}(E \mid A) = \mathbb{P}(A \mid E) \mathbb{P}(E) / \mathbb{P}(A)$. Students sometimes write $\mathbb{P}(E \mid A) = \mathbb{P}(E \mid A) \mathbb{P}(E) / \mathbb{P}(A)$, a circular error.

  13. Distinguishing random variables from events. Events are subsets of $\Omega$ ($A \subseteq \Omega$); random variables are functions $f : \Omega \to \mathbb{R}$. $\mathbb{P}(A)$ is a number; $E(f)$ is a number; $f$ itself is a function.

  14. Carrier $\ne$ support. "Carrier" is the interval $[a, b]$ outside which $\Phi(x) \in \{0, 1\}$ (used for essentially bounded random variables); "support" is the set of outcomes $\omega \in \Omega$ with $\mathbb{P}(\{\omega\}) > 0$ (used for simple probabilities). Don't confuse them.

  15. Chebyshev gives an upper bound, not an exact probability. $\mathbb{P}(|X - \mu| \ge k\sigma) \le 1/k^2$ is a bound; the actual probability may be much smaller. Do not use Chebyshev as an equality.
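To see how loose the bound can be, the sketch below compares the exact tail with $1/k^2$ for a fair die (the die and the values of $k$ are illustrative choices; `exact_tail` and `chebyshev_bound` are hypothetical helper names).

```python
import math
from fractions import Fraction

# Fair die: mu = 7/2 and sigma^2 = 35/12.
VALUES = range(1, 7)
MU = Fraction(7, 2)
SIGMA = math.sqrt(35 / 12)


def exact_tail(k):
    """Exact P(|X - mu| >= k sigma) for the fair die."""
    return sum(Fraction(1, 6) for x in VALUES if abs(x - MU) >= k * SIGMA)


def chebyshev_bound(k):
    """Chebyshev upper bound 1 / k^2."""
    return 1 / k ** 2


for k in (1.2, 1.5, 2.0):
    print(f"k={k}: exact={float(exact_tail(k)):.3f}, bound={chebyshev_bound(k):.3f}")
```

For $k = 1.2$ the exact tail is $1/3$ against a bound of about $0.694$; for $k = 1.5$ and $k = 2$ the exact tail is already 0.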


Cross-references to the rest of the study guide

  • Linear algebra (§01): the covariance matrix $\Sigma_{f,g} = \begin{pmatrix} V(f) & \operatorname{Cov}(f,g) \\ \operatorname{Cov}(f,g) & V(g) \end{pmatrix}$ is a symmetric $2 \times 2$ matrix. It is positive semi-definite (non-negative eigenvalues), and $\det \Sigma_{f,g} = V(f)V(g) - \operatorname{Cov}(f,g)^2 \ge 0$ is exactly the Cauchy–Schwarz bound $|\rho| \le 1$; the sign of $\rho$ is the sign of the off-diagonal entry.
  • Integral calculus (§03): the expected value of a continuous random variable is a Stieltjes integral (§3.20); expected values of exponential and normal distributions use integration by parts (§3.24).
  • Differential calculus (§02): Markov/Chebyshev are non-differentiable bounds; the Gaussian PDF uses the exponential function studied in §02.
  • Mathematical finance (§05): the portfolio variance $V(\mathbf{w}^T \mathbf{r}) = \mathbf{w}^T \Sigma \mathbf{w}$ is a direct application of §3.15 (variance of a sum) with weights.

End of §04 — Probability.