Comprehensive study guide for Bocconi Math Module 2 (cod. 30063), General Exam. Source materials: Alice Sicconi's Probability Proofs.pdf, Probability.pdf, lect{1..8}_prob (2).pdf, TA1_prob_41.pdf, Probability HW.pdf, General_24524_ENG_SOL.pdf.
§1. Overview & Exam Relevance
Probability is the first block of the second partial and accounts for roughly 17–20% of the general exam: typically 1–2 MCQs worth 5 pts each plus one open-ended question worth up to 20 pts, i.e. about 25–30 pts of the 150-pt exam. In the May 2024 exam, two MCQs (MCQ4, MCQ5) and one flagship 20-pt theorem-statement-plus-proof question (Q10) all came from this topic, so the marginal value of mastering the definitions cleanly is very high.
Topic scope. The exam tests:
the measure-theoretic hierarchy set function → measure → probability: grounded, positive, additive, normalized;
Kolmogorov axioms for a probability $P: 2^\Omega \to [0,1]$ and their consequences (complement, monotonicity, inclusion–exclusion, union bound);
Dirac probability $\delta_{\omega_0}$, the canonical "point mass" example (May 2024 Q10a);
Countable additivity (σ-additivity) as a property that may or may not hold, and the proof that every Dirac probability is countably additive (May 2024 Q10c);
Simple probabilities, support, convex-combination representation, and the equivalence between simple probabilities and convex linear combinations of Dirac probabilities;
Conditional probability $P(A \mid B)$, Bayes' theorem, law of total probability, independence of events and random variables;
Random variables $f: \Omega \to \mathbb{R}$ (Sicconi uses the letter f for a random variable; Marinacci uses X), expected value $E_P(f)$, variance $V_P(f)$, standard deviation $\sigma_P(f)$, covariance $\mathrm{Cov}_P(f,g)$, linear correlation coefficient $\rho_P(f,g)$;
Linearity of expectation, variance and covariance properties (computation formula, affine-function rules, bilinearity), Cauchy–Schwarz bound $|\mathrm{Cov}(f,g)| \le \sigma(f)\,\sigma(g)$;
Named discrete distributions: Bernoulli, Binomial, Geometric, Poisson, uniform discrete (PMFs with mean and variance);
Named continuous distributions: uniform on $[a,b]$, (negative) exponential with rate $\alpha$, Gaussian/standard normal (density functions, distribution functions, expected values);
Distribution function $\Phi(x) = P(f \le x)$, density $\varphi(x)$, carrier $[a,b]$, essentially bounded random variables;
Markov's and Chebyshev's inequalities (absolute-value tail bounds).
Typical MCQ patterns (May 2024 General Exam).
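MCQ4 (Mode A) / MCQ3 (Mode B): "Consider $\Omega = \mathbb{N}$ and the Poisson probability $P$ with parameter $\lambda = 2$. The probability of the event $E = \{n \in \mathbb{N} : n \ge 3\}$ is …". Technique: use the complement, $P(n \ge 3) = 1 - P(n \le 2) = 1 - e^{-2}\left(\frac{2^0}{0!} + \frac{2^1}{1!} + \frac{2^2}{2!}\right) = 1 - e^{-2}(1 + 2 + 2) = 1 - 5e^{-2}$.
MCQ5 (Mode A) / MCQ4 (Mode B): "Let $f, g$ be two random variables on $(\Omega, P)$ with $V_P(g) = 100$, $\rho_P(f,g) = -0.1$ and $|\mathrm{Cov}_P(2f, 5g)| = 90$. Then $\sigma_P(f) = ?$". Technique: bilinearity gives $\mathrm{Cov}(2f, 5g) = 10\,\mathrm{Cov}(f,g)$, so $|\mathrm{Cov}(f,g)| = 9$, and the sign is read off from $\rho = -0.1 < 0$, i.e., $\mathrm{Cov}(f,g) = -9$. From $\rho = \mathrm{Cov}(f,g)/(\sigma(f)\,\sigma(g))$ with $\sigma(g) = 10$: $-0.1 = -9/(10\,\sigma(f))$, so $\sigma(f) = 9$.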
Typical open-ended pattern (May 2024 General Exam, Q10, 20 pts).
Part (a): Define the Dirac probability $\delta_{\omega_0}$ over a state space $\Omega$.
Part (b): Give the definition of countable additivity for a probability $P: 2^\Omega \to [0,1]$.
Part (c): Prove that any Dirac probability is countably additive. The official solution points to "example 1993, definition 2003, proposition 2006 (first part of the proof)", i.e., this is a pure memorization-plus-execution question, and the candidate must reproduce the two-case proof exactly.
Why this topic is high-leverage.
Every calculation in finance involving expected return or variance is an application of the E/V/Cov machinery proved here.
The measure-theoretic framework (grounded/positive/additive) mirrors the Riemann-integral framework of §03: what "mass" is to probability, "area" is to the integral.
Q10 is one of the highest-yield 20-point questions: if you memorize proposition 2006 and know how to write definition 2003 cleanly, you can bank ≈20 points in under 12 minutes.
§2. Definitions
2.1 Sample space, state, event
The sample space (or state space) Ω is the set of all possible outcomes of an experiment.
Elements ω∈Ω are states (of the world) — individual outcomes. Subsets A⊆Ω are called events. The pair (Ω,P) is called a probability space.
Three types of sample space (lect1):
Discrete finite: ∣Ω∣=n. Example: rolling a die, Ω={1,2,3,4,5,6}.
Discrete countable: Ω is infinite but countable. Example: Ω=N={0,1,2,…}.
Continuous: Ω is uncountable. Example: Ω=[0,2]⊆R.
2.2 Power set 2Ω
The power set $\mathcal{P}(\Omega) = 2^\Omega$ is the collection of all subsets of $\Omega$. For a finite sample space with $|\Omega| = n$, we have $|2^\Omega| = 2^n$.
Every probability measure is defined on $2^\Omega$: it assigns a real number to each event. The notation $2^\Omega$ (rather than $\Omega$) emphasises that the domain of $P$ is the set of events, not the set of outcomes.
Example (lect1 p.1). $\Omega = \{1,2,3\}$, $2^\Omega = \{\emptyset, \{1\}, \{2\}, \{3\}, \{1,2\}, \{2,3\}, \{1,3\}, \Omega\}$. There are $2^3 = 8$ events; $P(\emptyset) = 0$ and $P(\Omega) = 1$ are hard-coded by the Kolmogorov axioms.
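As a quick check of the count $2^n$, the events of a small sample space can be enumerated by brute force. This is only an illustrative sketch; the helper name `power_set` is an assumption of this example and does not come from the lecture notes.

```python
from itertools import chain, combinations

def power_set(omega):
    """Return every subset of omega (i.e. every event) as a frozenset."""
    items = list(omega)
    # Subsets of size 0, 1, ..., n, chained into one iterable.
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [frozenset(s) for s in subsets]

events = power_set({1, 2, 3})
print(len(events))  # number of events; should equal 2**3
```

The empty set and the whole sample space both appear in the list, matching the two events whose probabilities are fixed by the axioms.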
2.3 Set function, measure, probability (the hierarchy)
(Source: lect1_prob (2).pdf pp.1–3.)
Set function $M: 2^\Omega \to \mathbb{R}$: the least restrictive object; it simply assigns a real number to every event.
Measure $M: 2^\Omega \to [0,+\infty)$: a set function satisfying the three measure axioms:
1. Grounded: $M(\emptyset) = 0$.
2. Positive (non-negative): $M(A) \ge 0$ for every $A \subseteq \Omega$.
3. (Finitely) additive: for every $A, B \subseteq \Omega$ with $A \cap B = \emptyset$,
$$M(A \cup B) = M(A) + M(B).$$
Probability $P: 2^\Omega \to [0,1]$: a measure that additionally satisfies the normalisation axiom:
4. Normalized: $P(\Omega) = 1$.
The four properties 1–4 are the Kolmogorov axioms. Every probability is a measure; every measure is a set function. The reverse inclusions fail in general (cf. Example 2.14 below).
2.4 Probability — Kolmogorov axioms
A function P:2Ω→[0,1] is a probability (or probability measure) on (Ω,2Ω) if:
(K1) Non-negativity — P(A)≥0 for every event A⊆Ω.
(K2) Normalization — P(Ω)=1.
(K3) (Finite) additivity — if A,B⊆Ω and A∩B=∅, then P(A∪B)=P(A)+P(B).
The axiom P(∅)=0 is a consequence of the other three (take A=B=∅ in K3; then P(∅)=2P(∅), so P(∅)=0).
2.5 Dirac probability δω0
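Fix $\omega_0 \in \Omega$. The Dirac probability concentrated at $\omega_0$ is the function $\delta_{\omega_0}: 2^\Omega \to [0,1]$ defined by
$$\delta_{\omega_0}(E) = \begin{cases} 1 & \text{if } \omega_0 \in E \\ 0 & \text{if } \omega_0 \notin E \end{cases} \qquad \forall E \subseteq \Omega.$$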
This is the Marinacci example 1993. Intuition: δω0 describes a "sure outcome" — it is as if we already know the result of the experiment is ω0, so events that contain ω0 are certain and all others are impossible.
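Example (lect1 p.3). Roll a die ($\Omega = \{1,2,3,4,5,6\}$) but you are somehow told that the outcome is 4. Consider $A = \{2,4,6\}$ (even outcomes). Then $\delta_4(A) = 1$ because $4 \in A$. On singletons, $\delta_4(\{4\}) = 1$ and $\delta_4(\{1\}) = \delta_4(\{2\}) = \delta_4(\{3\}) = \delta_4(\{5\}) = \delta_4(\{6\}) = 0$: each singleton has probability 0 or 1, and only the one containing 4 has probability 1.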
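The definition translates almost word for word into code. The following is a minimal sketch, assuming events are represented as Python sets; the names `dirac` and `delta4` are chosen for this illustration only.

```python
def dirac(omega0):
    """Return the Dirac probability concentrated at omega0, as a function on events."""
    def delta(event):
        # 1 if the event contains omega0, 0 otherwise.
        return 1 if omega0 in event else 0
    return delta

delta4 = dirac(4)
even = {2, 4, 6}
singletons = {w: delta4({w}) for w in range(1, 7)}

print(delta4(even))   # the even event contains 4
print(singletons)     # only the singleton {4} gets mass 1
```

Evaluating `delta4` on the empty set and on the full sample space gives 0 and 1 respectively, which are the grounded and normalized properties of the hierarchy.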
2.6 Simple probability
A probability $P: 2^\Omega \to [0,1]$ is called simple if there exists a finite event $E \subseteq \Omega$ with $P(E) = 1$. Intuitively: a simple probability has only finitely many "realistically possible" outcomes, even if $\Omega$ itself is infinite.
The support of a simple probability is the set of outcomes with strictly positive probability:
$$\operatorname{supp} P = \{\omega \in \Omega : P(\{\omega\}) > 0\}.$$
Every simple probability can be written (Theorem 2002, stated below) as a convex linear combination of Dirac probabilities:
$$P(A) = \sum_{\omega \in \operatorname{supp} P} P(\{\omega\}) \cdot \delta_\omega(A) \qquad \forall A \subseteq \Omega.$$
2.7 Countable additivity (σ-additivity)
(Marinacci definition 2003.) A probability $P: 2^\Omega \to [0,1]$ is countably additive (also called σ-additive) if for every countable collection $\{E_n\}_{n=1}^{+\infty}$ of pairwise disjoint events (i.e., $E_i \cap E_j = \emptyset$ for all $i \ne j$),
$$P\Big(\bigcup_{n=1}^{+\infty} E_n\Big) = \sum_{n=1}^{+\infty} P(E_n).$$
This is the countable version of the finite-additivity axiom (K3). It is not automatically implied by K1–K3 and must be checked case-by-case. (Counterexample: the uniform probability on Ω=N is a set function that satisfies the first three Kolmogorov axioms in spirit, but it is not countably additive — see Proposition 2009 below.)
Equivalent monotone-sequence formulation (lect2): P is countably additive ⟺ for every increasing collection An↑A (i.e., A1⊆A2⊆⋯ and A=⋃An), P(A)=limn→∞P(An); equivalently, for every decreasing collection An↓A, P(A)=limn→∞P(An).
2.8 Conditional probability
Let P be a probability on Ω and let B⊆Ω be an event with P(B)>0. The conditional probability of A given B is
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}.$$
For fixed B, the function A↦P(A∣B) is itself a probability on Ω (with support concentrated on B). Intuition: once you know B has happened, rescale probabilities to make B the new "certain" event.
2.9 Independence
Independent events. Two events A,B⊆Ω are independent (under P) if
P(A∩B)=P(A)⋅P(B).
Equivalently (when P(B)>0), P(A∣B)=P(A) — knowing B occurred does not change the probability of A.
Independent random variables. Random variables f,g:Ω→R are independent if the events {f≤x} and {g≤y} are independent for every x,y∈R:
P(f≤x,g≤y)=P(f≤x)⋅P(g≤y).
Independence of f and g implies E[fg]=E[f]⋅E[g], hence Cov(f,g)=0. Caveat: the converse is false in general (see §7 — "zero correlation does not imply independence").
2.10 Random variable
A random variable is any function f:Ω→R. (Sicconi's convention uses f; Marinacci sometimes uses X. In this guide we mix the two: lowercase f,g,h for random variables as in the lectures, and uppercase X,Y,Z when following the textbook verbatim — they are interchangeable.)
Intuition (lect6 p.1). A random variable is a bet that assigns a real-valued payoff to each outcome of the experiment. Example: roll a die and bet 10 euros on an even outcome, lose 10 euros on an odd outcome:
$$f(\omega) = \begin{cases} 10 & \omega \in \{2,4,6\} \\ -10 & \omega \in \{1,3,5\}. \end{cases}$$
2.11 Probability mass function (PMF), probability density function (PDF), cumulative distribution function (CDF)
Simple (discrete) density function $\varphi: \mathbb{R} \to [0,1]$, when $f$ takes finitely many distinct values $y_1, \dots, y_n$:
$$\varphi(y_i) = P(f = y_i), \qquad \varphi(x) = 0 \text{ otherwise}.$$
This is Marinacci's simple/finite density function (lect6 p.7); in standard terminology it is the probability mass function (PMF). One always has $\sum_{i=1}^{n} \varphi(y_i) = 1$ and $\Phi(x) = \sum_{y_i \le x} \varphi(y_i)$.
Integrable (continuous) density function $\varphi: \mathbb{R} \to [0,+\infty)$:
$$\Phi(x) = \int_{-\infty}^{x} \varphi(t)\,dt, \qquad \int_{-\infty}^{+\infty} \varphi(t)\,dt = 1.$$
This is the PDF. For continuous $\Phi$ with a continuous density $\varphi$, $\Phi'(x) = \varphi(x)$ (by Barrow–Torricelli; this is Marinacci's prop 2042).
Carrier $[a,b]$ of $\Phi$: an interval such that $\Phi(x) = 0$ for all $x \le a$ and $\Phi(x) = 1$ for all $x \ge b$. Equivalently, $\varphi(x) = 0$ outside $[a,b]$. A random variable that admits a carrier is called essentially bounded.
2.12 Expected value EP(f)
Discrete (simple) case (Sicconi lect6 p.8):
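$$E_P(f) = \sum_{\omega \in \operatorname{supp} P} f(\omega) \cdot P(\{\omega\}) = \sum_{i=1}^{n} y_i \cdot \varphi(y_i),$$
where the last sum is over the distinct values $y_i \in \operatorname{Im} f$.
Stieltjes-integral formulation (Marinacci prop 2043): if $[a,b]$ is a carrier for $\Phi$,
$$E_P(f) = \int_a^b x\,d\Phi(x).$$
When $\Phi$ has a continuous density $\varphi$, this simplifies to $E_P(f) = \int_a^b x\,\varphi(x)\,dx$ (via $d\Phi = \Phi'(x)\,dx = \varphi(x)\,dx$).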
2.13 Variance, standard deviation, covariance, correlation
Let f,g be random variables on (Ω,P) with finite expected values.
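Variance:
$$V_P(f) = E_P\big[(f - E_P(f))^2\big] = \sum_{\omega \in \operatorname{supp} P} (f(\omega) - E_P(f))^2 \cdot P(\{\omega\}) \ge 0.$$
Computation formula (Theorem 2024): $V_P(f) = E_P(f^2) - [E_P(f)]^2$.
Standard deviation: $\sigma_P(f) = \sqrt{V_P(f)} \ge 0$.
Covariance:
$$\mathrm{Cov}_P(f,g) = E_P\big[(f - E_P(f))(g - E_P(g))\big].$$
Computation formula (Theorem 2026): $\mathrm{Cov}_P(f,g) = E_P(fg) - E_P(f)\,E_P(g)$. Note $\mathrm{Cov}(f,f) = V(f)$.
Linear correlation coefficient (defined when $\sigma(f), \sigma(g) > 0$):
$$\rho_P(f,g) = \frac{\mathrm{Cov}_P(f,g)}{\sigma_P(f) \cdot \sigma_P(g)}.$$
The Cauchy–Schwarz bound (Theorem 2028, §3.16 below) gives $|\mathrm{Cov}(f,g)| \le \sigma(f)\,\sigma(g)$, i.e., $|\rho(f,g)| \le 1$.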
2.14 A set function that is not a measure (lect1 Q1)
Let Ω={T1,…,T50} be 50 stocks, and ϕ(Ti) the difference between opening and closing price of stock Ti. Define
μ(A)=∑Ti∈Aϕ(Ti)∀A⊆Ω.
Then μ is a set function (assigns a real number to every event) and satisfies μ(∅)=0 and additivity — but μ(A) can be negative (some stocks fell), so axiom K1 (positivity) fails. μ is a set function but not a measure and hence not a probability. This distinguishes the three levels of the hierarchy.
§3. Theorems, Propositions & Proofs
Each entry lists the theorem name, its Marinacci number (where known), the source, a clean statement, and a full proof (verbatim from Probability Proofs.pdf where available).
3.1 Monotonicity property of a measure (page 1987)
Statement. Let M:2Ω→[0,+∞) be a measure. Then M is monotone:
M(A)≤M(B)∀A,B⊆Ω such that A⊆B.
Source: Probability Proofs.pdf p.1987 (handwritten).
Proof.
Consider A,B⊆Ω with A⊆B, and define C=B∖A. Then
$$A \cup C = B, \qquad A \cap C = \emptyset.$$
That is, $A$ and the "leftover" $C$ partition $B$. Since every measure is finitely additive and positive,
M(B)=M(A∪C)=M(A)+M(C)≥M(A),
using M(C)≥0. ■
3.2 Relation between M(A∪B) and M(A∩B) (page 1989)
Statement. For every measure M:2Ω→[0,+∞) and all A,B⊆Ω,
M(A∪B)+M(A∩B)=M(A)+M(B).
Source: Probability Proofs.pdf p.1989.
Proof.
Split A and B using their intersection and relative complements:
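$$A = (A \setminus B) \cup (A \cap B), \qquad (A \setminus B) \cap (A \cap B) = \emptyset,$$
$$B = (B \setminus A) \cup (A \cap B), \qquad (B \setminus A) \cap (A \cap B) = \emptyset.$$
Since $M$ is (finitely) additive,
$$M(A) = M(A \setminus B) + M(A \cap B), \qquad M(B) = M(B \setminus A) + M(A \cap B).$$
Moreover $A \cup B = (A \setminus B) \cup (A \cap B) \cup (B \setminus A)$ with the three pieces pairwise disjoint, so by finite additivity
$$M(A \cup B) = M(A \setminus B) + M(A \cap B) + M(B \setminus A).$$
Adding $M(A \cap B)$ to both sides:
$$M(A \cup B) + M(A \cap B) = \big[M(A \setminus B) + M(A \cap B)\big] + \big[M(B \setminus A) + M(A \cap B)\big] = M(A) + M(B). \qquad \blacksquare$$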
Corollary (inclusion–exclusion for a probability). For P:2Ω→[0,1] and all A,B⊆Ω,
P(A∪B)=P(A)+P(B)−P(A∩B).
3.3 Probability of the complement (page 1883)
Statement. Let P:2Ω→[0,1] be a probability. Then for every A⊆Ω,
P(Ac)=1−P(A).
Source: Probability Proofs.pdf p.1883.
Proof.
Since A∩Ac=∅, by the additivity property (K3),
P(A∪Ac)=P(A)+P(Ac).
But A∪Ac=Ω and P(Ω)=1 (K2, normalisation), so
1=P(A)+P(Ac)⟹P(Ac)=1−P(A).■
Consequence.P(∅)=P(Ωc)=1−P(Ω)=1−1=0.
Union bound (Boole's inequality). For any A,B, P(A∪B)≤P(A)+P(B) (drop the non-negative P(A∩B) from §3.2). More generally P(⋃iAi)≤∑iP(Ai).
3.4 Property of a simple probability #1 (page 1999)
Statement. Let P:2Ω→[0,1] be a simple probability. If E⊆Ω is a finite event with P(E)=1, then for every ω∈/E we have P({ω})=0.
Source: Probability Proofs.pdf p.1999.
Proof.
Let ω∈/E; then ω∈Ec. By §3.3, P(Ec)=1−P(E)=1−1=0. By monotonicity (§3.1) applied to {ω}⊆Ec,
0≤P({ω})≤P(Ec)=0,
forcing P({ω})=0. ■
3.5 Property of a simple probability #2 (page 2000)
Statement. Let P:2Ω→[0,1] be a simple probability. Then:
(1) suppP is a finite event with P(suppP)=1.
(2) For every A⊆Ω that is a finite event with P(A)=1, suppP⊆A.
Source: Probability Proofs.pdf p.2000.
Proof of (1).
Since P is simple, there exists a finite event E with P(E)=1. By §3.4, P({ω})=0 for every ω∈/E. Hence every state with P({ω})>0 lies in E, i.e., suppP⊆E. Since E is finite, so is suppP.
Consider the disjoint decomposition E=suppP∪(E∖suppP). By additivity,
P(E)=P(suppP)+P(E∖suppP).
For every ω∈E∖suppP, ω∈/suppP so P({ω})=0; by finite additivity over the finite set E∖suppP, P(E∖suppP)=0. Hence
P(suppP)=P(E)=1.
Proof of (2) — by contradiction.
Suppose there exists ω∈suppP with ω∈/A. By definition of the support, P({ω})>0. Set B=A∪{ω}; then A∩{ω}=∅, so by additivity
P(B)=P(A)+P({ω})=1+P({ω})>1.
This contradicts P(B)≤1. Hence suppP⊆A. ■
3.6 Property of a simple probability #3 (page 2001)
Statement. Let P:2Ω→[0,1] be a simple probability. Then for every A⊆Ω,
$$P(A) = \sum_{\omega \in A \cap \operatorname{supp} P} P(\{\omega\}).$$
Source: Probability Proofs.pdf p.2001.
Proof.
For any A⊆Ω, decompose
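$$A = (A \cap \operatorname{supp} P) \cup (A \cap (\operatorname{supp} P)^c), \qquad (A \cap \operatorname{supp} P) \cap (A \cap (\operatorname{supp} P)^c) = \emptyset.$$
By additivity,
$$P(A) = P(A \cap \operatorname{supp} P) + P(A \cap (\operatorname{supp} P)^c).$$
Since $A \cap (\operatorname{supp} P)^c \subseteq (\operatorname{supp} P)^c$ and $P((\operatorname{supp} P)^c) = 1 - 1 = 0$, by monotonicity $P(A \cap (\operatorname{supp} P)^c) = 0$. Hence
$$P(A) = P(A \cap \operatorname{supp} P) = P\Big(\bigcup_{\omega \in A \cap \operatorname{supp} P} \{\omega\}\Big) = \sum_{\omega \in A \cap \operatorname{supp} P} P(\{\omega\}),$$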
using finite additivity on the finite union. ■
3.7 Theorem 2002 — Simple probabilities are convex linear combinations of Dirac probabilities
Statement. Let P:2Ω→[0,1] be a simple probability. Then for every A⊆Ω,
$$P(A) = \sum_{\omega \in \operatorname{supp} P} P(\{\omega\}) \cdot \delta_\omega(A).$$
Source: Probability Proofs.pdf p.2002.
Proof.
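By §3.6, $P(A) = \sum_{\omega \in A \cap \operatorname{supp} P} P(\{\omega\})$. Rewrite the indicator via the Dirac probability:
$$\sum_{\omega \in A \cap \operatorname{supp} P} P(\{\omega\}) = \sum_{\omega \in \operatorname{supp} P} P(\{\omega\}) \cdot \mathbf{1}[\omega \in A] = \sum_{\omega \in \operatorname{supp} P} P(\{\omega\}) \cdot \delta_\omega(A).$$
The indicator $\mathbf{1}[\omega \in A]$ is exactly $\delta_\omega(A)$: it equals 1 if $\omega \in A$ and 0 otherwise. ■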
Why this matters. The theorem says every simple probability P is a convex combination of point-masses: the weights P({ω}) sum to 1 and are non-negative, and the "atoms" δω are probabilities. This is the structural fact behind proposition 2006 below (to prove P is countably additive, it suffices to prove each δω is).
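The convex-combination formula can be tried out on a toy example. The sketch below assumes a hypothetical simple probability on $\Omega = \mathbb{N}$ with support $\{0, 1, 5\}$ and weights $1/2, 1/3, 1/6$; these numbers are invented for illustration and are not from the course material.

```python
from fractions import Fraction

def dirac(omega0):
    """Dirac probability at omega0, acting on events given as sets."""
    return lambda event: 1 if omega0 in event else 0

# Hypothetical weights P({w}) on the support; non-negative and summing to 1.
weights = {0: Fraction(1, 2), 1: Fraction(1, 3), 5: Fraction(1, 6)}

def P(event):
    """P(A) = sum over the support of P({w}) * delta_w(A)."""
    return sum(p * dirac(w)(event) for w, p in weights.items())

print(P({0, 1, 5}))   # the support itself
print(P({1, 2, 3}))   # only the state 1 contributes
print(P({7, 8}))      # disjoint from the support
```

Using exact fractions rather than floats keeps the check $P(\operatorname{supp} P) = 1$ free of rounding error.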
3.8 Proposition 2006 — Every Dirac probability is countably additive ★ May 2024 Q10c ★
Statement. Let ω0∈Ω and δω0:2Ω→[0,1] be the Dirac probability concentrated at ω0. Then δω0 is countably additive.
Source: Probability Proofs.pdf p.2006 (proposition 2006, "first part of the proof" as cited in May 2024 Q10).
Proof — by the monotone-sequence characterisation of countable additivity. We use the equivalence stated in §2.7: P is countably additive iff for every decreasing sequence An↓A (i.e., A1⊇A2⊇⋯ and A=⋂n=1+∞An), limn→∞P(An)=P(A).
Consider an arbitrary decreasing collection {An}n=1+∞ of events with An↓A=⋂n=1+∞An. We split into two cases on whether ω0 belongs to the limit set A.
Case (I): ω0∈A=⋂n=1+∞An.
Then ω0∈An for every n≥1, hence δω0(An)=1 for every n. Also δω0(A)=1 (since ω0∈A). Therefore
limn→+∞δω0(An)=limn→+∞1=1=δω0(A).✓
Case (II): ω0∈/A=⋂n=1+∞An.
Since ω0 is not in the intersection, ω0∈/Anˉ for some nˉ∈N. Because the sequence is decreasing (Anˉ⊇Anˉ+1⊇⋯), for every n≥nˉ, An⊆Anˉ, so ω0∈/An. Hence δω0(An)=0 for all n≥nˉ. Also δω0(A)=0 (since ω0∈/A). Therefore
limn→+∞δω0(An)=limn→+∞0=0=δω0(A).✓
Since limn→∞δω0(An)=δω0(A) in both cases, by the monotone-sequence criterion δω0 is countably additive. ■
Remark (extension to simple probabilities — "part 2" of the proof). Let P be a simple probability. By Theorem 2002 (§3.7), P(A)=∑ω∈suppPP({ω})⋅δω(A). For any decreasing collection An↓A,
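$$\lim_{n \to \infty} P(A_n) = \lim_{n \to \infty} \sum_{\omega \in \operatorname{supp} P} P(\{\omega\}) \cdot \delta_\omega(A_n) = \sum_{\omega \in \operatorname{supp} P} P(\{\omega\}) \cdot \lim_{n \to \infty} \delta_\omega(A_n) = \sum_{\omega \in \operatorname{supp} P} P(\{\omega\}) \cdot \delta_\omega(A) = P(A),$$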
where the swap of lim and ∑ is legal because suppP is finite. Hence every simple probability is countably additive. (May 2024 Q10(c) accepts the shorter "first part" — just the Dirac case.)
3.9 Proposition 2009 — The uniform probability on N is not countably additive
Statement. There is no countably additive probability P on Ω=N for which every singleton has the same probability (a "uniform probability on N").
Source: Probability Proofs.pdf p.2009.
Proof — by contradiction.
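Suppose such a uniform countably additive $P$ exists, and set $P(\{n\}) = k$ for every $n \in \mathbb{N}$, with $k \ge 0$. Since $\mathbb{N} = \bigcup_{n \in \mathbb{N}} \{n\}$ is a disjoint union and $P$ is countably additive,
$$1 = P(\mathbb{N}) = P\Big(\bigcup_{n \in \mathbb{N}} \{n\}\Big) = \sum_{n \in \mathbb{N}} P(\{n\}) = \sum_{n \in \mathbb{N}} k = \begin{cases} 0 & \text{if } k = 0 \\ +\infty & \text{if } k > 0. \end{cases}$$
In both cases we reach a contradiction ($1 = 0$ or $1 = +\infty$). Therefore no countably additive uniform probability on $\mathbb{N}$ exists. ■
Consequence. Being simple and being countably additive are not equivalent properties: every simple probability is countably additive (§3.8), but the converse fails. The Poisson and Geometric distributions (defined on $\Omega = \mathbb{N}$) are not simple, yet they are countably additive (lect2).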
3.10 Page 2014 — Random variables equal P-a.e.: characterization
Two random variables f,g:Ω→R are equal P-almost everywhere (equal P-a.e.) if P({ω∈Ω:f(ω)=g(ω)})=1.
Statement. Let P be a simple probability. Then f,g are equal P-a.e. if and only if f(ω)=g(ω) for every ω∈suppP.
Source: Probability Proofs.pdf p.2014.
Proof.(⇐) Suppose f(ω)=g(ω) for every ω∈suppP. Then suppP⊆{ω:f(ω)=g(ω)}. Since P(suppP)=1, by monotonicity (§3.1) P({ω:f(ω)=g(ω)})=1.
(⇒) Suppose P({ω:f(ω)=g(ω)})=1. By §3.5(2) applied with A={ω:f(ω)=g(ω)}, suppP⊆A, i.e., f(ω)=g(ω) for every ω∈suppP. ■
3.11 Page 2017 — Random variables equal P-a.e. have the same expected value
Statement. Let P be a simple probability and f,g:Ω→R random variables equal P-a.e. Then EP(f)=EP(g).
Source: Probability Proofs.pdf p.2017.
Proof.
By definition EP(f)=∑ω∈suppPf(ω)⋅P({ω}) and similarly for g. By §3.10, f(ω)=g(ω) for every ω∈suppP, so the two sums coincide termwise:
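$$E_P(f) = \sum_{\omega \in \operatorname{supp} P} f(\omega) \cdot P(\{\omega\}) = \sum_{\omega \in \operatorname{supp} P} g(\omega) \cdot P(\{\omega\}) = E_P(g). \qquad \blacksquare$$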
Caveat (lect6 p.9). The converse is false: EP(f)=EP(g) does not imply f=gP-a.e. Example: on a fair die, f(ω)=+10 for even, −10 for odd; g(ω)=−10 for even, +10 for odd. Both have expected value 0 but they disagree on every outcome.
3.12 Page 2018 — Expected value properties (linearity, monotonicity, extension)
Statement. Let P be a simple probability and f,g:Ω→R random variables.
(1) Linearity. For all α,β∈R, EP(αf+βg)=αEP(f)+βEP(g).
(2) Monotonicity. If f(ω)≥g(ω) for every ω∈Ω, then EP(f)≥EP(g).
(3) Extension to finite sets. For every finite A⊆Ω with A⊇suppP,
EP(f)=∑ω∈Af(ω)⋅P({ω}).
Source: Probability Proofs.pdf p.2018.
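Proof of (1): Linearity.
$$E_P(\alpha f + \beta g) = \sum_{\omega \in \operatorname{supp} P} (\alpha f(\omega) + \beta g(\omega)) \cdot P(\{\omega\}) = \alpha \sum_{\omega \in \operatorname{supp} P} f(\omega) P(\{\omega\}) + \beta \sum_{\omega \in \operatorname{supp} P} g(\omega) P(\{\omega\}) = \alpha E_P(f) + \beta E_P(g).$$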
Proof of (2) — Monotonicity.
Assume f(ω)≥g(ω) for every ω. Multiplying by P({ω})≥0 preserves the inequality: f(ω)P({ω})≥g(ω)P({ω}). Summing over ω∈suppP gives EP(f)≥EP(g).
Proof of (3) — Extension.
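States $\omega \in \Omega$ with $\omega \notin \operatorname{supp} P$ have $P(\{\omega\}) = 0$, so their contribution to the sum is zero:
$$\sum_{\omega \in A} f(\omega) P(\{\omega\}) = \sum_{\omega \in \operatorname{supp} P} f(\omega) P(\{\omega\}) + \underbrace{\sum_{\omega \in A \setminus \operatorname{supp} P} f(\omega) \cdot 0}_{=\,0} = E_P(f). \qquad \blacksquare$$
Corollary (expected value of an affine function). $E_P(\alpha f + \beta) = \alpha E_P(f) + \beta$ for all $\alpha, \beta \in \mathbb{R}$. Proof: apply (1) with $g \equiv 1$ (note $E_P(1) = \sum_{\omega \in \operatorname{supp} P} P(\{\omega\}) = 1$).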
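3.13 Variance properties: computation formula and affine functions (page 2024)
Statement. Let $P$ be a simple probability and $f: \Omega \to \mathbb{R}$ a random variable. For all $\alpha, \beta \in \mathbb{R}$:
(1) Computation formula. $V_P(f) = E_P(f^2) - [E_P(f)]^2$.
(2) Affine functions. $V_P(\alpha f + \beta) = \alpha^2 V_P(f)$.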
Source: Probability Proofs.pdf p.2024.
Proof of (1).
Expand $(f - E_P(f))^2 = f^2 - 2f\,E_P(f) + [E_P(f)]^2$. By linearity of expectation (§3.12.1),
$$V_P(f) = E_P\big[(f - E_P(f))^2\big] = E_P(f^2) - 2E_P(f) \cdot E_P(f) + [E_P(f)]^2 = E_P(f^2) - [E_P(f)]^2.$$
(We used $E_P(E_P(f)) = E_P(f)$ because $E_P(f) \in \mathbb{R}$ is a constant.)
Proof of (2).
By definition and linearity (§3.12),
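$$V_P(\alpha f + \beta) = E_P\big[(\alpha f + \beta - E_P(\alpha f + \beta))^2\big] = E_P\big[(\alpha f + \beta - \alpha E_P(f) - \beta)^2\big] = E_P\big[\alpha^2 (f - E_P(f))^2\big] = \alpha^2 V_P(f). \qquad \blacksquare$$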
Observation. The constant β disappears because shifting a random variable by a constant does not change its spread; the coefficient α is squared because variance has units of f2.
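3.14 Covariance properties: computation formula and bilinearity (page 2026)
Statement. Let $P$ be a simple probability and $f, g: \Omega \to \mathbb{R}$ random variables. For all $\alpha, \beta, \gamma, \delta \in \mathbb{R}$:
(1) Computation formula. $\mathrm{Cov}_P(f,g) = E_P(fg) - E_P(f)\,E_P(g)$.
(2) Bilinearity for affine functions. $\mathrm{Cov}_P(\alpha f + \beta, \gamma g + \delta) = \alpha\gamma\,\mathrm{Cov}_P(f,g)$.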
Source: Probability Proofs.pdf p.2026.
Proof of (1).
Expand
(f−EP(f))(g−EP(g))=fg−fEP(g)−gEP(f)+EP(f)EP(g).
Apply expectation and use linearity (§3.12.1):
CovP(f,g)=EP(fg)−EP(f)EP(g)−EP(g)EP(f)+EP(f)EP(g)=EP(fg)−EP(f)EP(g).
Proof of (2).
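First note that for every $\omega$,
$$(\alpha f(\omega) + \beta - E_P(\alpha f + \beta))(\gamma g(\omega) + \delta - E_P(\gamma g + \delta)) = (\alpha f(\omega) - \alpha E_P(f))(\gamma g(\omega) - \gamma E_P(g)) = \alpha\gamma\,(f(\omega) - E_P(f))(g(\omega) - E_P(g)).$$
The additive constants $\beta, \delta$ cancel because $E_P(\alpha f + \beta) = \alpha E_P(f) + \beta$. Taking expectation and factoring $\alpha\gamma$ out by linearity,
$$\mathrm{Cov}_P(\alpha f + \beta, \gamma g + \delta) = \alpha\gamma\,E_P\big[(f - E_P(f))(g - E_P(g))\big] = \alpha\gamma\,\mathrm{Cov}_P(f,g). \qquad \blacksquare$$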
Special cases worth memorising.
Cov(f,f)=V(f) (set g=f, α=γ=1, β=δ=0).
$\mathrm{Cov}(\alpha f, \gamma g) = \alpha\gamma\,\mathrm{Cov}(f,g)$ (set $\beta = \delta = 0$; this is the May 2024 MCQ5 identity, with $\alpha = 2$ and $\gamma = 5$).
3.15 Page 2027 — Variance of a sum of random variables
Statement. Let P be a simple probability and f,g:Ω→R random variables. Then
VP(f+g)=VP(f)+VP(g)+2CovP(f,g).
Source: Probability Proofs.pdf p.2027.
Proof.
By definition,
VP(f+g)=EP[(f+g−EP(f+g))2]=EP[(f−EP(f)+g−EP(g))2],
using linearity of expectation EP(f+g)=EP(f)+EP(g). Expanding the square:
(f−EP(f)+g−EP(g))2=(f−EP(f))2+(g−EP(g))2+2(f−EP(f))(g−EP(g)).
Apply expectation and linearity:
VP(f+g)=VP(f)+VP(g)+2CovP(f,g).■
Corollary (independent f,g). If f,g are independent, Cov(f,g)=0, so V(f+g)=V(f)+V(g).
General formula for an affine combination (lect7): V(αf+βg+const)=α2V(f)+β2V(g)+2αβCov(f,g).
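Both formulas can be verified exactly on a fair die. The sketch below is an illustration under assumed choices: the random variables $f(\omega) = \omega$ and $g(\omega) = \omega^2$ and the helper names `E`, `V`, `Cov` are picked for this example, not taken from the lectures.

```python
from fractions import Fraction

omega = range(1, 7)       # fair die
p = Fraction(1, 6)        # P({w}) for every face

def E(h):
    """Expected value of h under the uniform probability on the die."""
    return sum(h(w) * p for w in omega)

def V(h):
    """Variance via the computation formula E(h^2) - E(h)^2."""
    return E(lambda w: h(w) ** 2) - E(h) ** 2

def Cov(h, k):
    """Covariance via the computation formula E(hk) - E(h)E(k)."""
    return E(lambda w: h(w) * k(w)) - E(h) * E(k)

f = lambda w: w
g = lambda w: w * w

# V(f + g) = V(f) + V(g) + 2 Cov(f, g)
lhs_sum = V(lambda w: f(w) + g(w))
rhs_sum = V(f) + V(g) + 2 * Cov(f, g)

# V(a f + b g + const) = a^2 V(f) + b^2 V(g) + 2ab Cov(f, g), here a = 2, b = -3
lhs_affine = V(lambda w: 2 * f(w) - 3 * g(w) + 7)
rhs_affine = 4 * V(f) + 9 * V(g) - 12 * Cov(f, g)

print(lhs_sum == rhs_sum, lhs_affine == rhs_affine)
```

Because $f$ and $g$ here are clearly dependent, the covariance term is non-zero and cannot be dropped.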
3.16 Cauchy–Schwarz bound for the covariance (page 2028)
Statement. Let $P$ be a simple probability and $f, g: \Omega \to \mathbb{R}$ random variables. Then
∣CovP(f,g)∣≤σP(f)⋅σP(g).
Source: Probability Proofs.pdf p.2028.
Proof.
Write suppP={ω1,…,ωn}.
Part (I) — centred case EP(f)=EP(g)=0.
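Define $x_i = f(\omega_i)\sqrt{P(\{\omega_i\})}$ and $y_i = g(\omega_i)\sqrt{P(\{\omega_i\})}$ for $i = 1, \dots, n$, and collect them into vectors $x = (x_1, \dots, x_n)$ and $y = (y_1, \dots, y_n)$ in $\mathbb{R}^n$. Then
$$|\mathrm{Cov}_P(f,g)| = \Big|\sum_{i=1}^{n} f(\omega_i)\,g(\omega_i)\,P(\{\omega_i\})\Big| = \Big|\sum_{i=1}^{n} x_i y_i\Big| = |x \cdot y|.$$
(We used the computation formula $\mathrm{Cov}(f,g) = E(fg) - E(f)E(g) = E(fg)$, since $E(f) = E(g) = 0$ makes the product term vanish.) By the Cauchy–Schwarz inequality in $\mathbb{R}^n$,
$$|x \cdot y| \le \|x\| \cdot \|y\|,$$
and
$$\|x\| = \sqrt{\sum_{i=1}^{n} x_i^2} = \sqrt{\sum_{i=1}^{n} f^2(\omega_i)\,P(\{\omega_i\})} = \sqrt{E_P(f^2)} = \sqrt{V_P(f)} = \sigma_P(f),$$
similarly $\|y\| = \sigma_P(g)$. Thus $|\mathrm{Cov}_P(f,g)| \le \sigma_P(f)\,\sigma_P(g)$.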
Part (II) — general case. Define the centred variables f~=f−EP(f) and g~=g−EP(g). Then EP(f~)=EP(g~)=0, so Part (I) applies:
∣CovP(f~,g~)∣≤σP(f~)⋅σP(g~).
By the affine-function rules for variance and covariance (§3.13.2, §3.14.2):
σP(f~)=σP(f−EP(f))=σP(f),σP(g~)=σP(g),CovP(f~,g~)=CovP(f,g).
Therefore ∣CovP(f,g)∣≤σP(f)σP(g). ■
Consequence.∣ρP(f,g)∣=∣CovP(f,g)∣/(σP(f)σP(g))≤1, with equality iff g is an affine function of f (or vice versa).
3.17 Distribution function is increasing (page 2032)
Statement. Let f:Ω→R be a random variable and Φ(x)=P(f≤x) its distribution function. Then Φ:R→[0,1] is (weakly) increasing.
Source: Probability Proofs.pdf p.2032.
Proof.
For x≤y, {f≤x}⊆{f≤y}. By monotonicity of P (§3.1 applied to the probability measure),
Φ(x)=P(f≤x)≤P(f≤y)=Φ(y).■
3.18 Distribution function for an essentially bounded random variable is eventually constant (page 2037)
Statement. If f:Ω→R is essentially bounded (i.e., there exist m,M∈R with P(m≤f≤M)=1), then there exist scalars a,b such that Φ(x)=0 for x≤a and Φ(x)=1 for x≥b.
Source: Probability Proofs.pdf p.2037.
Proof.
Pick m,M∈R with P(m≤f≤M)=1.
For x<m: {f≤x}∩{m≤f≤M}=∅, so by disjointness {f≤x}⊆{m≤f≤M}c, a set of probability 0. By monotonicity, Φ(x)=P(f≤x)=0.
For x≥M: {m≤f≤M}⊆{f≤x}, so by monotonicity Φ(x)=P(f≤x)≥P(m≤f≤M)=1. Combined with Φ(x)≤1, we get Φ(x)=1.
Setting a<m and b=M yields the conclusion. ■
Terminology. Any interval [a,b] such that Φ(x)=0 for x≤a and Φ(x)=1 for x≥b is called a carrier of the distribution function.
3.19 Properties of the distribution function for a continuous density (pages 2041, 2042)
Statement (2041 — density vanishes outside the carrier). Let Φ(x)=∫−∞xφ(t)dt be a distribution function with integrable density φ and carrier [a,b]. If φ is continuous outside [a,b], then φ(x)=0 for every x∈/[a,b].
Proof.
Since [a,b] is a carrier, ∫−∞+∞φ(x)dx=∫abφ(x)dx=1. Fix z1,z2>b with z1<z2; by the definition of the carrier, ∫z1z2φ(x)dx=0. Since φ is continuous (and non-negative) on [z1,z2], the vanishing integral forces φ(x)=0 on [z1,z2]. This holds for all such z1,z2, hence φ(x)=0 on (b,+∞). The symmetric argument on (−∞,a) completes the proof. ■
Statement (2042 — Barrow–Torricelli link). Let Φ be a distribution function with carrier [a,b]. Then Φ has a unique continuous density φ on [a,b] iff Φ is continuously differentiable on [a,b], in which case Φ′(x)=φ(x).
Proof. Apply the Barrow–Torricelli theorem (prop 2030 from Integral Calculus) on [a,b] with Φ=g and φ=γ. ■
(These technical lemmas rarely appear on their own on the exam, but they underlie the continuous distributions listed in §3.24.)
3.20 Expected value of a random variable w.r.t. a simple probability as a Stieltjes integral (page 2043)
Statement. Let P be a simple probability and f:Ω→R a random variable with distribution function Φ having carrier [a,b]. Then
EP(f)=∫abxdΦ(x).
Source: Probability Proofs.pdf p.2043.
Proof.
Write suppP={ω1,…,ωn} and set xi=f(ωi). Assume (WLOG) the xi are distinct and ordered x1<x2<⋯<xn. Since [a,b] is a carrier, a<x1 and b≥xn.
EP(f)=∑i=1nf(ωi)P({ωi})=∑i=1nxiP(f=xi).
By countable additivity of P, the jump size of Φ at xi is Φ(xi)−limx→xi−Φ(x)=P(f=xi). Hence
EP(f)=∑i=1nxi[Φ(xi)−limx→xi−Φ(x)].
Since Φ is increasing and right-continuous on [a,b] (§3.17 and standard CDF properties), by the theorem on the writing of the Stieltjes integral with step-function integrator,
EP(f)=∫abxdΦ(x).■
3.21 Bayes' theorem and the law of total probability
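Bayes' theorem (stated in lect3). Let $A, B \subseteq \Omega$ with $P(A), P(B) > 0$. Then
$$P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B)}.$$
Proof. By the definition of conditional probability,
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} \qquad \text{and} \qquad P(B \mid A) = \frac{P(A \cap B)}{P(A)}.$$
Multiply the second equation by $P(A)$ to get $P(A \cap B) = P(B \mid A)\,P(A)$, and substitute into the first. ■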
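Law of total probability. Let $\{E_i\}_{i=1}^{n}$ be a partition of $\Omega$ (pairwise disjoint, union equal to $\Omega$) with $P(E_i) > 0$ for every $i$. Then for every $A \subseteq \Omega$,
$$P(A) = \sum_{i=1}^{n} P(A \mid E_i) \cdot P(E_i).$$
Proof. $A = A \cap \Omega = A \cap \bigcup_{i=1}^{n} E_i = \bigcup_{i=1}^{n} (A \cap E_i)$, a disjoint union. By finite additivity,
$$P(A) = \sum_{i=1}^{n} P(A \cap E_i) = \sum_{i=1}^{n} P(A \mid E_i)\,P(E_i). \qquad \blacksquare$$
Combined Bayes / total-probability formula. If $\{E_i\}$ is a partition and $P(A) > 0$,
$$P(E_k \mid A) = \frac{P(A \mid E_k)\,P(E_k)}{\sum_{i=1}^{n} P(A \mid E_i)\,P(E_i)}.$$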
3.22 Markov's and Chebyshev's inequalities
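Markov's inequality (lect5). For any random variable $X \ge 0$ and any $a > 0$,
$$P(X \ge a) \le \frac{E(X)}{a}.$$
Proof. Since $X \ge 0$, $X \ge a \cdot \mathbf{1}_{\{X \ge a\}}$ pointwise (if $X(\omega) \ge a$, the right-hand side equals $a \le X(\omega)$; if $X(\omega) < a$, the right-hand side is $0 \le X(\omega)$). Taking expectations (monotonicity, §3.12.2):
$$E(X) \ge a \cdot E(\mathbf{1}_{\{X \ge a\}}) = a \cdot P(X \ge a). \qquad \blacksquare$$
Chebyshev's inequality. Let $X$ be a random variable with finite mean $\mu = E(X)$ and variance $\sigma^2 = V(X) > 0$. For every $k > 0$,
$$P(|X - \mu| \ge k\sigma) \le \frac{1}{k^2}.$$
Equivalently, $P(|X - \mu| \ge a) \le \sigma^2 / a^2$ for every $a > 0$.
Proof. Apply Markov's inequality to $Y = (X - \mu)^2 \ge 0$ with threshold $a = (k\sigma)^2$:
$$P(|X - \mu| \ge k\sigma) = P(Y \ge (k\sigma)^2) \le \frac{E(Y)}{(k\sigma)^2} = \frac{V(X)}{k^2\sigma^2} = \frac{1}{k^2}. \qquad \blacksquare$$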
Interpretation. Chebyshev says at least 1−1/k2 of the probability mass is within k standard deviations of the mean. E.g., k=2 gives P(∣X−μ∣<2σ)≥3/4.
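To see how loose or tight the two bounds are in practice, they can be compared with exact tail probabilities for a fair die with $X(\omega) = \omega$. This is a small numerical sketch; the chosen values of $k$ and the Markov threshold $a = 5$ are arbitrary assumptions for illustration.

```python
from math import sqrt

outcomes = range(1, 7)
p = 1 / 6
mu = sum(w * p for w in outcomes)
sigma = sqrt(sum((w - mu) ** 2 * p for w in outcomes))

def tail(k):
    """Exact P(|X - mu| >= k * sigma) for the fair die."""
    return sum(p for w in outcomes if abs(w - mu) >= k * sigma)

# (k, exact tail probability, Chebyshev bound 1/k^2)
checks = [(k, tail(k), 1 / k**2) for k in (1.2, 1.5, 2.0)]

# Markov with threshold a = 5: P(X >= 5) versus E(X) / 5
markov_lhs = sum(p for w in outcomes if w >= 5)
markov_rhs = mu / 5

for k, exact, bound in checks:
    print(k, exact, bound)
print(markov_lhs, markov_rhs)
```

In each case the exact probability sits below the bound, often by a wide margin: both inequalities are worst-case guarantees that use only the mean and variance.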
3.23 Named discrete distributions (lect2, lect6)
For each distribution we list Ω, the PMF pn, the expected value E(f) (where f is the identity random variable, f=n), and the variance V(f).
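Proof that $E(f) = 1/\alpha$ for exponential (lect7, reproduced from the source).
$$E_P(f) = \int_{-\infty}^{+\infty} x\,d\Phi(x) = \int_{0}^{+\infty} x \cdot \alpha e^{-\alpha x}\,dx.$$
Integration by parts with $u = \alpha x$, $u' = \alpha$, $v' = e^{-\alpha x}$, $v = -\frac{1}{\alpha} e^{-\alpha x}$:
$$\int \alpha x \cdot e^{-\alpha x}\,dx = -x e^{-\alpha x} + \int e^{-\alpha x}\,dx = -x e^{-\alpha x} - \frac{1}{\alpha} e^{-\alpha x} + C = -e^{-\alpha x}\Big(x + \frac{1}{\alpha}\Big) + C.$$
Evaluating from 0 to $+\infty$ and using the hierarchy of infinities $\lim_{x \to +\infty} x / e^{\alpha x} = 0$:
$$E_P(f) = \lim_{K \to +\infty} \Big[-e^{-\alpha K}\Big(K + \frac{1}{\alpha}\Big) + e^{0} \cdot \frac{1}{\alpha}\Big] = 0 + \frac{1}{\alpha} = \frac{1}{\alpha}. \qquad \blacksquare$$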
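The closed form $1/\alpha$ can be sanity-checked numerically. The sketch below uses a plain midpoint rule on a truncated interval; the function name `exponential_mean`, the truncation point $50/\alpha$ and the number of steps are assumptions of this illustration, not part of the lecture proof.

```python
from math import exp

def exponential_mean(alpha, upper=None, steps=200_000):
    """Midpoint-rule approximation of the integral of x * alpha * exp(-alpha * x) on [0, upper]."""
    if upper is None:
        upper = 50 / alpha  # truncation point; the tail beyond it is negligible
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h   # midpoint of the i-th subinterval
        total += x * alpha * exp(-alpha * x)
    return total * h

approx = exponential_mean(2.0)
print(approx)  # should be close to 1 / 2
```

This only checks the value of the integral; it does not replace the integration-by-parts argument, which is what the exam expects.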
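Proof that $E(f) = 0$ for standard normal. The PDF $\varphi(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$ is symmetric about 0, so $E(f) = \int_{-\infty}^{+\infty} x\,\varphi(x)\,dx = 0$ (integrand is odd and absolutely integrable). Alternatively, a direct computation:
$$E_P(f) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} x e^{-x^2/2}\,dx = \frac{1}{\sqrt{2\pi}} \Big[\lim_{K \to +\infty} \big(-e^{-x^2/2}\big)\Big|_{-K}^{0} + \lim_{K \to +\infty} \big(-e^{-x^2/2}\big)\Big|_{0}^{K}\Big] = \frac{1}{\sqrt{2\pi}}\,[-1 + 1] = 0. \qquad \blacksquare$$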
§4. Worked Examples
Example 4.1 — Poisson P(n≥3) for λ=2 ★ May 2024 MCQ4 ★
Problem. Consider Ω=N and the Poisson probability P with parameter λ=2. Compute P(E) for E={n∈N:n≥3}.
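Solution. The Poisson PMF with $\lambda = 2$ is $p_n = e^{-2} \cdot 2^n / n!$. Rather than summing an infinite tail, use the complement trick:
$$P(n \ge 3) = 1 - P(n \le 2) = 1 - \sum_{n=0}^{2} e^{-2}\,\frac{2^n}{n!} = 1 - e^{-2}\Big(\frac{2^0}{0!} + \frac{2^1}{1!} + \frac{2^2}{2!}\Big) = 1 - e^{-2}(1 + 2 + 2) = 1 - 5e^{-2}.$$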
Common trap. Students sometimes compute P(n>3) and use P(n≤3); since the problem asks for n≥3 (inclusive), one must use P(n≤2) in the complement. A second trap is forgetting 0!=1.
Source: General_24524_ENG_SOL.pdf Mode A MCQ4.
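The complement trick can be checked numerically with the standard library. This is a minimal sketch; the helper name `poisson_pmf` is an assumption of this example.

```python
from math import exp, factorial

def poisson_pmf(n, lam):
    """Poisson probability of the singleton {n} with parameter lam."""
    return exp(-lam) * lam**n / factorial(n)

lam = 2
# P(n >= 3) = 1 - P(n <= 2): sum the PMF over n = 0, 1, 2 only.
tail = 1 - sum(poisson_pmf(n, lam) for n in range(3))
closed_form = 1 - 5 * exp(-2)

print(tail, closed_form)
```

Note that `range(3)` covers $n = 0, 1, 2$, which is exactly the inclusive/exclusive boundary the common trap is about.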
Example 4.2 — Back-solve σP(f) from covariance and correlation ★ May 2024 MCQ5 ★
Problem. Let f,g be two random variables on (Ω,P) with VP(g)=100, ρP(f,g)=−0.1, and ∣CovP(2f,5g)∣=90. Compute σP(f).
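Solution. First, by covariance bilinearity (§3.14.2):
$$\mathrm{Cov}_P(2f, 5g) = 2 \cdot 5 \cdot \mathrm{Cov}_P(f,g) = 10\,\mathrm{Cov}_P(f,g),$$
so $|\mathrm{Cov}_P(f,g)| = |\mathrm{Cov}_P(2f,5g)| / 10 = 90/10 = 9$.
Second, $\sigma_P(g) = \sqrt{V_P(g)} = \sqrt{100} = 10$.
Third, recall the definition of the correlation coefficient:
$$\rho_P(f,g) = \frac{\mathrm{Cov}_P(f,g)}{\sigma_P(f)\,\sigma_P(g)} = -0.1.$$
Since $\rho < 0$, the sign of $\mathrm{Cov}(f,g)$ must be negative: $\mathrm{Cov}_P(f,g) = -9$. Substituting:
$$-0.1 = \frac{-9}{\sigma_P(f) \cdot 10} \implies \sigma_P(f) = \frac{-9}{-0.1 \cdot 10} = 9.$$
Common trap. Mixing up the variance rule and the covariance rule: in $V(\alpha f) = \alpha^2 V(f)$ the coefficient is squared (so $V(2f) = 4V(f)$, not $2V(f)$), whereas in $\mathrm{Cov}(2f, 5g) = 10\,\mathrm{Cov}(f,g)$ each coefficient enters once. Another trap: forgetting to extract the sign from $\rho$ and ending up with $\sigma(f) = \pm 9$.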
Source: General_24524_ENG_SOL.pdf Mode A MCQ5.
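The three steps of the back-solve can be written out as a short script. This is just a sketch of the arithmetic; the variable names are chosen for this illustration.

```python
from math import sqrt

var_g = 100           # V_P(g)
rho = -0.1            # rho_P(f, g)
abs_cov_2f_5g = 90    # |Cov_P(2f, 5g)|

# Step 1: bilinearity, |Cov(2f, 5g)| = 2 * 5 * |Cov(f, g)|
abs_cov = abs_cov_2f_5g / (2 * 5)

# Step 2: the sign of Cov(f, g) matches the sign of rho (standard deviations are positive)
cov = -abs_cov if rho < 0 else abs_cov

# Step 3: rho = Cov / (sigma_f * sigma_g), solved for sigma_f
sigma_g = sqrt(var_g)
sigma_f = cov / (rho * sigma_g)

print(sigma_f)
```

Skipping step 2 would produce a negative "standard deviation", which is a quick way to spot the sign mistake.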
Example 4.3 — Bayes' theorem (classic disease test)
Problem. A diagnostic test is 99% sensitive (true-positive rate) and 99% specific (true-negative rate). The disease affects 1 in 1000 people. A randomly chosen person tests positive. What is the probability that they actually have the disease?
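Solution. Let $D$ = "has disease", $T^+$ = "tests positive". Given: $P(D) = 0.001$, $P(T^+ \mid D) = 0.99$, $P(T^+ \mid D^c) = 1 - 0.99 = 0.01$. Compute $P(D \mid T^+)$ using Bayes, with the law of total probability in the denominator:
$$P(D \mid T^+) = \frac{P(T^+ \mid D)\,P(D)}{P(T^+)} = \frac{0.99 \cdot 0.001}{0.99 \cdot 0.001 + 0.01 \cdot 0.999} = \frac{0.00099}{0.00099 + 0.00999} = \frac{0.00099}{0.01098} \approx 0.0902.$$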
Observation. Despite the test being "99% accurate", the posterior probability of having the disease given a positive test is only ∼9% — because the disease is rare, false positives dominate true positives. This is the classic base-rate fallacy.
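The same computation works for any prevalence and test accuracy. Below is a minimal sketch, assuming a hypothetical helper `posterior` whose parameter names are chosen for this example.

```python
def posterior(prior, sensitivity, specificity):
    """P(D | T+) by Bayes' theorem, with the law of total probability in the denominator."""
    true_pos = sensitivity * prior                 # P(T+ | D) P(D)
    false_pos = (1 - specificity) * (1 - prior)    # P(T+ | D^c) P(D^c)
    return true_pos / (true_pos + false_pos)

p_rare = posterior(prior=0.001, sensitivity=0.99, specificity=0.99)
p_common = posterior(prior=0.10, sensitivity=0.99, specificity=0.99)

print(p_rare, p_common)
```

Raising the prior from 0.1% to 10% (an invented comparison value) pushes the posterior above 90%, which shows that the low answer in the example is driven by the rarity of the disease rather than by the test.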
Example 4.4 — E and V of a custom discrete random variable
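Problem. Roll a (fair) die. Define $f(\omega) = +10$ if $\omega$ is even, $-10$ if $\omega$ is odd. Compute $E(f)$, $V(f)$, $\sigma(f)$.
Solution. Even and odd outcomes each have probability $1/2$, so $E(f) = 10 \cdot \tfrac12 + (-10) \cdot \tfrac12 = 0$. Since $f^2 \equiv 100$, the computation formula gives $V(f) = E(f^2) - [E(f)]^2 = 100 - 0 = 100$, and $\sigma(f) = \sqrt{100} = 10$.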
Extension. Compute VP(6−3f) and σP(6−3f) using the affine-function rule:
V(6−3f)=(−3)2V(f)=9⋅100=900,σ(6−3f)=∣−3∣⋅σ(f)=3⋅10=30.
Source: lect6 p.8.
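The numbers in this example can be reproduced exactly with rational arithmetic. The sketch below stores each random variable as a dictionary from outcomes to payoffs; that representation and the helper names `E` and `V` are assumptions made for illustration.

```python
from fractions import Fraction

p = Fraction(1, 6)  # fair die: every face has probability 1/6

# f pays +10 on even faces and -10 on odd faces.
f = {w: 10 if w % 2 == 0 else -10 for w in range(1, 7)}
# Affine transformation h = 6 - 3f.
h = {w: 6 - 3 * f[w] for w in f}

def E(rv):
    """Expected value of a random variable given as {outcome: value}."""
    return sum(value * p for value in rv.values())

def V(rv):
    """Variance via E(rv^2) - E(rv)^2."""
    return E({w: v * v for w, v in rv.items()}) - E(rv) ** 2

print(E(f), V(f))   # mean and variance of the bet
print(E(h), V(h))   # mean shifts to 6, variance scales by (-3)^2
```

The constant 6 changes the mean but not the variance, as the affine-function rule predicts.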
Example 4.5 — Dirac additivity proof (the May 2024 Q10 "template")
Problem. Verify directly (no monotone-sequence trick) that δω0(E1∪E2)=δω0(E1)+δω0(E2) for any disjoint events E1,E2 — i.e., check that δω0 satisfies Kolmogorov's axiom K3.
Solution — by cases.
Case 1: ω0∉E1∪E2. Then ω0∉E1 and ω0∉E2, so δω0(E1)=δω0(E2)=0 and δω0(E1∪E2)=0=0+0. ✓
Case 2: ω0∈E1∪E2. Since E1∩E2=∅, ω0 belongs to exactly one of E1,E2, say ω0∈E1 (the other case is symmetric). Then δω0(E1)=1, δω0(E2)=0, δω0(E1∪E2)=1=1+0. ✓
In both cases K3 holds. The K1 and K2 axioms (positivity and normalisation) are immediate from the definition: δω0(E)∈{0,1}⊆[0,1] for every E, and δω0(Ω)=1 since ω0∈Ω.
Extension to countable additivity. The proof of Proposition 2006 (§3.8) has exactly this two-case structure, adapted to a decreasing sequence. Memorise the argument: on the exam, the two-case skeleton "Case I: ω0∈A / Case II: ω0∉A" is the entire proof.
Example 4.6 — Covariance from a joint PMF table
Problem (lect7 p.4, modified). Roll a fair die. Define
f(ω)=−1 if ω∈{1,2,3}, +1 if ω∈{4,5,6}; g(ω)=−2 if ω∈{1,2}, 0 if ω∈{3,4}, +2 if ω∈{5,6}.
Compute Cov(f,g) and ρ(f,g).
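Solution. E(f)=(−1)⋅(1/2)+1⋅(1/2)=0. E(g)=(−2)⋅(1/3)+0⋅(1/3)+2⋅(1/3)=0. E(fg): evaluate fg at each of the 6 outcomes: fg=2 for ω∈{1,2,5,6} and fg=0 for ω∈{3,4}, so E(fg)=(2+2+0+0+2+2)/6=4/3 and Cov(f,g)=E(fg)−E(f)E(g)=4/3.
V(f)=E(f²)−0=1, V(g)=(−2)²/3+0+(+2)²/3=8/3. So σ(f)=1, σ(g)=√(8/3)=2√(2/3).
ρ(f,g)=(4/3)/(1⋅2√(2/3))=(4/3)/(2√(2/3))=2/(3√(2/3))=2/√6=√6/3≈0.816.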
A strong positive linear relationship, as expected (both f and g increase with ω).
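The outcome-by-outcome evaluation can be reproduced in a few lines. This is a minimal sketch with f and g stored as dictionaries; the helper `expect` is an illustrative name, not one used in the lectures.

```python
from fractions import Fraction
from math import sqrt

outcomes = range(1, 7)
p = Fraction(1, 6)  # fair die

f = {1: -1, 2: -1, 3: -1, 4: 1, 5: 1, 6: 1}
g = {1: -2, 2: -2, 3: 0, 4: 0, 5: 2, 6: 2}

def expect(h):
    """E(h) for a random variable given as a dict outcome -> value."""
    return sum(h[w] * p for w in outcomes)

fg = {w: f[w] * g[w] for w in outcomes}   # product evaluated outcome by outcome
f_sq = {w: f[w] ** 2 for w in outcomes}
g_sq = {w: g[w] ** 2 for w in outcomes}

cov = expect(fg) - expect(f) * expect(g)
var_f = expect(f_sq) - expect(f) ** 2
var_g = expect(g_sq) - expect(g) ** 2
rho = cov / sqrt(var_f * var_g)

print(cov, var_f, var_g, round(rho, 3))  # 4/3 1 8/3 0.816
```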
Example 4.7 — Independence check
Problem. Roll a fair die. Let A={even}={2,4,6} and B={≤3}={1,2,3}. Are A and B independent?
Solution. P(A)=1/2, P(B)=1/2, P(A∩B)=P({2})=1/6. Check: P(A)P(B)=1/4≠1/6=P(A∩B). NOT independent.
Verify via conditional probability: P(A∣B)=P(A∩B)/P(B)=(1/6)/(1/2)=1/3≠1/2=P(A), so knowing the outcome is ≤3 decreases the chance of evenness from 50% to 33%.
Contrast. Let A={even} and B′={1,2}. P(A∩B′)=P({2})=1/6; P(A)P(B′)=(1/2)(1/3)=1/6. Independent.
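Both checks reduce to comparing P(A∩B) with P(A)P(B), which is easy to automate on a finite sample space. The sketch below assumes the uniform probability on a fair die; `prob` and `independent` are illustrative names.

```python
from fractions import Fraction

omega = set(range(1, 7))

def prob(event):
    """Uniform probability on a fair die: |event| / |omega|."""
    return Fraction(len(event & omega), len(omega))

def independent(a, b):
    """Events a, b are independent iff P(a ∩ b) = P(a) P(b)."""
    return prob(a & b) == prob(a) * prob(b)

even = {2, 4, 6}
print(independent(even, {1, 2, 3}))  # False
print(independent(even, {1, 2}))     # True
```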
§5. Solution Methods
Each method is a named algorithm (input → steps → output → pitfalls). The "Used on" line under each heading points to the exam problems or exercises where the method is applied.
M-P-1 — Compute a probability via Kolmogorov rules
Used on: general MCQs, TA exercises.
Input. A probability P and an event A (possibly built from unions, intersections, complements, or conditional formulas).
Steps.
Rewrite A in a canonical form using set algebra (complements, finite unions/intersections).
Apply the Kolmogorov tools one at a time:
Complement: P(Ac)=1−P(A).
Inclusion–exclusion: P(A∪B)=P(A)+P(B)−P(A∩B).
Finite additivity (disjoint): P(A1∪⋯∪An)=∑P(Ai) if pairwise disjoint.
Monotonicity: if A⊆B, P(A)≤P(B).
Union bound: P(⋃Ai)≤∑P(Ai) (always, no disjointness needed).
Simplify.
Output. A numerical value.
Pitfalls.
Forgetting to subtract P(A∩B) when applying inclusion–exclusion to non-disjoint events.
Using finite additivity on non-disjoint events (wrong answer).
Neglecting the normalization P(Ω)=1 when computing P(Ac) (forgetting the "1").
M-P-2 — Apply Bayes' theorem
Used on: disease-test, gambler's problems.
Input. Prior probabilities P(Ei) for a partition {Ei}, and likelihoods P(A∣Ei) for an observed event A.
Steps.
Identify the partition {Ei} and compute P(A)=∑iP(A∣Ei)P(Ei) via the law of total probability.
Apply Bayes: P(Ek∣A)=P(A∣Ek)P(Ek)/P(A).
Output. The posterior probability P(Ek∣A).
Pitfalls.
Confusing P(E∣A) with P(A∣E) (the "confusion of the inverse").
Forgetting the prior factor P(E) — students sometimes conclude P(E∣A)=P(A∣E).
M-P-3 — P(X≥k) for discrete: use 1−P(X≤k−1)
Used on: May 2024 MCQ4 (Poisson tail).
Input. A discrete random variable X and a threshold k.
Steps.
Rewrite {X≥k}={X≤k−1}c.
Compute P(X≤k−1)=∑_{i=0}^{k−1} P(X=i), a finite sum.
Subtract from 1: P(X≥k)=1−P(X≤k−1).
Output. A numerical expression.
Pitfalls.
Using the strict-inequality complement {X>k}={X≤k}c instead of {X≥k}={X≤k−1}c. For a continuous random variable the distinction doesn't matter (since P(X=k)=0); for a discrete one, it does.
Forgetting 0!=1 in Poisson or mis-expanding λ^0=1.
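The three steps above can be sketched for the Poisson case as follows. The function names `poisson_pmf` and `poisson_tail` are assumptions made for this illustration.

```python
from math import exp, factorial

def poisson_pmf(n, lam):
    """P(X = n) = e^(-lam) * lam^n / n!"""
    return exp(-lam) * lam ** n / factorial(n)

def poisson_tail(k, lam):
    """P(X >= k) = 1 - P(X <= k-1): a finite sum over i = 0, ..., k-1."""
    return 1 - sum(poisson_pmf(i, lam) for i in range(k))

# May 2024 MCQ4: lambda = 2, k = 3; the result equals 1 - 5*exp(-2)
print(round(poisson_tail(3, 2), 4))  # 0.3233
```

Note that `range(k)` stops at k−1, which matches the complement {X≤k−1} and not {X≤k}.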
M-P-4 — Back-solve for σ/V using Cov/correlation relationships
Used on: May 2024 MCQ5 (the canonical "given ρ and ∣Cov(αf,βg)∣, find σ(f)").
Input. Given values of V(g), ρ(f,g), and ∣Cov(αf,βg)∣ (or similar). Find an unknown σ(f) or V(f).
Steps.
Unwrap the covariance using bilinearity: Cov(αf,βg)=αβCov(f,g).
Extract ∣Cov(f,g)∣ from the given ∣Cov(αf,βg)∣: divide by ∣αβ∣.
Use the given ρ(f,g) to recover the sign: Cov(f,g) has the same sign as ρ.
Plug into ρ=Cov/(σ(f)σ(g)) and solve for σ(f) (using σ(g)=√V(g)).
Output. σ(f).
Pitfalls.
Confusing the variance rule (V(αf)=α²V(f)) with the covariance rule (Cov(αf,βg)=αβCov(f,g)). Covariance is bilinear, not quadratic.
Forgetting the sign of the correlation when taking ∣Cov∣; exam problems often give ∣⋅∣ expressly to test this.
Plugging in V(g) where σ(g) is needed (must take square root).
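The four steps can be written as one small function. This is a sketch under the assumption that ρ≠0 (otherwise σ(f) cannot be recovered from these data); the name `sigma_f` and its parameter names are illustrative.

```python
from math import sqrt

def sigma_f(var_g, rho, abs_cov_scaled, alpha, beta):
    """Recover sigma(f) from V(g), rho(f,g) and |Cov(alpha*f, beta*g)|."""
    if rho == 0:
        raise ValueError("rho = 0 gives no information about sigma(f)")
    # Steps 1-2: bilinearity, |Cov(f,g)| = |Cov(alpha f, beta g)| / |alpha beta|
    abs_cov = abs_cov_scaled / abs(alpha * beta)
    # Step 3: Cov(f,g) has the same sign as rho
    cov = abs_cov if rho > 0 else -abs_cov
    # Step 4: rho = Cov / (sigma(f) sigma(g)), with sigma(g) = sqrt(V(g))
    return cov / (rho * sqrt(var_g))

# May 2024 MCQ5: V(g) = 100, rho = -0.1, |Cov(2f, 5g)| = 90
print(sigma_f(var_g=100, rho=-0.1, abs_cov_scaled=90, alpha=2, beta=5))  # 9.0
```

Because the sign of the covariance is taken from ρ before dividing, the result is non-negative by construction.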
M-P-5 — Compute E(X), V(X) from a PMF or PDF
Used on: worked examples throughout (§4.4, §4.6, §4.7).
Input. The PMF φ(yi)=P(X=yi) or PDF φ(x), and the random variable f (often f=identity).
Steps (discrete).
E(f)=∑_i yi φ(yi).
E(f²)=∑_i yi² φ(yi).
V(f)=E(f²)−[E(f)]² (computation formula, easier than the definition).
σ(f)=√V(f).
Steps (continuous).
E(f)=∫_a^b x φ(x) dx (restricted to the carrier [a,b] if one exists).
E(f²)=∫_a^b x² φ(x) dx.
V(f)=E(f²)−[E(f)]².
Output. E(f), V(f), σ(f).
Pitfalls.
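For discrete: mixing up sums over values yi with sums over outcomes ω. The PMF formulas sum over the distinct values yi, and φ(yi) already collects every outcome with that value: φ(yi)=P(f=yi)=∑_{ω: f(ω)=yi} P({ω}).
For continuous: using the wrong bounds; recall φ(x)=0 outside the carrier, so ∫_{−∞}^{+∞}=∫_a^b whenever a carrier exists.
Using V(f)=∑_i yi² φ(yi) (forgetting the subtraction of [E(f)]²).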
M-P-6 — Verify countable additivity for a candidate P
Used on: May 2024 Q10 (verify Dirac is CA).
Input. A candidate probability P and the hypothesis "is P countably additive?".
Steps.
Consider an arbitrary decreasing sequence An↓A (i.e., A1⊇A2⊇⋯ and A=⋂An).
Compute lim_{n→∞} P(An).
Compare to P(A).
If they are equal for every such sequence, P is countably additive. Otherwise, not.
Alternative (for "simple" probabilities). Invoke Theorem 2002: every simple probability is a convex combination of Diracs, each of which is CA (Prop 2006). Hence every simple probability is CA.
Alternative (for "contradiction"). Assume P is CA and derive a contradiction — e.g., uniform on N leads to 1=0 or 1=+∞ (Prop 2009).
Output. "Yes, P is countably additive" (with proof) or "No, P is not countably additive" (with counterexample).
Pitfalls.
Using an increasing sequence An↑A instead of decreasing, or vice versa — the two formulations are equivalent (lect2) but the Dirac proof is cleanest on a decreasing sequence (proposition 2006).
Forgetting that countable additivity is not automatic — it must be verified for each candidate. Poisson, geometric, and all simple probabilities are CA; uniform on N is not.
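To see the decreasing-sequence test in action, the sketch below evaluates a Dirac probability along one particular sequence An={m∈N: m≥n}, whose intersection is empty. It illustrates Case II of the Dirac proof on a single example and is not a proof; representing events as predicates and the names `dirac` and `A` are assumptions made for the illustration.

```python
def dirac(omega0):
    """Dirac probability at omega0; an event is a predicate on outcomes."""
    return lambda event: 1 if event(omega0) else 0

delta = dirac(5)

def A(n):
    """A_n = {m in N : m >= n}, a decreasing sequence with empty intersection."""
    return lambda m: m >= n

def empty(m):
    """The limit event A = intersection of all A_n = empty set."""
    return False

values = [delta(A(n)) for n in range(1, 11)]
print(values)        # [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
print(delta(empty))  # 0
```

The sequence of values is eventually 0, which agrees with the value of the Dirac probability on the limit event.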
M-P-7 — Check independence of events or random variables
Used on: Example 4.7, lect3 exercises.
Input. Two events A,B (or two random variables f,g) and a probability P.
Steps (events).
Compute P(A), P(B), P(A∩B) separately.
Check whether P(A∩B)=P(A)P(B).
If equal, A and B are independent; otherwise, not.
Steps (random variables).
Verify P(f≤x,g≤y)=P(f≤x)P(g≤y) for every x,y (for discrete random variables this is equivalent to checking that the joint PMF factors: P(f=yi,g=zj)=P(f=yi)P(g=zj) for all i,j).
As a necessary condition, check Cov(f,g)=0 — but this is not sufficient!
Output. Yes/no with justification.
Pitfalls.
Zero covariance does not imply independence. Counterexample: let f be uniform on {−1,0,1} and g=f². Then Cov(f,g)=E(fg)−E(f)E(g)=E(f³)−E(f)E(f²)=0−0⋅(2/3)=0, yet g is completely determined by f, so they are not independent (for instance P(f=0,g=0)=1/3≠1/9=P(f=0)P(g=0)). The equivalence Cov=0⇔independence holds in special cases such as jointly normal random variables, not in general.
Mistaking disjointness for independence. Disjoint events A,B (with P(A),P(B)>0) are always dependent because P(A∩B)=0≠P(A)P(B).
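The zero-covariance counterexample (f uniform on {−1,0,1}, g=f²) can be verified by direct enumeration. A minimal sketch, with `expect` as an illustrative helper name:

```python
from fractions import Fraction

values = [-1, 0, 1]
p = Fraction(1, 3)  # f is uniform on {-1, 0, 1}; g = f^2

def expect(h):
    """E(h(f)) for a function h of the value of f."""
    return sum(h(x) * p for x in values)

# Cov(f, g) = E(f * f^2) - E(f) E(f^2)
cov = expect(lambda x: x * x ** 2) - expect(lambda x: x) * expect(lambda x: x ** 2)
print(cov)  # 0

# Independence would require P(f=0, g=0) = P(f=0) P(g=0)
p_joint = sum(p for x in values if x == 0 and x ** 2 == 0)
p_f0 = sum(p for x in values if x == 0)
p_g0 = sum(p for x in values if x ** 2 == 0)
print(p_joint, p_f0 * p_g0)  # 1/3 1/9
```

The covariance vanishes, but the joint probability does not factor, so f and g are dependent.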
§6. Practice Problems with Solutions
Problem 6.1 — lect1 Q1: set function, measure, or probability?
An equity fund contains 50 stocks: Ω={T1,T2,…,T50}. Let ϕ:Ω→R send each stock to the difference between its opening and closing price on a given day. Consider μ:2Ω→R with μ(A)=∑Ti∈Aϕ(Ti). Which is correct?
(A) μ is a set function but not a measure; (B) μ is a measure but not a probability; (C) μ is a probability; (D) none of the preceding.
Solution. Check the axioms one by one.
Grounded: μ(∅)=0. ✓
Positivity: μ(A) can be negative (some stocks fell, so ϕ(Ti)<0). ✗
Additivity: for disjoint A,B, μ(A∪B)=∑Ti∈A∪Bϕ(Ti)=∑Ti∈Aϕ(Ti)+∑Ti∈Bϕ(Ti)=μ(A)+μ(B). ✓
Since positivity fails, μ is not a measure (and hence not a probability). It is a set function. Answer: (A).
Remark. If ϕ had been defined as the absolute value of the price difference, then μ would be a measure (positivity restored). But μ(Ω)=∑_{i=1}^{50} ∣ϕ(Ti)∣ need not equal 1, so it still wouldn't be a probability unless normalized.
Source: lect1, Q1 (TA handout).
Problem 6.2 — May 2024 MCQ4 (Mode A): Poisson P(n≥3)
Consider Ω=N and the Poisson probability P with parameter λ=2. The probability of E={n∈N:n≥3} is:
(A) 1−5e^(−2); (B) e^(−2)/2; (C) 1−3e^(−2); (D) none.
Solution. See §4.1. Use the complement:
P(E)=1−P(n≤2)=1−e^(−2)(1+2+2)=1−5e^(−2). Answer: (A).
Source: General_24524_ENG_SOL.pdf p.2.
Problem 6.3 — May 2024 MCQ5 (Mode A): back-solve σP(f)
Let f,g be random variables on (Ω,P) with VP(g)=100, ρP(f,g)=−0.1, and ∣CovP(2f,5g)∣=90. Then σP(f)=?
(A) 9; (B) 3; (C) 10; (D) none.
Solution. See §4.2. Bilinearity: Cov(2f,5g)=10Cov(f,g)⇒∣Cov(f,g)∣=9. Sign from ρ: Cov(f,g)=−9. σ(g)=10. Solve −0.1=−9/(10σ(f))⇒σ(f)=9. Answer: (A).
Source: General_24524_ENG_SOL.pdf p.2.
Problem 6.4 — May 2024 Q10 (Mode A): Dirac probability and countable additivity ★
(a) Define the Dirac probability over the state space Ω.
(b) Give the definition of countable additivity for a probability P:2Ω→[0,1].
(c) Prove that any Dirac probability is countably additive.
Solution.
(a) (Marinacci example 1993, §2.5.) Fix ω0∈Ω. The Dirac probability concentrated at ω0 is
δω0(E)=1 if ω0∈E, and δω0(E)=0 if ω0∉E, for every E⊆Ω.
One checks easily that δω0 is a probability: non-negative and ∈[0,1]; δω0(Ω)=1; additive because for disjoint E1,E2, ω0 belongs to at most one of them.
(b) (Marinacci definition 2003, §2.7.) P:2^Ω→[0,1] is countably additive (or σ-additive) if for every countable collection {En}, n=1,2,…, of pairwise disjoint events (Ei∩Ej=∅ for i≠j),
P(⋃_{n=1}^{∞} En)=∑_{n=1}^{∞} P(En).
(c) (Proposition 2006, §3.8, full proof.) We use the monotone-sequence characterisation: P is CA iff for every An↓A, lim_{n→∞} P(An)=P(A). Take any such decreasing sequence.
Case (I): ω0∈A=⋂_{n=1}^{∞} An. Then ω0∈An for every n, so δω0(An)=1 for every n, and δω0(A)=1. Hence lim_{n→∞} δω0(An)=1=δω0(A).
Case (II): ω0∉A. Then there exists n̄∈N with ω0∉A_n̄. Since An⊆A_n̄ for every n≥n̄, we have ω0∉An for every n≥n̄, i.e., δω0(An)=0 eventually. Also δω0(A)=0. Hence lim_{n→∞} δω0(An)=0=δω0(A).
Since lim_{n→∞} δω0(An)=δω0(A) in both cases, δω0 is countably additive. ■
Source: General_24524_ENG_SOL.pdf p.6 (solution: "See example 1993, definition 2003, proposition 2006 (first part of the proof)").
Problem 6.5 — Probability HW: compute λ from a Poisson condition
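X∼Poisson(λ) with P(X<1)=e^(−5). Compute (i) λ, (ii) P(X>2), (iii) V(X+5), (iv) E(X²).
Solution. (i) P(X<1)=P(X=0)=e^(−λ)=e^(−5), so λ=5.
(ii) P(X>2)=1−P(X≤2)=1−e^(−5)(1+5+25/2)=1−(37/2)e^(−5)≈0.875.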
(iii) V(X+5)=V(X)=λ=5 (constant shift doesn't affect variance; §3.13.2 with α=1).
(iv) V(X)=E(X²)−[E(X)]², so E(X²)=V(X)+[E(X)]²=5+25=30.
Source: lect2 p.5 (Probability HW).
Problem 6.6 — TA1_prob_41: simple vs. countable vs. neither
Classify each of the following probabilities on Ω:
(a) Roll a fair die (Ω={1,…,6}, P({i})=1/6).
(b) Poisson(λ) on Ω=N.
(c) Uniform on Ω=N with P({n})=k for all n.
(d) Ω=N, P({0})=P({1})=P({2})=1/3, P({n})=0 for n≥3.
Solution.
(a) Simple (finite Ω, so finite support); countably additive (finite sum always equals countable sum).
(b) Not simple (every P({n})=e^(−λ)λ^n/n!>0, so the support is all of N, which is infinite). Countably additive; in fact this is the key example of a CA but non-simple probability (lect2 p.3).
(c) Not countably additive (Proposition 2009, §3.9). Also not simple (support is N).
(d) Simple: the finite event E={0,1,2} has P(E)=1. Countably additive (all simple probabilities are; see remark after §3.8).
Source: lect2 p.1, TA1_prob_41 p.2.
Problem 6.7 — lect7: covariance from a joint experiment
Two dice are rolled independently. Let f = result of first die, g = result of second. Compute E(f+g), V(f+g), Cov(f,g), ρ(f,g).
Solution. By independence, Cov(f,g)=0 and ρ(f,g)=0.
E(f)=E(g)=7/2; E(f+g)=E(f)+E(g)=7.
V(f)=V(g)=35/12; V(f+g)=V(f)+V(g)+2Cov(f,g)=35/6+0=35/6.
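These values can be confirmed by enumerating the 36 equally likely pairs. A minimal sketch; `expect` is an illustrative helper name.

```python
from fractions import Fraction
from itertools import product

p = Fraction(1, 36)                           # two independent fair dice
pairs = list(product(range(1, 7), repeat=2))  # all (first, second) results

def expect(h):
    """E(h(f, g)) over the 36 equally likely pairs."""
    return sum(h(a, b) * p for a, b in pairs)

e_sum = expect(lambda a, b: a + b)
v_sum = expect(lambda a, b: (a + b) ** 2) - e_sum ** 2
cov = expect(lambda a, b: a * b) - expect(lambda a, b: a) * expect(lambda a, b: b)

print(e_sum, v_sum, cov)  # 7 35/6 0
```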
Problem 6.8 — lect6: expected value of affine function
Given E(f)=5 and E(g)=6, compute (i) E(2f+3g); (ii) E(2f+7); (iii) E(−3f+g−4).
Solution.
(i) E(2f+3g)=2E(f)+3E(g)=10+18=28.
(ii) E(2f+7)=2E(f)+7=17.
(iii) E(−3f+g−4)=−3E(f)+E(g)−4=−15+6−4=−13.
Source: lect6 p.9.
§7. Common Pitfalls
Dirac verification: must check both cases. When proving δω0 satisfies a property (additivity, CA, …), always split on "ω0∈⋅ / ω0∉⋅". Forgetting case (II), i.e., assuming ω0 is always in the union, leads to an incomplete proof (and loses points on May 2024 Q10c).
Poisson arithmetic: λ^n/n!, not n^λ or λ^n/n. P(X=n)=e^(−λ)⋅λ^n/n!. Many students confuse λ^n and n^λ, or drop the 1/n! factor, or replace n! with n. Double-check: P(X=0)=e^(−λ) (not 0), P(X=1)=λe^(−λ), P(X=2)=λ²e^(−λ)/2.
Sign errors in correlation. When given ∣Cov∣, use ρ to recover the sign. A negative ρ forces a negative Cov. Never write σ(f)=±9 — standard deviation is always non-negative; the ± belongs to the covariance.
Cov=0 does not imply independence. Independent ⇒ Cov=0, but the converse fails in general; it holds in special cases such as jointly normal random variables. Counterexample: f uniform on {−1,0,1}, g=f². Cov(f,g)=0 but g is determined by f.
V(αf)=α²V(f), not αV(f). Square the coefficient. Contrast with Cov(αf,βg)=αβCov(f,g) (bilinearity). Variance scales quadratically; covariance scales bilinearly.
Additive constants don't affect variance or covariance. V(f+β)=V(f); Cov(f+β,g+δ)=Cov(f,g). Only the coefficient matters. (They do affect the expected value.)
Confusing PMF φ(y) and CDF Φ(x). PMF = "probability at a single value" (φ(y)=P(f=y)); CDF = "probability up to and including x" (Φ(x)=P(f≤x)). They satisfy Φ(x)=∑_{y≤x} φ(y) (discrete) or Φ(x)=∫_{−∞}^{x} φ(t) dt (continuous).
PDF ≠ P(f=x) in the continuous case. For a continuous random variable, P(f=x)=0 for every individual x; the density φ(x) measures probability per unit length, not probability itself. Only the integral over an interval has probabilistic meaning.
Simple probability is not equivalent to countably additive. Every simple probability is CA, but not conversely: uniform on N is neither simple nor CA; Poisson is CA but not simple; rolling a fair die is both simple and CA. Memorize the 2×2 grid.
Forgetting the complement trick for tails. P(X≥k) has an infinite tail (for Poisson, etc.); use 1−P(X≤k−1) to get a finite sum. For a continuous random variable, ≥ and > give the same probability.
Monotonicity ≤ not <. A⊆B gives P(A)≤P(B); the inequality is weak. Example (lect1): if M has support {1,2} with M({1})=M({2})=0.5, then M({3})=0 and M({1,2})=1=M({1,2,3}) although {1,2}⊊{1,2,3}. Do not assume strict inequality.
Bayes' theorem: numerator must use the likelihood, not the posterior. P(E∣A)=P(A∣E)P(E)/P(A). Students sometimes write P(E∣A)=P(E∣A)P(E)/P(A), a circular error.
Distinguishing random variables from events. Events are subsets of Ω (A⊆Ω); random variables are functions f:Ω→R. P(A) is a number; E(f) is a number; f itself is a function.
Carrier ≠ support. "Carrier" is the interval [a,b] outside which Φ(x)∈{0,1} (used for essentially bounded random variables); "support" is the set of outcomes ω∈Ω with P({ω})>0 (used for simple probabilities). Don't confuse them.
Chebyshev gives an upper bound, not an exact probability. P(∣X−μ∣≥kσ)≤1/k² is a bound; the actual probability may be much smaller. Do not use Chebyshev as an equality.
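To make the last pitfall concrete, the sketch below compares the Chebyshev bound with the exact probability for a fair die. The choice k=1.2 is an arbitrary value picked for illustration, not one taken from the lectures.

```python
from fractions import Fraction
from math import sqrt

values = range(1, 7)
p = Fraction(1, 6)                                    # fair die
mu = sum(x * p for x in values)                       # 7/2
sigma = sqrt(sum((x - mu) ** 2 * p for x in values))  # sqrt(35/12)

k = 1.2
# Exact probability of {|X - mu| >= k * sigma}: only the outcomes 1 and 6 qualify
actual = sum(p for x in values if abs(x - mu) >= k * sigma)
bound = 1 / k ** 2                                    # Chebyshev upper bound

print(actual, round(bound, 3))  # 1/3 0.694
```

The exact probability 1/3 is well below the bound of about 0.694, so treating the bound as an equality would roughly double the answer.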
Cross-references to the rest of the study guide
Linear algebra (§01): the covariance matrix Σf,g with rows (V(f), Cov(f,g)) and (Cov(f,g), V(g)) is a symmetric 2×2 matrix; it is positive semi-definite, which is equivalent to the Cauchy–Schwarz bound ∣Cov(f,g)∣≤σ(f)σ(g), and the sign of ρ is the sign of its off-diagonal entry.
Integral calculus (§03): the expected value of a continuous random variable is a Stieltjes integral (§3.20); expected values of exponential and normal distributions use integration by parts (§3.24).
Differential calculus (§02): Markov/Chebyshev are non-differentiable bounds; the Gaussian PDF uses the exponential function studied in §02.
Mathematical finance (§05): the portfolio variance V(wᵀr)=wᵀΣw is a direct application of §3.15 (variance of a sum) with weights.