var graph = [{"group": "nodes", "data": {"id": "rem:AbsConf", "name": "remark", "text": "

Remark 1 (Abstract configuration space).
\n
\nMost probability problems of this course start by defining a configuration space $\\Omega$ and a probability $P$ on it. Though, many real life problems involving probabilities make no mention of the space $\\Omega$ .

Example: weather

\nAssume that we want to predict the speed of the wind $S$ in Marseille. Since we know that the speed of the wind is related to the temperature $T$ , we can attempt to predict the speed of the wind from the temperature. The speed of the wind if a vector of $\\mathbb{R}^3$ and the temperature is a positive number, hence our problem can be fully modeled by the configuration space $\\Omega_1 = \\mathbb{R}^3 \\times \\mathbb{R}_+$ . From a database of measurements, we can estimate the probability of the elementary configuration by $P_1(\\{(s,t)\\})=\\frac{n_{(s,t)}}{n}$ , where $n_{(s,t)}$ is the number of times that $(s,t)$ was measured and $n$ is the total number of measurements. With the law $P_1$ , it is possible compute the conditional probability (which we will define later) of the wind speed given the temperature. With this conditional probability, it is possible to tell what is the most probable value of wind speed for a temperature.

Now assume that temperature is a poor predictor of the speed of wind: given a temperature, the probability over the wind has a high variance. In order to improve our analysis, it is common to incorporate new parameters in our model. We could for instance consider the air pressure $Pr$ . The new configuration space would then be $\\Omega_2 = \\mathbb{R}^3 \\times \\mathbb{R}_+ \\times \\mathbb{R}_+$ , with a probability $P_2$ . With luck, the probability on the speed of wind given a temperature and a pressure has a small variance.

If not, we could incorporate more and more parameters, and consider new configuration spaces. In practice, instead of changing the configuration space each time that we want to add a parameter, we treat the parameters as random variables of a larger configuration space. At a given time, all the weather parameters are functions of the positions and speeds of particles, which could be described by the configuration space $\\Omega = (\\mathbb{R}^3 \\times \\mathbb{R}^3)^N$ , where $N$ is the number of particles. Wind speed, temperature, and air pressure are functions of the type $\\Omega \\rightarrow \\mathbb{R}^d$ and are hence random variables.

In practice, evaluating and manipulating a probability $P$ on the positions and speeds of every particles is unrealistic. Though everything that matters is the joint law of the studied variables, in our case the joint law $P_{(S,T,Pr)}$ of the random variable $(S,T,Pr):\\Omega \\rightarrow \\mathbb{R}^3 \\times \\mathbb{R}_+ \\times \\mathbb{R}_+$ . Technically speaking, $\\mathbb{R}^3 \\times \\mathbb{R}_+ \\times \\mathbb{R}_+$ is no longer referred to as a configuration space, given that additional parameters may be included in our analysis. Note that the precise form of $\\Omega = (\\mathbb{R}^3 \\times \\mathbb{R}^3)^N$ is not particularly helpful. It is only necessary to know that the relevant parameters are functions on $\\Omega$ . In this context, the term \u2019universe\u2019 to refer to $\\Omega$ is particularly appropriated.

Moral

\nIn practice, the general configuration space $\\Omega$ and its probability are often not precisely described, or even mentioned. In these cases, what matters is the joint probability law of the variables of interest.

", "parent": "sec:Prob", "rank": "0", "html_name": "rem:AbsConf", "summary": "

Remark 1 (Abstract configuration space). In practice, the general configuration space $\\Omega$ and its probability are often not precisely described, or even mentioned. In these cases, what matters is the joint probability law of the variables of interest.

", "hasSummary": true, "hasTitle": true, "title": "Abstract configuration space"}, "classes": "l0", "position": {"x": -5264.652624402453, "y": 5665.884246925858}}, {"group": "nodes", "data": {"id": "def:ConfSp", "name": "definition", "text": "

Definition 1 (Configuration space / universe).
\n

Definition

The configuration space is usually noted $\\Omega$ . As its name indicates, the set $\\Omega$ contains all the possible configurations of a probabilistic model.
Its elements are usually noted $\\omega$ , and we will call them \u2019elementary configurations\u2019.

Warning : the name of the space $\\Omega$ and its elements $\\omega$ vary across contexts ans languages. $\\Omega$ is often called the \u2019sample space\u2019 or \u2019universe\u2019, and the elements $\\omega$ \u2019samples\u2019,\u2019outcomes\u2019 or \u2019realizations\u2019.
\n

Examples (1)

\nProbabilistic models are very useful to analyse dice rolls and card games. What is a relevant configuration space

to model a dice roll ?
to model $2$ dice rolls ?
to model $n$ dice rolls ? What is the size of this configuration space?

Examples (2)

\nAssume that $\\mathcal{S}$ is a $52$ card deck and that we are interested in modeling an experiment where a player draws cards from the card game. What should we chose as configuration space when

the player draws one card ?
the player draws one card, puts it back and draws another card ?
the player draws two cards without putting them back ?

Assume a player draws $n$ cards without putting them back in the deck. What is the size of the smallest configuration space describing the possible results ?

Note that when using Cartesian products, we have $(a,b)\\neq (b,a)$ . The order between the card draws is taken into account. In card games, the order in which cards are drawn is often not important. In that case, when drawing $2$ cards, we have $(a,b)\\sim (b,a)$ .

If we do not distinguish results up to permutations, what is the size of the configuration space when

the player draws one card ?
the player draws $n$ cards without putting them back ?

Examples (3)

\nIn this course we are particularly interested in signal and image processing problems.

What are the relevant configuration spaces when studying

real signals observed on $1000$ points ?
continuous real signals of duration $1$ second ?
$256\\times 256$ color images ?

", "parent": "subsec:ConfProb", "rank": "0", "html_name": "def:ConfSp", "summary": "

Definition 1 (Configuration space / universe). The configuration space is usually noted $\\Omega$ . As its name indicates, the set $\\Omega$ contains all the possible configurations of a probabilistic model. Its elements are usually noted $\\omega$ , and we will call them \u2019elementary configurations\u2019.

", "hasSummary": true, "hasTitle": true, "title": "Configuration space / universe"}, "classes": "l0", "position": {"x": -6591.287378112764, "y": 5808.483163577717}}, {"group": "nodes", "data": {"id": "def:Ev", "name": "definition", "text": "

Definition 2 (Events).
\n

Definition

\nAn event $A$ is a subset of the configuration space $\\Omega$ : $A\\subset \\Omega$ .
\n

Remark: a more precise definition requires the notion of $\\sigma$ -algebra. The next \u2019Remark\u2019 node gives the definition of $\\sigma$ -algebras, but the notion will not be used in the rest of the course.
\n

Examples

for the dice roll configuration space $\\Omega = \\{1,2,3,4,5,6\\}$ , a possible event is \u2019the number is even\u2019.
when $\\Omega$ is a $52$ card deck, \u2019the card is a diamond\u2019, \u2019the card is a $2$ \u2019, are typical events.
when $\\Omega=\\mathbb{R}^{1000}$ is a set of signals, the event $\\{0\\}\\times \\mathbb{R}^{999}$ describes all the signals starting with $0$ .

", "parent": "subsec:ConfProb", "rank": "0", "html_name": "def:Ev", "summary": "

Definition 2 (Events). An event $A$ is a subset of the configuration space $\\Omega$ : $A\\subset \\Omega$ .

", "hasSummary": true, "hasTitle": true, "title": "Events"}, "classes": "l0", "position": {"x": -6569.997740292376, "y": 5276.242218068017}}, {"group": "nodes", "data": {"id": "def:Prob", "name": "definition", "text": "

Definition 3 (Probability).
\n

Definition

\nA probability $P$ is a function which takes in input an event and returns a positive real number. It must verify the following axioms:

axiom 1: $P(\\Omega) = 1$
axiom 2: $P(A\\subset \\Omega) \\geq 0$
axiom 3: if $A\\cap B=\\emptyset$ then $P(A\\cup B)=P(A)+P(B)$ (also true for countable sums)

Direct consequences:

consequence 1: $P(A^c) = 1-P(A)$
consequence 2 : $P(\\emptyset) = 0$
consequence 3: $P(A\\cup B) = P(A)+P(B)-P(A\\cap B)$

Interpretation

\nThe probability gives a notion of \u2019size\u2019 to events. Given that the size of $\\Omega$ is $1$ , the size of events can be interpreted as the proportion they occupy in $\\Omega$ . Hence, the $P$ of probability can also be read as the $P$ of proportion.
\n

Assume that a dice is a perfect cube. All the faces are indistinguishable from each other, apart from the number they carry. In that case, it is relevant to assign probabilities according to the indistinguishability principle:

$P(\\{1\\}) = \\frac{1}{6}, P(\\{2\\}) = \\frac{1}{6},P(\\{3\\}) = \\frac{1}{6},$ $P(\\{4\\}) = \\frac{1}{6},P(\\{5\\}) = \\frac{1}{6},P(\\{6\\}) = \\frac{1}{6}.$

The distribution is called uniform. It is now possible to compute the probability of every event, using the axiom 3 (if $A\\cap B=\\emptyset$ then $P(A\\cup B)=P(A)+P(B)$ ):

$P\\left(\\{2,4,6\\}\\right) = P\\left(\\{2\\}\\cup\\{4\\}\\cup \\{6\\}\\right)=P(\\{2\\})+P(\\{4\\})+P(\\{6\\})=\\frac{1}{2}.$

Assume now that our dice is a bit damaged. It is no longer perfectly symmetrical. Hence We cannot use the indistinguishability principle anymore. In that case a relevant approach is to perform many dice rolls and to assign probabilities according the frequency of apparition of number.

$P(\\{1\\}) = \\frac{k_1}{N}, P(\\{2\\}) = \\frac{k_2}{N},P(\\{3\\}) = \\frac{k_3}{N},$ $P(\\{4\\}) = \\frac{k_4}{N},P(\\{5\\}) = \\frac{k_5}{N},P(\\{6\\}) = \\frac{k_6}{N}.$

Exactly as before, the probability of other events is computed with the rule \u2019if $A\\cap B=\\emptyset$ then $P(A\\cup B)=P(A)+P(B)$ \u2019.

", "parent": "subsec:ConfProb", "rank": "0", "html_name": "def:Prob", "summary": "

Definition 3 (Probability).

axiom 1: $P(\\Omega) = 1$
axiom 2: $P(A\\in \\Omega) \\geq 0$
axiom 3: if $A\\cap B=\\emptyset$ then $P(A\\cup B)=P(A)+P(B)$ (also true for countable sums)

", "hasSummary": true, "hasTitle": true, "title": "Probability"}, "classes": "l0", "position": {"x": -6588.173409433914, "y": 4695.7977448282445}}, {"group": "nodes", "data": {"id": "def:ProbDen", "name": "definition", "text": "

Definition 4 (Probability density).
\n
\nLet $P$ be a probability on $\\mathbb{R}$ . We say that the function $f$ is the probability density of $P$ when

$\\forall A\\in \\mathbb{R},\\quad P(A)=\\int_A f(x)\\mathrm{d}x.$

Note that $P$ does not always have a density. Take for instance $P$ such that $P(\\{0\\})=1$ : there are no functions $f$ verifying the above condition.

", "parent": "subsec:ConfProb", "rank": "0", "html_name": "def:ProbDen", "summary": "

Definition 4 (Probability density). $f$ is the density of $P$ , a probability on $\\mathbb{R}$ , when $\\forall A\\in \\mathbb{R},\\quad P(A)=\\int_A f(x)\\mathrm{d}x.$

", "hasSummary": true, "hasTitle": true, "title": "Probability density"}, "classes": "l0", "position": {"x": -7411.798315229266, "y": 4025.4912117439035}}, {"group": "nodes", "data": {"id": "rem:IntProb", "name": "remark", "text": "

Remark 2 (Introductory remark).
\n
\nGeneral remark: in most situations the word probability refers to a proportion.
\n

For instance, the question

$\\text{\n\"what is the probability of this event?\" \n}$

can usually be interpreted as

$\\text{\n\"what is the proportion of the possible configurations in which that event occurs?\" \n}$

", "parent": "subsec:ConfProb", "rank": "0", "html_name": "rem:IntProb", "summary": "

Remark 2 (Introductory remark). General remark: in most situations the word probability refers to a proportion. For instance, the question \"what is the probability of this event?\" can usually be interpreted as \"what is the proportion of the possible configurations in which that event occurs?\".

", "hasSummary": true, "hasTitle": true, "title": "Introductory remark"}, "classes": "l0", "position": {"x": -7804.796733874878, "y": 5776.548706847134}}, {"group": "nodes", "data": {"id": "rem:Ev", "name": "remark", "text": "

Remark 3 (Events and $\\sigma$ -algebra).
\n

Introduction

\nEarlier, events have been defined as subset $A\\subset \\Omega$ . When $\\Omega$ is not a finite set, some mathematical problems can appear, and not every subset $A\\subset \\Omega$ is considered to be an event. For instance when $\\Omega = \\mathbb{R}$ , events are subsets of $\\mathbb{R}$ which can be constructed \u2019easily\u2019 from intervals. For instance $A= \\bigcup_{i\\in \\mathbb{N}} \\left[\\frac{i}{2},\\frac{i+1}{2}\\right]$ is an event. Its probability is given by the sum of $P\\left([\\frac{i}{2},\\frac{i+1}{2}]\\right)$ . On the other hand, it is possible to define complicated set which cannot be expressed from intervals. Using the axioms of probability, we cannot compute the probability of such set from the probability of intervals. Hence we exclude them from events.
\n

Definition

\nThe events are the subsets of $\\Omega$ on which we define probability values. The set of events is called the $\\sigma$ -algebra. In order to be consistent with the calculus of probabilities, the $\\sigma$ -algebra $\\mathcal{A}$ on $\\Omega$ is required to verify the following axioms:

$\\Omega \\in \\mathcal{A}$ ( $\\rightarrow$ axiom 1 of probabilities)
$A \\in \\mathcal{A} \\Rightarrow A^{c}\\in \\mathcal{A}$ ( $\\rightarrow$ consequence 1 of probabilities)
if $A_i\\in \\mathcal{A}$ for $i\\in \\mathbb{N}$ , then $\\bigcup_{i\\in \\mathbb{N}} A_i \\in \\mathcal{A}$ ( $\\rightarrow$ axiom 3 of probabilities)

Examples

When $\\Omega$ is finite the $\\sigma$ -algebra is usually $\\mathcal{P}(\\Omega)$ : every subset of $\\Omega$ is an event.
when $\\Omega= \\mathbb{R}$ , the $\\sigma$ -algebra is obtained by taking countable unions of intervals and their complements.

", "parent": "subsec:ConfProb", "rank": "0", "html_name": "rem:Ev", "summary": "

Remark 3 (Events and $\\sigma$ -algebra). Strictly speaking, events are not arbitrary subsets of $\\Omega$ : they must belong to the $\\sigma$ -algebra of $\\Omega$ . (Beyond the scope of this course)

", "hasSummary": true, "hasTitle": true, "title": "Events and $\\sigma$ -algebra"}, "classes": "l0", "position": {"x": -7794.151914964684, "y": 5212.373304606854}}, {"group": "nodes", "data": {"id": "rem:ProbAss", "name": "remark", "text": "

Remark 4 (Assignment of probabilities).
\nTo model real life situations, we assign probabilities to events which reflect what we know about the situation. The probabilities on the configuration space $\\Omega$ are usually assigned according to one of the two following principles.

Indistinguishability. Assume that in a certain experiment, the elements of $\\Omega$ are perfectly indistinguishable. The probability of an event $A$ is then given by the number of configuration $k$ in which it occurs ( $\\omega \\in A$ ) divided by the total number of configuration $n$ : $\\frac{k}{n}$ . It is for instance the case when we draw a card from a card game without seeing what is on the card: they are all indistinguishable.
Probabilities as frequencies. Assume that an experiment which outputs values in $\\Omega$ was repeated $n$ times. If $k$ of the outputs belongs to the event $A$ , the probability assigned to $A$ is the number $\\frac{k}{n}$ . This can be used to evaluate the probabilities of the outcomes of an asymmetrical dice. We will come back to the link between probabilities and frequencies later in the course.

Independently from how they are assigned, probability values should verify several rules.

", "parent": "subsec:ConfProb", "rank": "0", "html_name": "rem:ProbAss", "summary": "

Remark 4 (Assignment of probabilities). Probability values are usually assigned either based either on an indistinguishability criterion, or from observed frequencies of a repeated experiment.

", "hasSummary": true, "hasTitle": true, "title": "Assignment of probabilities"}, "classes": "l0", "position": {"x": -7826.086371695265, "y": 4701.421996917542}}, {"group": "nodes", "data": {"id": "ex:RemSet", "name": "exercise", "text": "

Exercise 1 (Reminders of set theory).
\n
\n

Enumerate all the elements of $\\{1,2,3\\}$ .
Enumerate all the subsets of $\\{1,2,3\\}$ .
Recall what is the Cartesian product of $2$ sets.
Enumerate all the elements of $\\{1,2\\}^2$ .
Enumerate all the subsets of $\\{1,2\\}^2$ .
Enumerate all the elements of $\\{1\\}\\times \\{1,2\\}$
Enumerate all the subsets of $\\{1\\}\\times \\{1,2\\}$

", "parent": "subsec:ConfProb", "rank": "0", "html_name": "ex:RemSet", "summary": "

Exercise 1 (Reminders of set theory).

", "hasSummary": false, "hasTitle": true, "title": "Reminders of set theory"}, "classes": "l0", "position": {"x": -8929.518068117426, "y": 5857.966875248107}}, {"group": "nodes", "data": {"id": "ex:RemCom", "name": "exercise", "text": "

Exercise 2 (Reminders on combinatorics).
\n
\n

Q1: permutations

\nLet $\\mathcal{S}$ be a set of $n$ elements. We call $\\sigma:\\mathcal{S}\\rightarrow \\mathcal{S}$ a permutation when is it a bijection. How many different permutation $\\sigma$ can we construct?
\n

Q2: drawing objects from a box

\nAssume that elements of $\\mathcal{S}$ are $n$ physical objects contained in a box. Draw successively $k$ element from the box, without looking in the box, and without putting the element back before taking the next one. If we remember the order in which we take the elements, how many different configurations can we obtain? And if we don\u2019t remember the order in which we took the different elements?

", "parent": "subsec:ConfProb", "rank": "0", "html_name": "ex:RemCom", "summary": "

Exercise 2 (Reminders on combinatorics).

", "hasSummary": false, "hasTitle": true, "title": "Reminders on combinatorics"}, "classes": "l0", "position": {"x": -8928.41412074155, "y": 5434.540693704344}}, {"group": "nodes", "data": {"id": "ex:Gauss", "name": "exercise", "text": "

Exercise 3 (Gaussian densities ( $1D$ )).
\n
\n

What condition must a function $f:\\mathbb{R}\\rightarrow \\mathbb{R}$ verify to be interpreted as a probability density?
\nIt can be proved that the standardized Gaussian function has integral $1$ : $\\int_{-\\infty}^{+\\infty}\\frac{1}{\\sqrt{2\\pi}}e^{\\frac{-x^2}{2}}dx=1.$
What can we say about $f_{\\mu,\\sigma}(x)=\\frac{1}{\\sqrt{2\\pi \\sigma^2}}e^{\\frac{-(x-\\mu)^2}{2\\sigma^2}} \\quad?$

", "parent": "subsec:ConfProb", "rank": "0", "html_name": "ex:Gauss", "summary": "

Exercise 3 (Gaussian densities ( $1D$ )).

", "hasSummary": false, "hasTitle": true, "title": "Gaussian densities ( $1D$ )"}, "classes": "l0", "position": {"x": -8921.777124083159, "y": 4104.367513229578}}, {"group": "nodes", "data": {"id": "def:RV", "name": "definition", "text": "

Definition 5 (Random variables).
\n

Introduction

\nIntroduce first the idea behind the definition. Build a configuration space representing the possible ages of $2$ persons. A natural choice is $\\Omega = \\mathbb{R} \\times \\mathbb{R}$ where the first coordinate represents the possible ages of the first person,and the second the possible age of the second person ( $\\Omega = \\mathbb{R}_+ \\times \\mathbb{R}_+$ is also a natural choice).
\n

The coordinate function $age_1:\\Omega \\rightarrow \\mathbb{R}$ ,

$age_1\\left(\\omega=(\\omega_1,\\omega_2)\\right)=\\omega_1$

\u2019extracts\u2019 the age of the first person out of an elementary configuration $\\omega$ . Similarly, the coordinate function $age_2:\\Omega \\rightarrow \\mathbb{R}$ ,

$age_2\\left(\\omega=(\\omega_1,\\omega_2)\\right)=\\omega_2$

\u2019extracts\u2019 age of the second person. The functions $age_1$ and $age_2$ are called random variables.
\nThe coordinates functions $age_1$ and $age_2$ extracts interesting quantities from the elementary configuration. However, there are other interesting quantities which are not coordinate functions. For instance, for an elementary configuration $\\omega$ , we can be interested in the average age of the $2$ persons. We can define the function $average:\\Omega \\rightarrow \\mathbb{R}$ ,

$average\\left(\\omega=(\\omega_1,\\omega_2)\\right) = \\frac{\\omega_1+\\omega_2}{2}$

The function $average$ is also called a random variable.
\n

Definition

\nA random variable $X$ taking values in a space $E$ is a function $X:\\Omega\\rightarrow E$ . In particular, integer random variables are functions $X:\\Omega\\rightarrow \\mathbb{N}$ and real random variable are functions $X:\\Omega\\rightarrow \\mathbb{R}$ .
\n

Remark: this is a simplified version of the true definition which requires that random variables are \u2019measurable functions\u2019. When $A$ is in the $\\sigma$ -algebra of $E$ , $X^{-1}(A)$ should be in the $\\sigma$ -algebra of $\\Omega$ .
\n

Example 1

\nConsider the following dice roll game. If the number is even, the player gains $1$ euro, and if the number is odd he loses $1$ euro. The function $Gain:\\Omega=\\{1,2,3,4,5,6\\}\\rightarrow \\{-1,1\\}$ , $G(1)=G(3)=G(5)=-1$ $G(2)=G(4)=G(6)=1$ is a random variable.

Example 2: coordinate random variables

\nAssume that we roll a dice $n$ times. The configurations space that describes all the possible results is $\\Omega = \\{1,2,3,4,5,6\\}^n$ . Let $X_i$ be the $i$ -th coordinate function on $\\Omega$ . For instance,

$X_1\\left( (3,2,3,...,6) \\right) = 3.$ These random variables \u2019extract\u2019 the result of the $i$ -th roll out of the elementary configuration. We can also construct the random variables $Gain_i = Gain \\circ X_i$ : the gain at the $i$ -th roll.

", "parent": "subsec:RV", "rank": "0", "html_name": "def:RV", "summary": "

Definition 5 (Random variables). A random variable $X$ is a function defined on the configuration space $\\Omega$ . A integer random variable is valued in $\\mathbb{N}$ or $\\mathbb{Z}$ and a real random variable is valued in $\\mathbb{R}$ .

", "hasSummary": true, "hasTitle": true, "title": "Random variables"}, "classes": "l0", "position": {"x": -3993.986179509592, "y": 3935.377851810639}}, {"group": "nodes", "data": {"id": "def:RVLaw", "name": "definition", "text": "

Definition 6 (Law of a random variables).
\n

Introduction

\nLet $\\Omega$ be a configuration space with a probability $P$ , and $X$ be a random variable taking values in a set $E$ . Recall that $P$ is a function that takes in input an event of $\\Omega$ and associates its probability. The random variable transports the probability $P$ to the set $E$ . The probability of a set $A$ in $E$ is determined by the probability of the inverse image of $A$ in $\\Omega$ .
\n

Definition

\nThe law of $P_X$ of the random variable $X:\\Omega \\rightarrow E$ is a probability on $E$ defined by $P_X(A \\subset E) = P\\left(X^{-1}(A)\\right).$ It is called the law of the random variable $X$ .
\nRemarks:

instead of $P_X(A)$ , we often write $P(X\\in A)$ (the probability that the value of the random variable $X$ falls in $A$ ).
$E$ can be viewed as another configuration space with a probability $P_X$

In the previous game, the random variable $Gain:\\Omega=\\{1,2,3,4,5,5\\}\\rightarrow \\{-1,1\\}$ , $Gain(1)=Gain(3)=Gain(5)=-1$ $Gain(2)=Gain(4)=Gain(6)=1$ induces a probability distribution on $\\{-1,1\\}$ . We have $P_{Gain}(\\{-1\\}) = P(Gain^{-1}(\\{-1\\}))=P(\\{1\\})+P(\\{3\\})+P(\\{5\\})$ $P_{Gain}(\\{1\\}) = P(Gain^{-1}(\\{1\\}))=P(\\{2\\})+P(\\{4\\})+P(\\{6\\}).$

Definition: cumulative distribution

\nLet $X$ be a real random variable ( $E=\\mathbb{R}$ ). The probability $P_X$ can be represented by its cumulative distribution $F_X:\\mathbb{R}\\rightarrow \\mathbb{R}$ , defined by

$F_X(a) = P_X\\left(]-\\infty,a]\\right).$

All the information on $P_X$ is contained in $F_X$ . Since the cumulative distribution is a function $\\mathbb{R}\\rightarrow \\mathbb{R}$ , sometimes it is conceptually simpler to manipulate $F_X$ than the probability itself which is a function from the subsets of $\\mathbb{R}$ to $\\mathbb{R}$ .

How do we compute $P_X(]a,b])$ from $F_X$ ?
What do we know about $\\lim_{x\\rightarrow \\infty} F_X(x)$ ?
Can we have $x<y$ and $F_X(x)>F_X(y)$ ?
If for $x\\in [0,1]$ , $F_X(x)=x$ , what is $P_X([2,3])$ ?

Definition: density of a real random variable

\nLet $X$ be a real random variable. Sometimes, the law of $X$ admits a density. When it exists, the density of $X$ is a function $f_X$ such that

$P_X([a,b]) = \\int_a^b f_X(x)dx.$

Remark: we defined the cumulative distribution and density of a \u2019random variable\u2019 but note that it depends on $X$ only through the probability $P_X$ . Hence the density can be defined for a probability $P$ on $\\mathbb{R}$ even if it is not defined as the law of a random variable.

Exercises

\nFor a real random variable $X$ ,

show that $\\int_{-\\infty}^{+\\infty}f_X(x)dx=1$
show that $f_X=(F_X)'$ .

", "parent": "subsec:RV", "rank": "0", "html_name": "def:RVLaw", "summary": "

Definition 6 (Law of a random variables). A RV $X:\\Omega \\rightarrow E$ transports the probability on $\\Omega$ to a probability on $E$ . For $A$ an event in $E$ , define $P_X(A) = P\\left(X^{-1}(A)\\right).$ $P_X$ is called the law of the random variable $X$ .

", "hasSummary": true, "hasTitle": true, "title": "Law of a random variables"}, "classes": "l0", "position": {"x": -2761.1231158163064, "y": 3870.8379779883944}}, {"group": "nodes", "data": {"id": "def:JoinRV", "name": "definition", "text": "

Definition 7 (Joint random variables).
\n

Definition

\nConsider $2$ random variables $X:\\Omega \\rightarrow E$ and $Y:\\Omega \\rightarrow F$ . The random variable $Z:\\Omega \\rightarrow E\\times F$ ,

$Z=(X,Y)$

is called the joint random variable. The law $P_{Z}=P_{(X,Y)}$ of $Z$ is called the joint law (or joint probability distribution).
\n

Examples

\nConsider the case of $n$ dice rolls. Let $\\Omega = \\{1,2,3,4,5,6\\}^n$ be a configuration space, with the uniform probability distribution $P$ . Let $X_i$ be the $i$ -th coordinate function. Consider the joint random variable

$Z = (X_1,..,X_n).$

What the space of values of $Z$ ? What is the law of $Z$ ?

Consider now $Z:\\Omega \\rightarrow \\{1,2,3,4,5,6\\}^2,\\quad Z=(X_1,X_2).$

What is the law of $Z$ ?

", "parent": "subsec:RV", "rank": "0", "html_name": "def:JoinRV", "summary": "

Definition 7 (Joint random variables). Consider $2$ random variables $X:\\Omega \\rightarrow E$ and $Y:\\Omega \\rightarrow F$ . The random variable $Z:\\Omega \\rightarrow E\\times F$ , $Z=(X,Y)$ is called the joint random variable. The law $P_{Z}=P_{(X,Y)}$ of $Z$ is called the joint law (or joint probability distribution).

", "hasSummary": true, "hasTitle": true, "title": "Joint random variables"}, "classes": "l0", "position": {"x": -5158.705568517556, "y": 3888.000422320245}}, {"group": "nodes", "data": {"id": "ex:SpGM", "name": "exercise", "text": "

Exercise 4 (Simple game model).
\n
\nConsider the following game. A player flips twice a coin. If both flips are heads, the player wins $1$ euro. Otherwise he loses $1$ euro. Make a probabilistic model which describes the game.

", "parent": "subsec:RV", "rank": "0", "html_name": "ex:SpGM", "summary": "

Exercise 4 (Simple game model).

", "hasSummary": false, "hasTitle": true, "title": "Simple game model"}, "classes": "l0", "position": {"x": -2755.6188300699455, "y": 4768.891256628461}}, {"group": "nodes", "data": {"id": "ex:ImRV", "name": "exercise", "text": "

Exercise 5 (Image as random variable).
\n
\nLet $I$ be a $512\\times 512$ image valued in $\\{0,1,...,255\\}$ . $I$ can be viewed as a random variable from the configuration space $\\Omega = \\{0,1,...,511\\}^2\\rightarrow \\{0,1,...,255\\}$ . Assume that $P$ is a uniform probability distribution over $\\Omega$ . The law $P_I$ of $I$ corresponds to something that we already meet in the image processing course. What is it? What is the formal difference?

", "parent": "subsec:RV", "rank": "0", "html_name": "ex:ImRV", "summary": "

Exercise 5 (Image as random variable).

", "hasSummary": false, "hasTitle": true, "title": "Image as random variable"}, "classes": "l0", "position": {"x": -2763.4777060587226, "y": 4388.939429960873}}, {"group": "nodes", "data": {"id": "ex:SumEq01", "name": "exercise", "text": "

Exercise 6 (Sums of equiprobable 0s and 1s).
\n
\n

Theory

\nConsider the configuration space $\\Omega = \\{ 0,1 \\}^n$ ( $\\Omega$ can be interpreted as the vertices of an hypercube), with the uniform probability distribution. What is the cardinal of $\\Omega$ ?
\nConsider now the random variable $X:\\Omega \\rightarrow \\{0,..,n\\}$ , $X((\\omega_1,...,\\omega_n)) = \\sum_{i=1}^n \\omega_i,$ which counts the number of $1$ in each $n$ -tuple. What is the law of the random variable $X$ ?

Application: probability of a binary image

\nPropose a very simple probabilistic model over the set of binary images of size $256\\times 256$ . What can we say about the probability that

an image contains only $1$ s (or $0$ s)?
it contains exactly $1$ time the number $1$ ?
the number of $1$ it contains is exactly $k$ ?
the number of $1$ s is smaller or equal to $k$ ?

", "parent": "subsec:RV", "rank": "0", "html_name": "ex:SumEq01", "summary": "

Exercise 6 (Sums of equiprobable 0s and 1s).

", "hasSummary": false, "hasTitle": true, "title": "Sums of equiprobable 0s and 1s"}, "classes": "l0", "position": {"x": -3966.066245820806, "y": 4379.791652955868}}, {"group": "nodes", "data": {"id": "ex:SimJoin", "name": "exercise", "text": "

Exercise 7 (Simple joint law).
\n
\nConsider $\\Omega=\\{0,1\\}^2$ with a uniform probability, and $X_1$ and $X_2$ the coordinate random variables. Let $Y$ and $Z$ be two random variables $\\Omega \\rightarrow \\{0,1,2\\}$ defined by

$Y = X_1 + X_2 \\text{ and } Z = X_1X_2.$

Give the laws of $Y$ , $Z$ , and the joint law of $(Y,Z)$ .

", "parent": "subsec:RV", "rank": "0", "html_name": "ex:SimJoin", "summary": "

Exercise 7 (Simple joint law).

", "hasSummary": false, "hasTitle": true, "title": "Simple joint law"}, "classes": "l0", "position": {"x": -5140.737592553508, "y": 4382.769647635065}}, {"group": "nodes", "data": {"id": "def:Marg", "name": "definition", "text": "

Definition 8 (Concept of marginal).
\n
\n

Coordinate projection

\nBy definition, elements of $E\\times F$ are couples $(x,y)$ with $x\\in E$ and $y\\in F$ . Note $\\pi_E$ and $\\pi_F$ the maps which map $(x,y)$ to its first and second coordinate,

$\\pi_E((x,y))=x \\text{ and } \\pi_F((x,y))=y.$

$\\pi_E$ and $\\pi_F$ are called projections on $E$ and $F$ .

Marginal: idea

\nSome mathematical objects (probabilities and random variables) defined on the Cartesian product $E\\times F$ naturally give rise to similar objects defined on $E$ or $F$ by projecting them on $E$ or $F$ , in a sens that will be made clear in subsequent nodes. The objects defined in this manner on $E$ and $F$ are called marginals of the object on $E\\times F$ .
\n

By projecting on a coordinate, we \u2019forget\u2019 the other one. As we will see, \u2019forgetting\u2019 the other coordinate is opposed to conditioning, where the other coordinate is fixed to a particular value.

", "parent": "subsec:Marg", "rank": "0", "html_name": "def:Marg", "summary": "

Definition 8 (Concept of marginal). Some mathematical objects defined on the Cartesian product $E\\times F$ naturally give rise to similar objects defined on $E$ and $F$ by \u2019forgetting\u2019 the other coordinate. The objects defined in this manner on $E$ and $F$ are called marginal.

", "hasSummary": true, "hasTitle": true, "title": "Concept of marginal"}, "classes": "l0", "position": {"x": -7666.146928631908, "y": 3152.463522697209}}, {"group": "nodes", "data": {"id": "def:MargProb", "name": "definition", "text": "

Definition 9 (Marginal probability).
\n
\n

Definition

\nLet $P_{E\\times F}$ be a probability on the Cartesian product $E\\times F$ . The marginal probability on $E$ is defined by

$P_E(A) = P_{E\\times F}(\\pi_E^{-1}(A)) = P_{E\\times F}(A,F)$

and the marginal on $F$ by

$P_F(B) = P_{E\\times F}(\\pi_F^{-1}(B)) = P_{E\\times F}(E,B).$

Particular cases

When $F$ is discrete, $P_E(A)=P_{E\\times F}(A,F) = \\sum_{f_i\\in F} P_{E\\times F}(A\\times \\{ f_i\\})$
when densities exists, $f_E(x) = \\int_{x\\times F} f_{E\\times F}(x,y)\\mathrm{d}y$

", "parent": "subsec:Marg", "rank": "0", "html_name": "def:MargProb", "summary": "

Definition 9 (Marginal probability). Let $P_{E\\times F}$ be a probability on the Cartesian product $E\\times F$ . The marginal probability on $E$ is defined by $P_E(A) = P_{E\\times F}(A,F).$

", "hasSummary": true, "hasTitle": true, "title": "Marginal probability"}, "classes": "l0", "position": {"x": -7574.774146852916, "y": 2525.6983691605765}}, {"group": "nodes", "data": {"id": "def:MargRV", "name": "definition", "text": "

Definition 10 (Marginal random variables).
\n
\n

Definition

\nConsider a random variable $Z:\\Omega \\rightarrow E \\times F$ . The marginal random variables $X$ and $Y$ on $E$ and on $F$ are defined by

$X = \\pi_E \\circ Z \\text{ and } Y = \\pi_F \\circ Z.$

The laws of marginal variables

\nThe laws of $X$ and $Y$ are called the marginal laws. They are of course the marginals of the joint probability $P_{Z}$ (prove it). The notations $P_E$ and $P_F$ introduced in the general case are replaced by $P_X$ and $P_Y$ . When the density exist, they are noted $f_X$ and $f_Y$ .
\n

Examples

The marginals of the joint variable $(X_1,X_2)$ are $X_1$ and $X_2$ .
\n
Consider the configuration space of $n$ dice rolls $\\Omega=\\{1,2,3,4,5,6\\}^n$ , with a probability $P$ . The law of the $i$ -th coordinate random variable $X_i$ is a marginal of $P$ .
\n
In the same flavor, consider a configuration space describing real signals observed on $n$ points: $\\Omega = \\mathbb{R}^{1000}$ , with a probability $P$ . The value of the signal at the $i$ -th measurement is also a coordinate random variable and its law is a marginal of $P$ .

", "parent": "subsec:Marg", "rank": "0", "html_name": "def:MargRV", "summary": "

Definition 10 (Marginal random variables). Let $Z:\\Omega\\rightarrow E\\times F$ be a random variable, and let $X$ and $Y$ be the random variables corresponding to each coordinate: $Z=(X,Y).$ $X$ is called the marginal random variable on $E$ .

", "hasSummary": true, "hasTitle": true, "title": "Marginal random variables"}, "classes": "l0", "position": {"x": -8825.065439679365, "y": 2569.562259332014}}, {"group": "nodes", "data": {"id": "ex:SimPExa", "name": "exercise", "text": "

Exercise 8 (A simple example).
\n
\nConsider the configuration space $\\Omega = \\{a,b\\}^2$ ,

\n\n\n\n\n\n\n\n\n\n\n\n\n\n

(a,a)	(a,b)
(b,a)	(b,b)

With probabilities
\n

\n\n\n\n\n\n\n\n\n\n\n\n\n\n

$P(a,a)=0.02$	$P(a,b)=0.08$
$P(b,a)=0.48$	$P(b,b)=0.42$

Note $X_1$ and $X_2$ the coordinate random variables on $\\Omega$ .

Explain why the notation is not entirely rigorous.
Compute the marginal laws of $X_1$ and $X_2$ .
If you know the laws of $X_1$ and $X_2$ , can you recover the law of the joint distribution ?

", "parent": "subsec:Marg", "rank": "0", "html_name": "ex:SimPExa", "summary": "

Exercise 8 (A simple example).

", "hasSummary": false, "hasTitle": true, "title": "A simple example"}, "classes": "l0", "position": {"x": -10021.854885608795, "y": 3276.801731567818}}, {"group": "nodes", "data": {"id": "ex:MargGauss", "name": "exercise", "text": "

Exercise 9 (Marginals of $2D$ Gaussians).
\n
\nLet $\\Omega = \\mathbb{R}^2$ and let $P$ be a probability with density

$f(x,y)=\\frac{1}{2\\pi \\sigma^2 }e^{-\\frac{(x-x_0)^2+(y-y_0)^2}{2\\sigma^2}}.$

Draw approximately some level lines of $f$ .
Show that $f$ is a probability density using the results of Exercise\u00a03
Recall the definition of the marginal distribution on the first coordinate.
Compute the marginal densities of each coordinate.

", "parent": "subsec:Marg", "rank": "0", "html_name": "ex:MargGauss", "summary": "

Exercise 9 (Marginals of $2D$ Gaussians).

", "hasSummary": false, "hasTitle": true, "title": "Marginals of $2D$ Gaussians"}, "classes": "l0", "position": {"x": -10013.868622442598, "y": 2864.2827750796514}}, {"group": "nodes", "data": {"id": "ex:LandSp", "name": "exercise", "text": "

Exercise 10 (Landing a spaceship).
\n
\n

Error function

\nThe following function $erf(z)=\\frac{2}{\\sqrt{\\pi}} \\int_{0}^{z} e^{-x^2}dx$

is called the \u2019Gaussian error function\u2019. Express this integral,

$\\int_{\\mu-\\sigma}^{\\mu+\\sigma} f_{\\mu,\\sigma}(x)dx,$

with the $erf$ function. Ask google its value.

Application

\nAssume that a spaceship wants to land on earth at a certain location. Assume that in the region of the landing zone the earth surface is assimilated to a plan. Given $2$ coordinate axis, points on the surface of the earth can be identified to points of $\\mathbb{R}^2$ . The spaceship aims at the point of coordinate $(x_0,y_0)$ , but the wind introduce a perturbation on the landing point. In about $70$ % of landings the perturbation on each coordinate is smaller than $1$ meters.
\nPropose a reasonable probabilistic modelization of the landing problem in which, $\\Omega$ is an abstract space not entirely specified, and the landing position is a random variable.

", "parent": "subsec:Marg", "rank": "0", "html_name": "ex:LandSp", "summary": "

Exercise 10 (Landing a spaceship).

", "hasSummary": false, "hasTitle": true, "title": "Landing a spaceship"}, "classes": "l0", "position": {"x": -10013.868622442598, "y": 2480.1339980091607}}, {"group": "nodes", "data": {"id": "th:Bayes", "name": "theorem", "text": "

Theorem 1 (Bayes Theorem).
\n
\nThe definition of conditional densities can be rewritten as something called the \"Bayes theorem\":

$P(B|A)P(A) = P(A|B)P(B) = P(A\\cap B)$

", "parent": "subsec:Cond", "rank": "0", "html_name": "th:Bayes", "summary": "

Theorem 1 (Bayes Theorem). $P(B|A)P(A) = P(A|B)P(B) = P(A\\cap B)$

", "hasSummary": true, "hasTitle": true, "title": "Bayes Theorem"}, "classes": "l0", "position": {"x": -3698.7342091408714, "y": 2127.4049655263498}}, {"group": "nodes", "data": {"id": "def:CondProb", "name": "definition", "text": "

Definition 11 (Conditional probability).
\n
\n

Idea

\nConsider a probability $P$ on the set of configurations $\\Omega$ . Given a event $A$ with $P(A>0)$ , the idea of conditioning with respect to $A$ is to focus on the configurations $\\omega \\in A$ and forget about the configurations $\\omega \\notin A$ . We would like to define a new probability which respects the proportions of the events included in $A$ and gives zero probability to event which do not intersect $A$ .
\n

Definition

\nLet $A$ be an event of $\\Omega$ , with $P(A)>0$ . We define the probability conditional to the event $A$ by $P_A(B) = \\frac{P(B\\cap A)}{P(A)}.$ $P_A(B)$ is called the probability of $B$ given $A$ , or probability of $B$ knowing $A$ , and is usually noted $P(B|A)$ .
\nExercise: check that $P_A$ is a probability on $\\Omega$
\n

Examples: cards

\nAssume that $\\Omega$ is the set of cards of a standard 52 card deck, $\\begin{aligned}\n \\Omega & = \\{1,2,3,4,5,6,7,8,9,10,jack,queen,king,ace\\}\\\\\n & \\times \\{clubs,diamonds,hearts,spades\\}\\\\\n \\end{aligned}$ and $P$ is the uniform distribution.

What is the probability of then event $\\{(King,hearts)\\}$ knowing $\\{(Queen,diamond)\\}$ ?
What is the probability of $\\{(King,hearts)\\}$ knowing $\\{1,2,3,4,5,6,7,8,9,10,jack,queen,king,ace\\}\\times \\{hearts\\}$ ?
What is the probability of $\\{(King,hearts)\\}$ knowing $\\{king\\}\\times \\{clubs,diamonds,hearts,spades\\}$ ?

Examples: dice

\nAssume that a dice is rolled twice and that we have a uniform distribution $P$ on $\\Omega=\\{1,..,6\\}^2$ .

what is the probability that the second roll is $6$ given that the first roll is $1$ ?
what is the probability that the sum of the rolls is $8$ given that the first roll is $1$ ?

", "parent": "subsec:Cond", "rank": "0", "html_name": "def:CondProb", "summary": "

Definition 11 (Conditional probability). We define the probability conditional to the event $A$ by $P_A(B) = \\frac{P(B\\cap A)}{P(A)}.$ $P_A(B)$ is called the probability of $B$ given $A$ , and is usually noted $P(B|A)$ .

", "hasSummary": true, "hasTitle": true, "title": "Conditional probability"}, "classes": "l0", "position": {"x": -4871.671100695654, "y": 2254.936653287442}}, {"group": "nodes", "data": {"id": "def:CondCart", "name": "definition", "text": "

Definition 12 (Conditional probabilities on a Cartesian product).
\n
\n

Idea

\nLet $P$ be a probability on a Cartesian product $E\\times F$ . Assume that the marginal $P_F(\\{y\\})$ is non null. We can condition $P$ by the event $E\\times \\{y\\}$ , \u2019the second coordinate is $y$ \u2019. This gives a new probability on $E\\times F$ , which can be interpreted as a probability on $E$ as follow.

Definition

\nAssume that $P(E\\times \\{y\\})>0$ . For $A\\subset E$ ,

$\\begin{aligned}\nP(A | E\\times \\{y\\} ) &\\leftarrow& P(A\\times \\{y\\} | E\\times \\{y\\} )\\\\\n&=& \\frac{P((A\\times \\{y\\})\\cap (E\\times \\{y\\}))}{P(E\\times \\{y\\})}\\\\\n&=& \\frac{P((A\\cap E)\\times \\{y\\})}{P_F(\\{y\\})}\\\\\n&=& \\frac{P(A\\times \\{y\\})}{P_F(\\{y\\})}\\\\\\end{aligned}$

When all the densities exist, the conditional density at $x\\in E$ is $f(x|E\\times \\{y\\}) = \\frac{f((x,y))}{f_F(y)},$ as long as the marginal density $f_F(y)>0$ .
\n

Hence, in general, when conditioning a joint law by a coordinate value, we have

$\\text{conditional probability} = \\frac{\\text{joint probability}}{ \\text{marginal probability}}.$

Conditional probabilities of a joint law

\nAssume now $X:\\Omega \\rightarrow E$ and $Y:\\Omega \\rightarrow F$ . The joint law $P_{(X,Y)}$ , is a probability on $E\\times F$ . In this context we use particular notations to condition on some $y\\in F$ . When $A\\subset E$ ,

$P_{(X,Y)}(A|E\\times \\{y\\}) \\text{ is written } P(X\\in A |Y=y).$

Using shortened notation, we can write

$P(X\\in A |Y=y) \\rightarrow P(A|y)= \\frac{P(A,y)}{P(y)}.$

And when it exists, the conditional density is noted $f(x|Y=y) = \\frac{f_{(X,Y)}(x,y)}{f_Y(y)}.$

", "parent": "subsec:Cond", "rank": "0", "html_name": "def:CondCart", "summary": "

Definition 12 (Conditional probabilities on a Cartesian product). Let $P$ be a probability on $E\\times F$ . The conditional probability knowing $\\{y\\}\\subset F$ is given by $P(A|E\\times \\{y\\}) = \\frac{P(A\\times \\{y\\})}{P_F(\\{y\\})},$ where $A$ is an event of $E$ .

", "hasSummary": true, "hasTitle": true, "title": "Conditional probabilities on a Cartesian product"}, "classes": "l0", "position": {"x": -4842.334127720824, "y": 1526.3454399792772}}, {"group": "nodes", "data": {"id": "rem:CondCart", "name": "remark", "text": "

Remark 5 (Conditioning on a Cartesian product).
\n
\nEarlier we considered the case of $2$ dice rolls, $\\Omega = \\{1,..,6\\}^2$ with the uniform distribution $P$ .

Conditioning by the event \u2019the first roll is $1$ \u2019 gives a new probability on $\\Omega$ . Strictly speaking $P_{\\{1\\}\\times \\{1,..,6\\}}$ is a distribution on $\\Omega$ .
Observe that $P_{\\{1\\}\\times \\{1,..,6\\}}(\\{ (i,j) \\})$ is null when $i\\neq 1$ . Non null values are obtained for $P_{\\{1\\}\\times \\{1,..,6\\}}(\\{ (1,j) \\})$ when $j$ varies in $\\{1,2,3,4,5,6\\}$ .
Hence, it is tempting, and possible, to interpret the probability $P_{\\{1\\}\\times \\{1,..,6\\}}$ as a probability on $\\{1,..,6\\}$ . In general, this is the case when the space is a Cartesian product and when the probability is conditioned by fixing one coordinate.

", "parent": "subsec:Cond", "rank": "0", "html_name": "rem:CondCart", "summary": "

Remark 5 (Conditioning on a Cartesian product). Considered the case of $2$ dice rolls with the uniform distribution. Conditioning by \u2019the first roll is a $1$ \u2019 leads to a probability distribution on the second roll.

", "hasSummary": true, "hasTitle": true, "title": "Conditioning on a Cartesian product"}, "classes": "l0", "position": {"x": -3705.4533780122365, "y": 1581.3149748236115}}, {"group": "nodes", "data": {"id": "ex:RCRD", "name": "exercise", "text": "

Exercise 11 (Random card in random deck).
\n
\nConsider that a box contains $10$ card decks. $3$ decks contain $52$ cards (2 - Ace) and $7$ decks contain $32$ cards (7-Ace). Without watching, a deck is chosen in the box, and a card is chosen in the deck. Given that the card is a $10$ , what is the probability that it comes from a $52$ card deck ?

", "parent": "subsec:Cond", "rank": "0", "html_name": "ex:RCRD", "summary": "

Exercise 11 (Random card in random deck).

", "hasSummary": false, "hasTitle": true, "title": "Random card in random deck"}, "classes": "l0", "position": {"x": -2522.109017729677, "y": 2345.4030045895497}}, {"group": "nodes", "data": {"id": "ex:DWRPM", "name": "exercise", "text": "

Exercise 12 (Draws without replacement (probabilistic model)).
\n
\nThis exercise is similar to the question 2 of Exercise\u00a02 \u2019Reminder on combinatorics\u2019, but we will re-express our previous reasonings in language of probabilities and random variables.
\n

Consider again that elements of $\\mathcal{S}$ are $n$ physical objects contained in a box. Assume that we draw successively $k$ element from the box, without looking in the box, and without putting the element back before taking the next one. We want to now build a probabilistic model of the possible draws. We will model the experiment with the configuration space $\\Omega = \\mathcal{S}^k$ and call $X_i$ the coordinate random variables.

Give a singleton $\\{ \\omega \\} \\subset\\Omega$ which must have a null probability.
Give the law of $X_1$ .
Give the law of $X_2$ , given $X_1$ . Is there independence? Compute the joint law of $(X_1,X_2)$ and the marginal law of $X_2$ .
Given the law of $X_k$ , given $X_1,...,X_{k-1}$ .
Express the probability $P$ on $\\Omega$ , from the law of $X_1$ and successive conditioning. What is the probability of a singleton $\\{ \\omega \\} \\subset\\Omega$ which has a non null probability?

", "parent": "subsec:Cond", "rank": "0", "html_name": "ex:DWRPM", "summary": "

Exercise 12 (Draws without replacement (probabilistic model)).

", "hasSummary": false, "hasTitle": true, "title": "Draws without replacement (probabilistic model)"}, "classes": "l0", "position": {"x": -2532.2100549601205, "y": 1876.5173886765588}}, {"group": "nodes", "data": {"id": "ex:CondGauss", "name": "exercise", "text": "

Exercise 13 (Conditionals of $2D$ Gaussians).
\n
\nAgain, let $\\Omega = \\mathbb{R}^2$ and let $P$ be a probability with density

$f(x,y)=\\frac{1}{2\\pi \\sigma^2 }e^{-\\frac{(x-x_0)^2+(y-y_0)^2}{2\\sigma^2}}.$

Give the conditional density on the first coordinate given that the second coordinate has a fixed value $y$ .

", "parent": "subsec:Cond", "rank": "0", "html_name": "ex:CondGauss", "summary": "

Exercise 13 (Conditionals of $2D$ Gaussians).

", "hasSummary": false, "hasTitle": true, "title": "Conditionals of $2D$ Gaussians"}, "classes": "l0", "position": {"x": -2536.920292020898, "y": 1410.897130613792}}, {"group": "nodes", "data": {"id": "th:JLIV", "name": "theorem", "text": "

Theorem 2 (Joint law of independent variables).
\n
\n

Theorem

\nLet $X:\\Omega \\rightarrow \\mathbb{Z}$ and $Y:\\Omega \\rightarrow \\mathbb{Z}$ be independent RV.

Let $P_{(X,Y)}$ be the joint law on $\\mathbb{Z}^2$ of the joint variable $(X,Y)$ .
\n
$\\forall (i,j)\\in \\mathbb{Z}^2, \\quad P_{(X,Y)}(\\{(i,j)\\}) = P_{X}(\\{i\\})P_{Y}(\\{j\\})$
The conditional probabilities $P(X=i | Y = j)$ do not dependent on $y$ and
\n
$P(X=i | Y = j) = P_X(\\{i\\}),$
\n
in other words the marginal and conditional are the same (for all values of $Y$ ).

When $X$ and $Y$ are real random variables the results hold for densities (when they exist):

$f_{(X,Y)}((x,y)) = f_X(x)f_Y(y)\\quad \\text{ and }\\quad f_Y(x)=f(x|Y=y)$

Proofs of 1) and 2)

\n $1$ ) Call $Z$ the joint variable $Z=(X,Y)$ . We have

$Z^{-1}(\\{(i,j)\\})=X^{-1}(\\{i\\})\\cap Y^{-1}(\\{j\\}).$

Hence,

$\\begin{aligned}\nP_{(X,Y)}(\\{(i,j)\\})&=& P( X^{-1}(\\{i\\})\\cap Y^{-1}(\\{j\\}))\\\\\n&=& P(X^{-1}(\\{i\\})P(Y^{-1}(\\{j\\}))\\\\\n&=& P_X(\\{i\\})P_Y(\\{j\\})\\\\\\end{aligned}$

$2$ ) When $P_Y(\\{i\\})>0$ , $\\begin{aligned}\nP(X=i | Y = j)&=&\\frac{P_{(X,Y)}(\\{i,j\\})}{P_Y(\\{j\\}) }\\\\\n&=&\\frac{P_X(\\{i\\})P_Y(\\{i\\})}{P_Y(\\{j\\}) }\\\\\n&=&P_X(\\{i\\}) \\\\\\end{aligned}$

Example

\nLet $\\Omega = \\{a,b\\}^2$ with

\n\n\n\n\n\n\n\n\n\n\n\n\n\n

$P(\\{(a,a)\\})=0.02$	$P(\\{(a,b)\\})=0.08$
$P(\\{(b,a)\\})=0.48$	$P(\\{(b,b)\\})=0.42$ .

Call $X_1$ and $X_2$ the coordinate random variables. Are $X_1$ and $X_2$ independent?
\n

", "parent": "subsec:Ind", "rank": "0", "html_name": "th:JLIV", "summary": "

Theorem 2 (Joint law of independent variables). $P_{(X,Y)}(A\\times B)=P_X(A)P_Y(B)$ $P(X\\in A | Y = y) = P_X(A),$

", "hasSummary": true, "hasTitle": true, "title": "Joint law of independent variables"}, "classes": "l0", "position": {"x": -7044.081789564841, "y": 288.5955547839242}}, {"group": "nodes", "data": {"id": "def:IndEv", "name": "definition", "text": "

Definition 13 (Independent events).
\n
\n

Motivation

\nConsider again $2$ dice rolls with a uniform probability distribution $P$ . Consider the two events

\u2019 $A$ \u2019: $\\{6\\}\\times \\{1,..,6\\}$ , the first roll is $6$ . $P(A)=\\frac{1}{6}$
\u2019 $B$ \u2019: $\\{1,..,6\\}\\times \\{6\\}$ , the second roll is $6$ . $P(B)=\\frac{1}{6}$

Remember that conditional probabilities are given by

$P(B|A) = \\frac{P(A\\cap B)}{P(A)} = \\frac{P(\\{(6,6)\\})}{\\{6\\}\\times \\{1,..,6\\}}=\\frac{1}{6}$

Hence $P(B)=P(B|A)$ : the probability of getting $6$ on the second roll is the same as the probability of getting $6$ on the second roll knowing that the first roll was a $6$ . We say in that case that the event $B$ does not depend on the event $A$ . Reversing the conditioning in the formula shows that $A$ is also independent of $B$ .

Definition

\nTwo events $A$ and $B$ are called independent when

$P(A\\cap B) =P(A)P(B)$

Interpretation

\nWhen $P(A)>0$ : the proportion of $B\\cap A$ in $A$ is the same as the proportion of $B$ in $\\Omega$

$\\frac{P(B\\cap A)}{P(A)} = \\frac{P(B)}{P(\\Omega)}=P(B).$

In other words, conditioning probabilities to the event $A$ does not change the probability of $B$ .
\n

When $P(B)>0$ , the same can be said about the proportion of $A\\cap B$ in $B$ .

Example

\nLet $\\Omega$ be the $52$ card deck with uniform distribution $P$ . Show that the events \u2019the card is a diamond\u2019 and \u2019the card is an ace\u2019 are independent.

$P(\\{ (ace,diamond) \\}) = \\frac{1}{52}$
$P( \\{ ace \\}\\times \\{hearts, diamonds, clubs, spades \\} ) = \\frac{4}{52} =\\frac{1}{13}$
$P( \\{1,..., ace\\}\\times \\{ diamonds\\} ) = \\frac{13}{52} =\\frac{1}{4}$

and

$\\frac{1}{4} \\times \\frac{1}{13} = \\frac{1}{52},$

they are independent.

", "parent": "subsec:Ind", "rank": "0", "html_name": "def:IndEv", "summary": "

Definition 13 (Independent events). Two events $A$ and $B$ are called independents when $P(A\\cap B) = P(A)P(B).$

", "hasSummary": true, "hasTitle": true, "title": "Independent events"}, "classes": "l0", "position": {"x": -7096.069015477114, "y": 1542.1955190811266}}, {"group": "nodes", "data": {"id": "def:IndRV", "name": "definition", "text": "

Definition 14 (Independent random variables).
\n
\n

Motivation

\nThe notion of independence of two random variables $X$ and $Y$ is based on the notion of independence of events.
\n

Consider two consecutive dice rolls. Call $X$ and $Y$ the 2 coordinates random variables on the configuration space $\\Omega =\\{1,2,3,4,5,6 \\}^2$ endowed with the uniform distribution $P$ .
\n

Check that events $X^{-1}(\\{i\\})\\subset \\Omega$ and $Y^{-1}(\\{j\\})\\subset \\Omega$ are independent for all $i$ and $j$ . Rephrased in common language: obtaining an $i$ for the first roll is independent of obtaining a $j$ for the second roll.
\n

More generally we will require the independence of all events in $\\Omega$ described by a constraint on the values of $X$ and all events described by a constraint on the values of $Y$ . In that case, an information on one of them does not affect the other.

Definition

\nTwo random variables $X:\\Omega \\rightarrow E$ and $Y:\\Omega \\rightarrow F$ are independent when

$\\forall A\\subset E,B\\subset F,\\quad X^{-1}(A) \\text{ and } Y^{-1}(B) \\text{ are independent events in } \\Omega.$

Remarks

Recall that it means that when $P(Y^{-1}(B))>0$ , conditioning the probability $P$ on $\\Omega$ to $Y^{-1}(B)$ does not change the probability of $X^{-1}(A)$ . The other direction holds when $P(X^{-1}(A))>0$ .
\n
It is clear that independence is a symmetric relation.
\n

", "parent": "subsec:Ind", "rank": "0", "html_name": "def:IndRV", "summary": "

Definition 14 (Independent random variables). Two random variables $X:\\Omega \\rightarrow E$ and $Y:\\Omega \\rightarrow F$ are independent when : $\\forall A\\subset E,B\\subset F,\\quad X^{-1}(A) \\text{ and } Y^{-1}(B) \\text{ are independent events in } \\Omega.$

", "hasSummary": true, "hasTitle": true, "title": "Independent random variables"}, "classes": "l0", "position": {"x": -6798.704393552654, "y": 1020.20838151672}}, {"group": "nodes", "data": {"id": "def:iid", "name": "definition", "text": "

Definition 15 (i.i.d variables).
\n
\n

Definition

\n $N$ random variables $X_i$ defined on a configuration space $\\Omega$ are independent and identically distributed when they are independent and with same law.
\nKnowing the law $P$ of one the variables determines the law of the joint distribution. In the discrete case the joint law $\\bm P$ is given by

$\\bm P(\\{(x_1,...,x_n)\\}) = P(\\{x_1\\})P(\\{x_2\\}) .. P(\\{x_n\\}).$

If $P$ has a density $f$ , the joint density $\\bm f$ is

$\\bm f((x_1,...,x_n)) = f(x_1)f(x_2) .. f(x_n).$

Example

\nIn the modelling of the repetition of $n$ dice rolls, we have always used the uniform probability over all possible configurations. From this, we deduced the independence of the different rolls.
\nThe reasoning usually goes the other way. Let $\\Omega=\\{1,2,3,4,5,6\\}^n$ and the $X_i$ be the coordinate random variables. It is usually reasonable to assume that the $X_i$ are independent, and that their law is uniform on $\\{1,2,3,4,5,6\\}$ . They are hence i.i.d. variables. This determines the probability on $\\Omega$ : $\\begin{aligned}\nP(\\{\\omega=(\\omega_1,...,\\omega_n)\\}) &=& P_{(X_1,...,X_n)}(\\{\\omega=(\\omega_1,...,\\omega_n)\\})\\\\\n&=& \\prod P_{X_i}(\\{\\omega_i \\}) = \\frac{1}{6^n}.\\\\\\end{aligned}$

Hence $P$ is uniform.

", "parent": "subsec:Ind", "rank": "0", "html_name": "def:iid", "summary": "

Definition 15 (i.i.d variables). $N$ random variables $X_i$ defined on a configuration space $\\Omega$ are independent and identically distributed when they are independent and with same law.
\n

", "hasSummary": true, "hasTitle": true, "title": "i.i.d variables"}, "classes": "l0", "position": {"x": -7011.649719946043, "y": -244.86656227072086}}, {"group": "nodes", "data": {"id": "ex:Bin", "name": "exercise", "text": "

Exercise 14 (Binomial).
\n
\nLet $X_1,...,X_n$ be independent Bernoulli variables of parameter $p$ . A Binomial variable is a sum of independent Bernoulli variables of same parameter. The law of a Binomial variable is noted $B(p,n)$ , where $p$ is the parameter of the Bernoulli, and $n$ is the number of Bernoulli variables in the sum.

Explain why the $X_1,...,X_n$ are i.i.d. variables
Make a parallel with Exercise\u00a06. Describe a configuration space $\\Omega$ and the probability $P$ on $\\Omega$ compatible with the current exercise.
Let $S_2 = X_1+X_2$ . Given the law of $S_2$
Let $S_3 = X_1+X_2+X_3$ . Given the law of $S_3$
Let $S_n=\\sum_{i=1}^{n} X_1+...+X_n$ , give the law of $S_n$ .

", "parent": "subsec:Ind", "rank": "0", "html_name": "ex:Bin", "summary": "

Exercise 14 (Binomial).

", "hasSummary": false, "hasTitle": true, "title": "Binomial"}, "classes": "l0", "position": {"x": -7024.526699601474, "y": -661.9589284740644}}, {"group": "nodes", "data": {"id": "def:IndExp", "name": "theorem", "text": "

Theorem 3 (Independence and expectation).
\n
\n

Statement

\nWhen $X$ and $Y$ are two independent variables, we have that $\\mathbb{E}(XY) = \\mathbb{E}(X)\\mathbb{E}(Y).$

Proof

\nWhen the laws have densities, the result is given by the following computation: $\\begin{aligned}\n\\mathbb{E}(XY) &=& \\int \\int xy f_{(X,Y)}dxdy \\\\\n&=& \\int \\int xy f_X(x)f_Y(y)dxdy\\\\\n&=& \\int xf_X(x)dx \\int yf_Y(y)dy\\end{aligned}$

", "parent": "subsec:MomRV", "rank": "0", "html_name": "def:IndExp", "summary": "

Theorem 3 (Independence and expectation). When $X$ and $Y$ are two independent variables, we have that $\\mathbb{E}(XY) = \\mathbb{E}(X)\\mathbb{E}(Y).$

", "hasSummary": true, "hasTitle": true, "title": "Independence and expectation"}, "classes": "l0", "position": {"x": -1284.4874785919005, "y": 1204.6149245783558}}, {"group": "nodes", "data": {"id": "th:IndCov", "name": "theorem", "text": "

Theorem 4 (Independence and covariance).
\n
\nLet $X$ and $Y$ be independent variables. It can be checked that $X_c$ and $Y_c$ are also independent. We have

$X \\text{ and } Y \\text{ are independent } \\Rightarrow cov(X,Y)=\\mathbb{E}(X_cY_c)=\\mathbb{E}(X_c)\\mathbb{E}(Y_c)=0.$

Hence, independence and covariance are related notions. If two variables $X$ and $Y$ have non null covariance, knowing one gives information on the other. Note however that the converse is not true.

", "parent": "subsec:MomRV", "rank": "0", "html_name": "th:IndCov", "summary": "

Theorem 4 (Independence and covariance). $X \\text{ and } Y \\text{ are independent } \\Rightarrow cov(X,Y)=0.$

", "hasSummary": true, "hasTitle": true, "title": "Independence and covariance"}, "classes": "l0", "position": {"x": -1275.363933513836, "y": 708.3709925139169}}, {"group": "nodes", "data": {"id": "def:StdProp", "name": "theorem", "text": "

Theorem 5 (Properties of variance).
\n
\n

Properties

$v(aX+b)=a^2v(X)$
when $X$ and $Y$ are independent, $v(X+Y)=v(X)+v(Y)$ .

Proofs

\n $1)$
\n $v(X+b)= \\|X+b-\\mathbb{E}(X+b)\\|^2=\\|X+b-\\mathbb{E}(X)-b)\\|^2=v(X)$
\n $v(aX)= \\|aX-\\mathbb{E}(aX)\\|^2=\\|a^2X-a^2\\mathbb{E}(X))^2\\|=a^2\\|X-\\mathbb{E}(X))^2\\|=a^2v(X)$

$2)$
\n $\\begin{aligned}\nv(X+Y) &=& \\langle (X+Y)-\\mathbb{E}(X+Y),(X+Y)-\\mathbb{E}(X+Y) \\rangle \\\\\n&=& \\langle X_c+Y_c,X_c+Y_c \\rangle \\\\\n&=& \\|X_c\\|^2+\\|Y_c\\|^2+2\\langle X_c,Y_c\\rangle\\\\\n&=& v(X)+v(Y)\\end{aligned}$ This is simply the Pythagorean theorem. Alternatively we can write,

$\\begin{aligned}\nv(X+Y) &=& \\mathbb{E}( (X+Y - \\mathbb{E}(X+Y))^2) \\\\\n&=& \\mathbb{E}( (X+Y - \\mathbb{E}(X)+\\mathbb{E}Y))^2) \\\\\n&=& \\mathbb{E}( (X-\\mathbb{E}(X))^2+(Y- \\mathbb{E}(Y))^2 + 2((X-\\mathbb{E}(X))(Y- \\mathbb{E}(Y))) \\\\\n&=&v(X)+v(Y)+2\\mathbb{E}(XY-X\\mathbb{E}(Y)-\\mathbb{E}(X)Y+\\mathbb{E}(X)\\mathbb{E}(Y))\\\\\n&=&v(X)+v(Y)+2(\\mathbb{E}(X)\\mathbb{E}(Y)-\\mathbb{E}(X)\\mathbb{E}(Y)-\\mathbb{E}(X)\\mathbb{E}(Y)+\\mathbb{E}(X)\\mathbb{E}(Y))\\end{aligned}$

Important consequence

\nHence, when $X_1,...,X_n$ are independent variables with identical distributions, $v\\left(\\frac{1}{n}\\sum_i X_i\\right)=\\frac{1}{n^2}v\\left(\\sum_i X_i\\right) = \\frac{1}{n^2}\\sum_i v(X_i)=\\frac{1}{n}v(X_1)\\xrightarrow[n\\rightarrow \\infty]{} 0$

This is a very important result: summing independent and identically distributed (i.i.d) variables reduces the variance in $\\frac{1}{n}$ and the standard deviation in $\\frac{1}{\\sqrt{n}}$ .

", "parent": "subsec:MomRV", "rank": "0", "html_name": "def:StdProp", "summary": "

Theorem 5 (Properties of variance). $v(X)=\\mathbb{E}(X^2)-\\mathbb{E}(X)^2$ $v(aX+b)=a^2v(X)$ when $X$ and $Y$ are independent, $v(X+Y)=v(X)+v(Y).$

", "hasSummary": true, "hasTitle": true, "title": "Properties of variance"}, "classes": "l0", "position": {"x": -1264.554042025985, "y": 179.87513751583037}}, {"group": "nodes", "data": {"id": "def:Expe", "name": "definition", "text": "

Definition 16 (Expectation).
\n
\n

Idea

\nLet $\\Omega=\\{1,2\\}$ with $P(\\{1\\})=\\frac{1}{10}$ and $P(\\{2\\})=\\frac{9}{10}$ . Let $X$ be a real random variable, $X:\\Omega \\rightarrow \\mathbb{R}$ . For each elementary configuration $\\omega$ , $X$ takes the value $X(\\omega)$ . The \u2019expectation\u2019 of $X$ is the \u2019average\u2019 value of the $X(\\omega)$ with respect to the probabilities of the $\\{\\omega\\}$ . In our example:

$\\mathbb{E}(X) = X(1).P(\\{1\\}) + X(2).P(\\{2\\}) = 1\\times \\frac{1}{10} + 2\\times \\frac{9}{10}.$

Definition

\nIf $\\Omega=\\mathbb{Z}$ , the average of a random variable $X:\\Omega \\rightarrow \\mathbb{R}$ becomes:

$\\mathbb{E}(X) = \\sum_{i\\in\\Omega} X(i)\\times P(\\{i\\}).$

Now consider the case $\\Omega=\\mathbb{R}$ , with a probability $P$ of density $f$ . The formula for the expectation becomes (when the integral exists)

$\\mathbb{E}(X) = \\int_{-\\infty}^{+\\infty} X(\\omega) f(\\omega)d\\omega.$

There is a definition of the expectation which does not depend on the nature of $\\Omega$ and $P$ . The expectation (average / mean) of $X$ is defined by

$\\mathbb{E}(X) = \\int_{\\Omega} XdP.$

Where the integral is a \u2019Lebesgue integrals\u2019 (as opposed to Riemann integrals). When $\\Omega$ is discrete, the Lebesgue integral becomes a sum over $\\Omega$ , while when $P$ has a density, $dP$ can be replaced by $f(\\omega)d\\omega$ . Lebesgue integrals will not be used in this course, but it is useful to be familiar with this notation.
\n

$\\mathbb{E}(X)$ from the law $P_X$

\nIn the discrete case, the previous definition is based on the sum of the $X(\\omega)$ over all the configurations $\\omega \\in \\Omega$ , weight by the probabilities $P(\\{\\omega\\})$ . The same result can be obtained by summing all the possible values $x\\in E$ that $X(\\omega)$ can take, weighted by their probabilities $P_X(\\{x\\})$ .
\nThis approach leads to the following important equalities

If $E=\\mathbb{Z}$ , we have that
\n
$\\mathbb{E}(X) = \\sum_{i\\in \\mathbb{Z}} i.P_X(\\{i\\}).$
If $E=\\mathbb{R}$ and if the law $P_X$ has a density $f_X$ , we have
\n
$\\mathbb{E}(X)= \\int_{-\\infty}^{+\\infty} xf_X(x)dx$

Exercise: Prove the result when $X:\\mathbb{Z}\\rightarrow \\mathbb{Z}$ .

Linearity

\nThe set of random variables $X:\\Omega \\rightarrow \\mathbb{R}$ is a vector space ( $(X+Y)(\\omega)=X(\\omega)+Y(\\omega)$ ). The set of random variables such that $\\mathbb{E}(X)=\\int_{\\Omega} XdP$ exists is again a vector space. Since integrals are linear, the expectation $\\mathbb{E}:L^1(\\Omega)\\rightarrow \\mathbb{R}$

$X \\mapsto \\mathbb{E}(X)=\\int_{\\Omega} XdP$

is a linear application valued in $\\mathbb{R}$ (i.e. a linear form). In other words, we have the fundamentals properties:

$\\mathbb{E}(\\alpha X) = \\alpha \\mathbb{E}(X) \\quad \\text{ and } \\quad \\mathbb{E}(X+Y) = \\mathbb{E}(X) + \\mathbb{E}(Y).$

", "parent": "subsec:MomRV", "rank": "0", "html_name": "def:Expe", "summary": "

Definition 16 (Expectation). The general definition of the expectation of a random variable $X:\\Omega\\rightarrow \\mathbb{R}$ is $\\mathbb{E}(X) = \\int_{\\Omega} X(\\omega)dP(\\omega).$ in particular, when $\\Omega = \\{1,..,N\\}$ , $\\mathbb{E}(X) = \\sum_{i\\in\\Omega} X(i)\\times P(\\{i\\})=\\sum_{i\\in \\mathbb{Z}} i.P_X(\\{i\\}),$ and when $\\Omega = \\mathbb{R}$ , $\\mathbb{E}(X) = \\int_{-\\infty}^{+\\infty} X(\\omega) f(\\omega)d\\omega=\\int_{-\\infty}^{+\\infty} xf_X(x)dx.$

", "hasSummary": true, "hasTitle": true, "title": "Expectation"}, "classes": "l0", "position": {"x": -1256.2681048153534, "y": 2246.8829423217985}}, {"group": "nodes", "data": {"id": "def:ScalRV", "name": "definition", "text": "

Definition 17 (Inner products on random variables).
\n
\nThe expectation enables to define an important inner product on random variables. It provides a norm and a notion of angles between random variables. Let $X:\\Omega \\rightarrow \\mathbb{R}$ and $Y:\\Omega \\rightarrow \\mathbb{R}$ be $2$ random variables. When it exists, we can define

$\\langle X,Y \\rangle = \\mathbb{E}(XY),$

where $XY$ is understood as the function $\\omega \\mapsto X(\\omega)Y(\\omega)$ .
\n

Why is it an inner product ?

It is easy to check that the set of random variables $X$ such that $\\mathbb{E}(X^2)=\\int_{\\Omega} X^2dP$ exists is a vector space.
$\\mathbb{E}(XY)$ is linear in $X$ and $Y$ and $\\mathbb{E}(XY)=\\mathbb{E}(YX)$ .
Except pathological cases that we will not consider here, we can show that $x\\neq 0 \\Rightarrow \\mathbb{E}(XX)>0$

Hence $\\langle X,Y \\rangle$ is an inner product.

", "parent": "subsec:MomRV", "rank": "0", "html_name": "def:ScalRV", "summary": "

Definition 17 (Inner products on random variables). $\\langle X,Y \\rangle = \\mathbb{E}(XY)= \\int_{\\Omega} X(\\omega)Y(\\omega)dP(\\Omega),$

", "hasSummary": true, "hasTitle": true, "title": "Inner products on random variables"}, "classes": "l0", "position": {"x": 8.002538748805819, "y": 1434.73204758651}}, {"group": "nodes", "data": {"id": "def:Cov", "name": "definition", "text": "

Definition 18 (Covariance).
\n
\nGiven a random variable $X$ note $X_{c} = X-\\mathbb{E}(X)$ the \u2019centered\u2019 variable. The covariance between two variables $X$ and $Y$ is the inner product between their centered versions: the covariance between $X:\\Omega\\rightarrow \\mathbb{R}$ and $Y:\\Omega\\rightarrow \\mathbb{R}$ is defined by

$cov(X,Y)=\\langle X_{c},Y_{c}\\rangle = \\mathbb{E}\\left( (X-\\mathbb{E}(X))(Y-\\mathbb{E}(Y)) \\right),$

when it exists.

", "parent": "subsec:MomRV", "rank": "0", "html_name": "def:Cov", "summary": "

Definition 18 (Covariance). $cov(X,Y)=\\langle X_{c},Y_{c}\\rangle = \\mathbb{E}\\left( (X-\\mathbb{E}(X))(Y-\\mathbb{E}(Y)) \\right).$

", "hasSummary": true, "hasTitle": true, "title": "Covariance"}, "classes": "l0", "position": {"x": -117.1408484185086, "y": 765.9280894870162}}, {"group": "nodes", "data": {"id": "def:Std", "name": "definition", "text": "

Definition 19 (Variance /Standard deviation).
\n
\n

Definition

\nThe variance and standard deviation measure how a random variable varies around its mean. The deviation from the mean is given by the Euclidean norm of the centered variable $X_{c} = X-\\mathbb{E}(X)$ . When it exists, the variance is defined by,

$v(X)= \\mathbb{E}\\left((X-\\mathbb{E}(X))^2\\right)=cov(X,X)=\\|X_c\\|^2,$

and the standard deviation by

$\\sigma(X)=\\sqrt{v(X)} =\\|X_c\\|.$

Alternative formula

\nWe have the important equality $v(X)=\\mathbb{E}(X^2)-\\mathbb{E}(X)^2.$

Proof: $\\begin{aligned}\nv(X)&=&\\mathbb{E}\\left((X-\\mathbb{E}(X))^2\\right)\\\\\n&=&\\mathbb{E}\\left(X^2-2\\mathbb{E}(X)X+\\mathbb{E}(X)^2)\\right)\\\\\n&=&\\mathbb{E}(X^2)-2\\mathbb{E}(\\mathbb{E}(X)X)+\\mathbb{E}(\\mathbb{E}(X)^2)\\\\\n&=&\\mathbb{E}(X^2)-2\\mathbb{E}(X)\\mathbb{E}(X)+\\mathbb{E}(\\mathbb{E}(X)^2)\\\\\n&=&\\mathbb{E}(X^2)-\\mathbb{E}(X)^2 \\\\\\end{aligned}$

", "parent": "subsec:MomRV", "rank": "0", "html_name": "def:Std", "summary": "

Definition 19 (Variance /Standard deviation). $v(X)= \\mathbb{E}\\left((X-\\mathbb{E}(X))^2\\right),$ $\\sigma(X)=\\sqrt{v(X)} = \\sqrt{ \\mathbb{E}\\left((X-\\mathbb{E}(X))^2\\right)}.$

", "hasSummary": true, "hasTitle": true, "title": "Variance /Standard deviation"}, "classes": "l0", "position": {"x": -76.99749540434607, "y": 216.19300869132803}}, {"group": "nodes", "data": {"id": "def:CovM", "name": "definition", "text": "

Definition 20 (Covariance matrix).
\n
\n

Definition

\nConsider $n$ random variables $X_i:\\Omega \\rightarrow \\mathbb{R}$ . The variables can be put in a column vector $\\bm X=(X_1,..,X_n)^T$ . $\\bm X$ is then a random variable $\\bm X:\\Omega \\rightarrow \\mathbb{R}^n$ . Such random variables are often called random vectors.
\nFor a random column vector $\\bm X:\\Omega \\rightarrow \\mathbb{R}^n$ , when it exists the covariance matrix is defined by $cov(\\bm X)=\\mathbb{E}(\\bm X_{c} \\bm X_{c}^T)= \\mathbb{E}\\left((\\bm X-\\mathbb{E}(\\bm X))(\\bm X-\\mathbb{E}(\\bm X))^T\\right).$

When $\\bm X$ is a line vector, the definition becomes $cov(\\bm X)=\\mathbb{E}(\\bm X_{c}^T\\bm X_{c}).$

Entries of the matrix $cov(\\bm X)$

\n $\\mathbb{E}\\left( \\begin{pmatrix} X_1-\\mathbb{E}(X_1)\\\\ X_2 -\\mathbb{E}(X_2)\\\\ . \\\\ . \\\\ X_n-\\mathbb{E}(X_n) \\end{pmatrix}\\begin{pmatrix} X_1-\\mathbb{E}(X_1)& X_2 -\\mathbb{E}(X_2)& . & . & X_n-\\mathbb{E}(X_n) \\end{pmatrix} \\right)$ $=$ $\\begin{pmatrix} \n\\mathbb{E}\\left( ( X_1-\\mathbb{E}(X_1) )( X_1-\\mathbb{E}(X_1) )\n\\right) &.&.& \\mathbb{E}\\left( ( X_1-\\mathbb{E}(X_1) )( X_n-\\mathbb{E}(X_n) ) \\right) \\\\\n.&&&.\\\\\n.&&&.\\\\\n\\mathbb{E}\\left( ( X_n-\\mathbb{E}(X_n) )( X_1-\\mathbb{E}(X_1) )\n\\right) &.&.& \\mathbb{E}\\left( ( X_n-\\mathbb{E}(X_n) )( X_n-\\mathbb{E}(X_n) ) \\right) \\\\\n\\end{pmatrix}$

Hence we can see that $cov(X)_{ij}=cov(X_{i},X_{j})=\\langle X_{i,c},X_{j,c} \\rangle$ .
\n

", "parent": "subsec:MomRV", "rank": "0", "html_name": "def:CovM", "summary": "

Definition 20 (Covariance matrix). Consider $n$ random variables $X_i:\\Omega \\rightarrow \\mathbb{R}$ . The variables can be put in a column vector $\\bm X=(X_1,..,X_n)$

", "hasSummary": true, "hasTitle": true, "title": "Covariance matrix"}, "classes": "l0", "position": {"x": 1120.0630833685873, "y": 223.501005840664}}, {"group": "nodes", "data": {"id": "ex:Bern", "name": "exercise", "text": "

Exercise 15 (Bernoulli).
\n
\nA Bernoulli random variable $X$ is a random variable valued in $\\{0,1\\}$ . The law $P_X$ is determined by $P_X(\\{ 1 \\})$ , noted $p$ .

For a parameter $p$ , give $P_X(\\{ 0 \\})$ .
Give $\\mathbb{E}(X)$

", "parent": "subsec:MomRV", "rank": "0", "html_name": "ex:Bern", "summary": "

Exercise 15 (Bernoulli).

", "hasSummary": false, "hasTitle": true, "title": "Bernoulli"}, "classes": "l0", "position": {"x": -82.9913469130289, "y": 2489.8505672414767}}, {"group": "nodes", "data": {"id": "ex:ExpSum", "name": "exercise", "text": "

Exercise 16 (Expected sum).
\n
\nConsider $3$ rolls of a fair dice. What is the configuration space and the probability describing the rolls? Call $S$ the random variable \u2019sum of the $3$ rolls\u2019. Compute $\\mathbb{E}(S)$ .

", "parent": "subsec:MomRV", "rank": "0", "html_name": "ex:ExpSum", "summary": "

Exercise 16 (Expected sum).

", "hasSummary": false, "hasTitle": true, "title": "Expected sum"}, "classes": "l0", "position": {"x": 1094.3155288304629, "y": 2507.4109660435925}}, {"group": "nodes", "data": {"id": "ex:GaussMean", "name": "exercise", "text": "

Exercise 17 (Mean of a Gaussian RV).
\n
\n

Assume that the law of $X$ has a density $f_{\\mu,\\sigma}(x)=\\frac{1}{\\sqrt{2\\pi \\sigma^2}}e^{\\frac{-(x-\\mu)^2}{2\\sigma^2}}.$ Compute $\\mathbb{E}(X)$ .
Assume that the law of $X$ has a density $f(x,y)=\\frac{1}{2\\pi \\sigma^2 }e^{-\\frac{(x-x_0)^2+(y-y_0)^2}{2\\sigma^2}}.$
\n
Compute $\\mathbb{E}(X)$ .

", "parent": "subsec:MomRV", "rank": "0", "html_name": "ex:GaussMean", "summary": "

Exercise 17 (Mean of a Gaussian RV).

", "hasSummary": false, "hasTitle": true, "title": "Mean of a Gaussian RV"}, "classes": "l0", "position": {"x": -96.57582900774878, "y": 2114.7793235130284}}, {"group": "nodes", "data": {"id": "ex:GaussVar", "name": "exercise", "text": "

Exercise 18 (Variance on a uni-dimensional Gaussian).
\n
\nLet $\\Omega = \\mathbb{R}$ and let $P$ be a probability with Gaussian density

$f_{\\sigma}(x)=\\frac{1}{\\sqrt{2\\pi \\sigma^2}}e^{\\frac{-x^2}{2\\sigma^2}}.$

Knowing that

$\\int_{-\\infty}^{+\\infty} \\frac{1}{\\sqrt{2\\pi}}x^2e^{-\\frac{x^2}{2}}dx=1,$

Compute the variance of the probability distribution.
\n

", "parent": "subsec:MomRV", "rank": "0", "html_name": "ex:GaussVar", "summary": "

Exercise 18 (Variance on a uni-dimensional Gaussian).

", "hasSummary": false, "hasTitle": true, "title": "Variance on a uni-dimensional Gaussian"}, "classes": "l0", "position": {"x": 2250.5009496629027, "y": 731.5549544819082}}, {"group": "nodes", "data": {"id": "ex:GaussCov", "name": "exercise", "text": "

Exercise 19 (Covariance of a bi-dimensional Gaussian).
\n
\nLet $X:\\Omega \\rightarrow \\mathbb{R}^2$ be a random variable whose law has the following density

$f_{\\sigma_1,\\sigma_2}(x)=\\frac{1}{2\\pi \\sigma_1\\sigma_2 }e^{-\\frac{1}{2}\\left(\\frac{x_1^2}{\\sigma_1^2}+\\frac{x_2^2}{\\sigma_2^2}\\right)},$

where $x=\\begin{pmatrix} x_1\\\\x_2 \\end{pmatrix}$ .

Draw approximate contours of the density for different values of $\\sigma_x$ and $\\sigma_y$ .
Compute the covariance matrix $D$ of the probability distribution.
Express $\\left(\\frac{x_1^2}{\\sigma_1^2}+\\frac{x_2^2}{\\sigma_2^2}\\right)$ in terms of matrix multiplications. Rewrite the density accordingly.

Let $R=\\begin{pmatrix} cos(\\theta)&-sin(\\theta)\\\\sin(\\theta)&cos(\\theta) \\end{pmatrix}$ be a rotation matrix, and let $Y=RX$ be another random vector.

What is the covariance matrix of $Y$ ? Use the fact that for a matrix $A$ , $\\mathbb{E}(AX)=A\\mathbb{E}(X)$ .
What is the density of $Y$ ?

Express the general relation between the covariance of a Gaussian density and the term in the exponential.

", "parent": "subsec:MomRV", "rank": "0", "html_name": "ex:GaussCov", "summary": "

Exercise 19 (Covariance of a bi-dimensional Gaussian).

", "hasSummary": false, "hasTitle": true, "title": "Covariance of a bi-dimensional Gaussian"}, "classes": "l0", "position": {"x": 2243.324799893895, "y": 204.58760177480008}}, {"group": "nodes", "data": {"id": "def:WLLN", "name": "theorem", "text": "

Theorem 6 ((Weak) Law of large numbers).
\n
\n

Statement of the theorem

\nWe are now ready state an important result of the theory of probabilities, relating empirical means and expectations.
\n

Let $X_1,...,X_n,...$ be an infinite sequence i.i.d real random variables of mean $\\mu$ . Let $\\bar X_n$ be the empirical mean $\\bar X_n = \\frac{X_1+..+X_n}{n}.$ For all $\\epsilon>0$ , we have $P(|\\bar X_n-\\mu |>\\epsilon)=P(\\{\\omega \\in \\Omega |\\quad |\\bar X_n(\\omega) - \\mu |>\\epsilon\\})=\\xrightarrow[n\\rightarrow \\infty]{} 0.$

In simple words: when $n$ is large the values of $\\bar X_n$ are almost always close to $\\mu$ .
\n

Idea of the proof

\nWe will not prove this result, but it can be intuitively understood in a simple way when the variable have variances. First, note that $\\mathbb{E}(\\bar X)=\\mathbb{E}(X_1)=\\mu$ . Then, remember that $v\\left(\\bar X_n=\\frac{1}{n}\\sum_i X_i\\right)\\xrightarrow[n\\rightarrow \\infty]{} 0.$ Hence the law of the empirical mean $\\bar X_n$ is more and more concentrated around its expectation $\\mu$ , which means that the probability $P(|\\bar X_n-\\mu |>\\epsilon)$ should be smaller and smaller.

", "parent": "subsec:LLN", "rank": "0", "html_name": "def:WLLN", "summary": "

Theorem 6 ((Weak) Law of large numbers). Let $X_1,...,X_n,...$ be i.i.d. $\\bar X_n = \\frac{X_1+..+X_n}{n}.$ For all $\\epsilon>0$ , we have $P(|\\bar X_n-\\mu |>\\epsilon)\\xrightarrow[n\\rightarrow \\infty]{} 0.$

", "hasSummary": true, "hasTitle": true, "title": "(Weak) Law of large numbers"}, "classes": "l0", "position": {"x": -3002.3955704605337, "y": -886.4056204438808}}, {"group": "nodes", "data": {"id": "def:EmpM", "name": "definition", "text": "

Definition 21 (Empirical mean).
\n
\n

Definition

\nIn general, the adjective \u2019empirical\u2019 is understood as \u2019coming from observations\u2019, as opposed to a computation made on the configuration space $\\Omega$ . Assume that performing a certain experiment lead to the observation of $n$ numbers $x_i$ , modeled by random variables $X_i$ . The mean of a particular observation is $\\frac{x_1+...+x_n}{n}.$

which is described by the random variable

$\\bar X_n=\\frac{X_1+...+X_n}{n}.$

$\\bar X_n$ is called the \u2019empirical mean\u2019.
\n

By linearity, the expectation of $\\bar X_n$ is given by $\\mathbb{E}(\\bar X_n)=\\frac{\\mathbb{E}(X_1)+...+\\mathbb{E}(X_n)}{n}.$

Link with expectation

\nThe notion of empirical mean has connections but is different from the notion of expectation, which is a mean over configurations from $\\Omega$ . Remember that when $\\Omega$ has a uniform probability over a finite numbers of element, $\\mathbb{E}(X) = \\frac{\\sum_{i=1}^{|\\Omega|}X(\\omega_i)}{|\\Omega|}.$

Hence,

the empirical mean is its self a random variable, computed as a sum over different random variables.
the expectation is a number, computed for a single random variable, as a sum over configurations $\\omega\\in \\Omega$ .

", "parent": "subsec:LLN", "rank": "0", "html_name": "def:EmpM", "summary": "

Definition 21 (Empirical mean). Let $X_1,...,X_n$ be RV. Their \u2019empirical mean\u2019 is following RV, $\\bar X_n=\\frac{X_1+...+X_n}{n}$

", "hasSummary": true, "hasTitle": true, "title": "Empirical mean"}, "classes": "l0", "position": {"x": -3008.780282009238, "y": -210.97656654692923}}, {"group": "nodes", "data": {"id": "sec:Prob", "name": "section", "text": "", "parent": "", "rank": "0", "html_name": "sec:Prob", "hasSummary": false, "hasTitle": false}, "classes": "l0", "position": {"x": -3885.676967972946, "y": 2710.2816977060456}}, {"group": "nodes", "data": {"id": "titlesec:Prob", "name": "sectionTitle", "text": "

Probabilities

", "parent": "sec:Prob", "rank": "0", "html_name": "sec:Prob", "hasSummary": false, "hasTitle": false}, "classes": "l0", "position": {"x": -3942.777393921063, "y": 6409.4690158559715}}, {"group": "nodes", "data": {"id": "subsec:ConfProb", "name": "subsection", "text": "", "parent": "sec:Prob", "rank": "0", "html_name": "subsec:ConfProb", "hasSummary": false, "hasTitle": false}, "classes": "l0", "position": {"x": -7749.757904204901, "y": 5121.845864314333}}, {"group": "nodes", "data": {"id": "titlesubsec:ConfProb", "name": "subsectionTitle", "text": "

Configurations and probabilities

", "parent": "subsec:ConfProb", "rank": "0", "html_name": "subsec:ConfProb", "hasSummary": false, "hasTitle": false}, "classes": "l0", "position": {"x": -7814.212974665837, "y": 6300.700516884763}}, {"group": "nodes", "data": {"id": "subsec:RV", "name": "subsection", "text": "", "parent": "sec:Prob", "rank": "0", "html_name": "subsec:RV", "hasSummary": false, "hasTitle": false}, "classes": "l0", "position": {"x": -3957.1621992937507, "y": 4274.1448945349875}}, {"group": "nodes", "data": {"id": "titlesubsec:RV", "name": "subsectionTitle", "text": "

Random Variables

", "parent": "subsec:RV", "rank": "0", "html_name": "subsec:RV", "hasSummary": false, "hasTitle": false}, "classes": "l0", "position": {"x": -4063.0302596209003, "y": 4845.789366749729}}, {"group": "nodes", "data": {"id": "subsec:Marg", "name": "subsection", "text": "", "parent": "sec:Prob", "rank": "0", "html_name": "subsec:Marg", "hasSummary": false, "hasTitle": false}, "classes": "l0", "position": {"x": -8798.314516230856, "y": 2897.418165008762}}, {"group": "nodes", "data": {"id": "titlesubsec:Marg", "name": "subsectionTitle", "text": "

Marginals

", "parent": "subsec:Marg", "rank": "0", "html_name": "subsec:Marg", "hasSummary": false, "hasTitle": false}, "classes": "l0", "position": {"x": -8780.535452312293, "y": 3384.2740706855107}}, {"group": "nodes", "data": {"id": "subsec:Cond", "name": "subsection", "text": "", "parent": "sec:Prob", "rank": "0", "html_name": "subsec:Cond", "hasSummary": false, "hasTitle": false}, "classes": "l0", "position": {"x": -3696.8900592126656, "y": 1908.1243223472811}}, {"group": "nodes", "data": {"id": "titlesubsec:Cond", "name": "subsectionTitle", "text": "

Conditioning

", "parent": "subsec:Cond", "rank": "0", "html_name": "subsec:Cond", "hasSummary": false, "hasTitle": false}, "classes": "l0", "position": {"x": -3696.9254900614355, "y": 2476.8515140807704}}, {"group": "nodes", "data": {"id": "subsec:Ind", "name": "subsection", "text": "", "parent": "sec:Prob", "rank": "0", "html_name": "subsec:Ind", "hasSummary": false, "hasTitle": false}, "classes": "l0", "position": {"x": -6905.508163704335, "y": 599.5812383016362}}, {"group": "nodes", "data": {"id": "titlesubsec:Ind", "name": "subsectionTitle", "text": "

Independence

", "parent": "subsec:Ind", "rank": "0", "html_name": "subsec:Ind", "hasSummary": false, "hasTitle": false}, "classes": "l0", "position": {"x": -7131.811933856016, "y": 1877.6214050773367}}, {"group": "nodes", "data": {"id": "subsec:MomRV", "name": "subsection", "text": "", "parent": "sec:Prob", "rank": "0", "html_name": "subsec:MomRV", "hasSummary": false, "hasTitle": false}, "classes": "l0", "position": {"x": 483.0067355355011, "y": 1423.3745759249136}}, {"group": "nodes", "data": {"id": "titlesubsec:MomRV", "name": "subsectionTitle", "text": "

Moments of random variables

", "parent": "subsec:MomRV", "rank": "0", "html_name": "subsec:MomRV", "hasSummary": false, "hasTitle": false}, "classes": "l0", "position": {"x": 359.56828538700586, "y": 2809.374014333997}}, {"group": "nodes", "data": {"id": "subsec:LLN", "name": "subsection", "text": "", "parent": "sec:Prob", "rank": "0", "html_name": "subsec:LLN", "hasSummary": false, "hasTitle": false}, "classes": "l0", "position": {"x": -3242.911850592072, "y": -419.59974154820395}}, {"group": "nodes", "data": {"id": "titlesubsec:LLN", "name": "subsectionTitle", "text": "

Laws of large numbers

", "parent": "subsec:LLN", "rank": "0", "html_name": "subsec:LLN", "hasSummary": false, "hasTitle": false}, "classes": "l0", "position": {"x": -3485.42813072361, "y": 211.70613734747283}}, {"data": {"id": "rem:AbsConfsubsec:ConfProb", "source": "subsec:ConfProb", "target": "rem:AbsConf", "type": "strong", "visibility": 1}}, {"data": {"id": "rem:AbsConfsubsec:RV", "source": "subsec:RV", "target": "rem:AbsConf", "type": "strong", "visibility": 1}}, {"data": {"id": "def:ConfSprem:IntProb", "source": "rem:IntProb", "target": "def:ConfSp", "type": "strong", "visibility": 1}}, {"data": {"id": "def:Evdef:ConfSp", "source": "def:ConfSp", "target": "def:Ev", "type": "strong", "visibility": 1}}, {"data": {"id": "def:Probdef:Ev", "source": "def:Ev", "target": "def:Prob", "type": "strong", "visibility": 1}}, {"data": {"id": "def:Probrem:ProbAss", "source": "rem:ProbAss", "target": "def:Prob", "type": "strong", "visibility": 1}}, {"data": {"id": "def:ProbDendef:Prob", "source": "def:Prob", "target": "def:ProbDen", "type": "strong", "visibility": 1}}, {"data": {"id": "rem:Evdef:Ev", "source": "def:Ev", "target": "rem:Ev", "type": "strong", "visibility": 1}}, {"data": {"id": "rem:ProbAssdef:Ev", "source": "def:Ev", "target": "rem:ProbAss", "type": "strong", "visibility": 1}}, {"data": {"id": "def:RVLawdef:RV", "source": "def:RV", "target": "def:RVLaw", "type": "strong", "visibility": 1}}, {"data": {"id": "def:JoinRVdef:RV", "source": "def:RV", "target": "def:JoinRV", "type": "strong", "visibility": 1}}, {"data": {"id": "def:JoinRVdef:RVLaw", "source": "def:RVLaw", "target": "def:JoinRV", "type": "weak"}}, {"data": {"id": "def:MargProbdef:Marg", "source": "def:Marg", "target": "def:MargProb", "type": "strong", "visibility": 1}}, {"data": {"id": "def:MargRVdef:Marg", "source": "def:Marg", "target": "def:MargRV", "type": "strong", "visibility": 1}}, {"data": {"id": "def:MargRVdef:MargProb", "source": "def:MargProb", "target": "def:MargRV", "type": "weak"}}, {"data": {"id": "def:MargRVdef:RVLaw", "source": "def:RVLaw", "target": "def:MargRV", "type": "weak"}}, {"data": {"id": "ex:MargGaussex:Gauss", "source": "ex:Gauss", "target": "ex:MargGauss", "type": "weak"}}, {"data": {"id": "th:Bayesdef:CondProb", "source": "def:CondProb", "target": "th:Bayes", "type": "strong", "visibility": 1}}, {"data": {"id": "def:CondCartrem:CondCart", "source": "rem:CondCart", "target": "def:CondCart", "type": "strong", "visibility": 1}}, {"data": {"id": "def:CondCartdef:CondProb", "source": "def:CondProb", "target": "def:CondCart", "type": "strong", "visibility": 1}}, {"data": {"id": "def:CondCartdef:MargProb", "source": "def:MargProb", "target": "def:CondCart", "type": "strong", "visibility": 1}}, {"data": {"id": "def:CondCartdef:JoinRV", "source": "def:JoinRV", "target": "def:CondCart", "type": "weak"}}, {"data": {"id": "def:CondCartdef:MargRV", "source": "def:MargRV", "target": "def:CondCart", "type": "weak"}}, {"data": {"id": "rem:CondCartdef:CondProb", "source": "def:CondProb", "target": "rem:CondCart", "type": "strong", "visibility": 1}}, {"data": {"id": "ex:DWRPMex:RemCom", "source": "ex:RemCom", "target": "ex:DWRPM", "type": "weak"}}, {"data": {"id": "th:JLIVdef:IndRV", "source": "def:IndRV", "target": "th:JLIV", "type": "strong", "visibility": 1}}, {"data": {"id": "th:JLIVdef:MargRV", "source": "def:MargRV", "target": "th:JLIV", "type": "strong", "visibility": 1}}, {"data": {"id": "th:JLIVdef:CondCart", "source": "def:CondCart", "target": "th:JLIV", "type": "strong", "visibility": 1}}, {"data": {"id": "def:IndRVsubsec:RV", "source": "subsec:RV", "target": "def:IndRV", "type": "strong", "visibility": 1}}, {"data": {"id": "def:IndRVdef:IndEv", "source": "def:IndEv", "target": "def:IndRV", "type": "strong", "visibility": 1}}, {"data": {"id": "def:iidth:JLIV", "source": "th:JLIV", "target": "def:iid", "type": "strong", "visibility": 1}}, {"data": {"id": "ex:Binex:Bern", "source": "ex:Bern", "target": "ex:Bin", "type": "weak"}}, {"data": {"id": "ex:Binex:SumEq01", "source": "ex:SumEq01", "target": "ex:Bin", "type": "weak"}}, {"data": {"id": "def:IndExpdef:Expe", "source": "def:Expe", "target": "def:IndExp", "type": "strong", "visibility": 1}}, {"data": {"id": "def:IndExpdef:IndRV", "source": "def:IndRV", "target": "def:IndExp", "type": "strong", "visibility": 1}}, {"data": {"id": "th:IndCovdef:IndExp", "source": "def:IndExp", "target": "th:IndCov", "type": "strong", "visibility": 1}}, {"data": {"id": "th:IndCovdef:Cov", "source": "def:Cov", "target": "th:IndCov", "type": "strong", "visibility": 1}}, {"data": {"id": "def:StdPropdef:Std", "source": "def:Std", "target": "def:StdProp", "type": "strong", "visibility": 1}}, {"data": {"id": "def:StdPropth:IndCov", "source": "th:IndCov", "target": "def:StdProp", "type": "strong", "visibility": 1}}, {"data": {"id": "def:Expedef:RV", "source": "def:RV", "target": "def:Expe", "type": "strong", "visibility": 1}}, {"data": {"id": "def:Expedef:RVLaw", "source": "def:RVLaw", "target": "def:Expe", "type": "strong", "visibility": 1}}, {"data": {"id": "def:ScalRVdef:Expe", "source": "def:Expe", "target": "def:ScalRV", "type": "strong", "visibility": 1}}, {"data": {"id": "def:ScalRVdef:IndExp", "source": "def:IndExp", "target": "def:ScalRV", "type": "weak"}}, {"data": {"id": "def:Covdef:ScalRV", "source": "def:ScalRV", "target": "def:Cov", "type": "strong", "visibility": 1}}, {"data": {"id": "def:Stddef:Cov", "source": "def:Cov", "target": "def:Std", "type": "strong", "visibility": 1}}, {"data": {"id": "def:CovMdef:Cov", "source": "def:Cov", "target": "def:CovM", "type": "strong", "visibility": 1}}, {"data": {"id": "def:WLLNdef:iid", "source": "def:iid", "target": "def:WLLN", "type": "strong", "visibility": 1}}, {"data": {"id": "def:WLLNdef:EmpM", "source": "def:EmpM", "target": "def:WLLN", "type": "strong", "visibility": 1}}, {"data": {"id": "def:WLLNdef:StdProp", "source": "def:StdProp", "target": "def:WLLN", "type": "strong", "visibility": 1}}, {"data": {"id": "def:EmpMdef:Expe", "source": "def:Expe", "target": "def:EmpM", "type": "strong", "visibility": 1}}, {"data": {"id": "subsec:RVsubsec:ConfProb", "source": "subsec:ConfProb", "target": "subsec:RV", "type": "strong", "visibility": 1}}, {"data": {"id": "subsec:Margsubsec:ConfProb", "source": "subsec:ConfProb", "target": "subsec:Marg", "type": "strong", "visibility": 1}}, {"data": {"id": "subsec:Margdef:JoinRV", "source": "def:JoinRV", "target": "subsec:Marg", "type": "strong", "visibility": 1}}, {"data": {"id": "subsec:Margdef:RVLaw", "source": "def:RVLaw", "target": "subsec:Marg", "type": "weak"}}, {"data": {"id": "subsec:Condsubsec:ConfProb", "source": "subsec:ConfProb", "target": "subsec:Cond", "type": "strong", "visibility": 1}}, {"data": {"id": "subsec:Inddef:Prob", "source": "def:Prob", "target": "subsec:Ind", "type": "strong", "visibility": 1}}, {"data": {"id": "subsec:Inddef:CondProb", "source": "def:CondProb", "target": "subsec:Ind", "type": "weak"}}];