var graph = [{"group": "nodes", "data": {"id": "rem:AbsConf", "name": "remark", "text": "
\n

Remark 1 (Abstract configuration space).
\n
\nMost probability problems of this course start by defining a configuration space \u03a9\\Omega and a probability PP on it. However, many real-life problems involving probabilities make no mention of the space \u03a9\\Omega.

\n
\n
\n

Example: weather

\n
\n


\nAssume that we want to predict the speed of the wind SS in Marseille. Since we know that the speed of the wind is related to the temperature TT, we can attempt to predict the speed of the wind from the temperature. The speed of the wind is a vector of \u211d3\\mathbb{R}^3 and the temperature is a positive number, hence our problem can be fully modeled by the configuration space \u03a91=\u211d3\u00d7\u211d+\\Omega_1 = \\mathbb{R}^3 \\times \\mathbb{R}_+. From a database of measurements, we can estimate the probability of the elementary configurations by P1({(s,t)})=n(s,t)nP_1(\\{(s,t)\\})=\\frac{n_{(s,t)}}{n}, where n(s,t)n_{(s,t)} is the number of times that (s,t)(s,t) was measured and nn is the total number of measurements. With the law P1P_1, it is possible to compute the conditional probability (which we will define later) of the wind speed given the temperature. With this conditional probability, it is possible to tell which value of the wind speed is most probable for a given temperature.
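
\n

A minimal numerical sketch of this frequency-based estimate (Python; the measurements are hypothetical and the variable names are only illustrative):

import random
from collections import Counter

random.seed(0)
# Hypothetical discretized measurements: each pair is (wind speed class, temperature class).
samples = [(random.randint(0, 4), random.randint(10, 14)) for _ in range(1000)]

counts = Counter(samples)                      # n_(s,t): number of times (s,t) was measured
n = len(samples)                               # n: total number of measurements
P1 = {st: c / n for st, c in counts.items()}   # P1({(s,t)}) = n_(s,t) / n

print(sum(P1.values()))                        # sanity check: the estimated probabilities sum to 1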

\n

Now assume that temperature is a poor predictor of the speed of wind: given a temperature, the probability over the wind has a high variance. In order to improve our analysis, it is common to incorporate new parameters in our model. We could for instance consider the air pressure PrPr. The new configuration space would then be \u03a92=\u211d3\u00d7\u211d+\u00d7\u211d+\\Omega_2 = \\mathbb{R}^3 \\times \\mathbb{R}_+ \\times \\mathbb{R}_+, with a probability P2P_2. With luck, the probability on the speed of wind given a temperature and a pressure has a small variance.

\n

If not, we could incorporate more and more parameters, and consider new configuration spaces. In practice, instead of changing the configuration space each time that we want to add a parameter, we treat the parameters as random variables of a larger configuration space. At a given time, all the weather parameters are functions of the positions and speeds of particles, which could be described by the configuration space \u03a9=(\u211d3\u00d7\u211d3)N\\Omega = (\\mathbb{R}^3 \\times \\mathbb{R}^3)^N, where NN is the number of particles. Wind speed, temperature, and air pressure are functions of the type \u03a9\u2192\u211dd\\Omega \\rightarrow \\mathbb{R}^d and are hence random variables.

\n

In practice, evaluating and manipulating a probability PP on the positions and speeds of every particle is unrealistic. However, all that matters is the joint law of the studied variables, in our case the joint law P(S,T,Pr)P_{(S,T,Pr)} of the random variable (S,T,Pr):\u03a9\u2192\u211d3\u00d7\u211d+\u00d7\u211d+(S,T,Pr):\\Omega \\rightarrow \\mathbb{R}^3 \\times \\mathbb{R}_+ \\times \\mathbb{R}_+. Technically speaking, \u211d3\u00d7\u211d+\u00d7\u211d+\\mathbb{R}^3 \\times \\mathbb{R}_+ \\times \\mathbb{R}_+ is no longer referred to as a configuration space, given that additional parameters may be included in our analysis. Note that the precise form of \u03a9=(\u211d3\u00d7\u211d3)N\\Omega = (\\mathbb{R}^3 \\times \\mathbb{R}^3)^N is not particularly helpful. It is only necessary to know that the relevant parameters are functions on \u03a9\\Omega. In this context, the term \u2019universe\u2019 to refer to \u03a9\\Omega is particularly appropriate.

\n
\n
\n

Moral

\n
\n


\nIn practice, the general configuration space \u03a9\\Omega and its probability are often not precisely described, or even mentioned. In these cases, what matters is the joint probability law of the variables of interest.

\n
", "parent": "sec:Prob", "rank": "0", "html_name": "rem:AbsConf", "summary": "
\n

Remark 1 (Abstract configuration space). In practice, the general configuration space \u03a9\\Omega and its probability are often not precisely described, or even mentioned. In these cases, what matters is the joint probability law of the variables of interest.

\n
", "hasSummary": true, "hasTitle": true, "title": "Abstract configuration space"}, "classes": "l0", "position": {"x": -5264.652624402453, "y": 5665.884246925858}}, {"group": "nodes", "data": {"id": "def:ConfSp", "name": "definition", "text": "
\n

Definition 1 (Configuration space / universe).
\n

\n
\n
\n

Definition

\n
\n


\n

\n\n

Warning: the name of the space \u03a9\\Omega and its elements \u03c9\\omega vary across contexts and languages. \u03a9\\Omega is often called the \u2019sample space\u2019 or \u2019universe\u2019, and the elements \u03c9\\omega \u2019samples\u2019, \u2019outcomes\u2019 or \u2019realizations\u2019.
\n

\n
\n
\n

Examples (1)

\n
\n


\nProbabilistic models are very useful to analyse dice rolls and card games. What is a relevant configuration space?

\n\n
\n
\n

Examples (2)

\n
\n


\nAssume that \ud835\udcae\\mathcal{S} is a 5252 card deck and that we are interested in modeling an experiment where a player draws cards from the deck. What should we choose as configuration space when

\n\n

Assume a player draws nn cards without putting them back in the deck. What is the size of the smallest configuration space describing the possible results ?

\n

Note that when using Cartesian products, we have (a,b)\u2260(b,a)(a,b)\\neq (b,a). The order between the card draws is taken into account. In card games, the order in which cards are drawn is often not important. In that case, when drawing 22 cards, we have (a,b)\u223c(b,a)(a,b)\\sim (b,a).

\n

If we do not distinguish results up to permutations, what is the size of the configuration space when

\n\n
\n
\n

Examples (3)

\n
\n


\nIn this course we are particularly interested in signal and image processing problems.

\n

What are the relevant configuration spaces when studying

\n\n
", "parent": "subsec:ConfProb", "rank": "0", "html_name": "def:ConfSp", "summary": "
\n

Definition 1 (Configuration space / universe). The configuration space is usually noted \u03a9\\Omega. As its name indicates, the set \u03a9\\Omega contains all the possible configurations of a probabilistic model. Its elements are usually noted \u03c9\\omega, and we will call them \u2019elementary configurations\u2019.

\n
", "hasSummary": true, "hasTitle": true, "title": "Configuration space / universe"}, "classes": "l0", "position": {"x": -6591.287378112764, "y": 5808.483163577717}}, {"group": "nodes", "data": {"id": "def:Ev", "name": "definition", "text": "
\n

Definition 2 (Events).
\n

\n
\n
\n

Definition

\n
\n


\nAn event AA is a subset of the configuration space \u03a9\\Omega: A\u2282\u03a9A\\subset \\Omega.
\n

\n

Remark: a more precise definition requires the notion of \u03c3\\sigma-algebra. The next \u2019Remark\u2019 node gives the definition of \u03c3\\sigma-algebras, but the notion will not be used in the rest of the course.
\n

\n
\n
\n

Examples

\n
\n


\n

\n\n
", "parent": "subsec:ConfProb", "rank": "0", "html_name": "def:Ev", "summary": "
\n

Definition 2 (Events). An event AA is a subset of the configuration space \u03a9\\Omega: A\u2282\u03a9A\\subset \\Omega.

\n
", "hasSummary": true, "hasTitle": true, "title": "Events"}, "classes": "l0", "position": {"x": -6569.997740292376, "y": 5276.242218068017}}, {"group": "nodes", "data": {"id": "def:Prob", "name": "definition", "text": "
\n

Definition 3 (Probability).
\n

\n
\n
\n

Definition

\n
\n


\nA probability PP is a function which takes an event as input and returns a positive real number. It must verify the following axioms:

\n\n

Direct consequences:

\n\n
\n
\n

Interpretation

\n
\n


\nThe probability gives a notion of \u2019size\u2019 to events. Given that the size of \u03a9\\Omega is 11, the size of events can be interpreted as the proportion they occupy in \u03a9\\Omega. Hence, the PP of probability can also be read as the PP of proportion.
\n

\n

Assume that a dice is a perfect cube. All the faces are indistinguishable from each other, apart from the number they carry. In that case, it is relevant to assign probabilities according to the indistinguishability principle:

\n

P({1})=16,P({2})=16,P({3})=16,P(\\{1\\}) = \\frac{1}{6}, P(\\{2\\}) = \\frac{1}{6},P(\\{3\\}) = \\frac{1}{6}, P({4})=16,P({5})=16,P({6})=16.P(\\{4\\}) = \\frac{1}{6},P(\\{5\\}) = \\frac{1}{6},P(\\{6\\}) = \\frac{1}{6}.

\n

The distribution is called uniform. It is now possible to compute the probability of every event, using axiom 3 (if A\u2229B=\u2205A\\cap B=\\emptyset then P(A\u222aB)=P(A)+P(B)P(A\\cup B)=P(A)+P(B)):

\n

P({2,4,6})=P({2}\u222a{4}\u222a{6})=P({2})+P({4})+P({6})=12.P\\left(\\{2,4,6\\}\\right) = P\\left(\\{2\\}\\cup\\{4\\}\\cup \\{6\\}\\right)=P(\\{2\\})+P(\\{4\\})+P(\\{6\\})=\\frac{1}{2}.

\n

Assume now that our dice is a bit damaged. It is no longer perfectly symmetrical, hence we cannot use the indistinguishability principle anymore. In that case a relevant approach is to perform many dice rolls and to assign probabilities according to the frequency of appearance of each number.

\n

P({1})=k1N,P({2})=k2N,P({3})=k3N,P(\\{1\\}) = \\frac{k_1}{N}, P(\\{2\\}) = \\frac{k_2}{N},P(\\{3\\}) = \\frac{k_3}{N}, P({4})=k4N,P({5})=k5N,P({6})=k6N.P(\\{4\\}) = \\frac{k_4}{N},P(\\{5\\}) = \\frac{k_5}{N},P(\\{6\\}) = \\frac{k_6}{N}.

\n

Exactly as before, the probability of other events is computed with the rule \u2019if A\u2229B=\u2205A\\cap B=\\emptyset then P(A\u222aB)=P(A)+P(B)P(A\\cup B)=P(A)+P(B)\u2019.
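
\n

As an illustration, a short Python sketch (with hypothetical roll counts) that assigns the frequencies and then applies this additivity rule to the event {2,4,6}:

# Hypothetical counts k_1, ..., k_6 observed over N rolls of the damaged dice.
k = {1: 140, 2: 180, 3: 160, 4: 170, 5: 150, 6: 200}
N = sum(k.values())

P = {face: count / N for face, count in k.items()}   # P({i}) = k_i / N

# The singletons {2}, {4}, {6} are disjoint, so P({2,4,6}) = P({2}) + P({4}) + P({6}).
P_even = P[2] + P[4] + P[6]
print(P_even)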

\n
", "parent": "subsec:ConfProb", "rank": "0", "html_name": "def:Prob", "summary": "
\n

Definition 3 (Probability).

\n\n
", "hasSummary": true, "hasTitle": true, "title": "Probability"}, "classes": "l0", "position": {"x": -6588.173409433914, "y": 4695.7977448282445}}, {"group": "nodes", "data": {"id": "def:ProbDen", "name": "definition", "text": "
\n

Definition 4 (Probability density).
\n
\nLet PP be a probability on \u211d\\mathbb{R}. We say that the function ff is the probability density of PP when

\n

\u2200A\u2282\u211d,P(A)=\u222bAf(x)dx.\forall A\subset \mathbb{R},\quad P(A)=\int_A f(x)\mathrm{d}x.

\n

Note that PP does not always have a density. Take for instance PP such that P({0})=1P(\{0\})=1: there is no function ff verifying the above condition.
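
\n

A small numerical check of the definition (Python sketch, numpy assumed), taking for ff the standard Gaussian density and approximating the integral by a Riemann sum:

import numpy as np

# Standard Gaussian density on R.
f = lambda x: np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)

# P(A) approximated by a Riemann sum of f over A = [-1, 1].
dx = 0.0001
x = np.arange(-1.0, 1.0, dx)
print(np.sum(f(x)) * dx)    # about 0.683

# Over (a large portion of) the whole real line, the density integrates to about 1.
x = np.arange(-10.0, 10.0, dx)
print(np.sum(f(x)) * dx)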

\n
", "parent": "subsec:ConfProb", "rank": "0", "html_name": "def:ProbDen", "summary": "
\n

Definition 4 (Probability density). ff is the density of PP, a probability on \u211d\\mathbb{R}, when \u2200A\u2282\u211d,P(A)=\u222bAf(x)dx.\forall A\subset \mathbb{R},\quad P(A)=\int_A f(x)\mathrm{d}x.

\n
", "hasSummary": true, "hasTitle": true, "title": "Probability density"}, "classes": "l0", "position": {"x": -7411.798315229266, "y": 4025.4912117439035}}, {"group": "nodes", "data": {"id": "rem:IntProb", "name": "remark", "text": "
\n

Remark 2 (Introductory remark).
\n
\nGeneral remark: in most situations the word probability refers to a proportion.
\n

\n

For instance, the question

\n

\n\"what is the probability of this event?\" \\text{\n\"what is the probability of this event?\" \n}

\n

can usually be interpreted as

\n

\n\"what is the proportion of the possible configurations in which that event occurs?\" \\text{\n\"what is the proportion of the possible configurations in which that event occurs?\" \n}

\n
", "parent": "subsec:ConfProb", "rank": "0", "html_name": "rem:IntProb", "summary": "
\n

Remark 2 (Introductory remark). General remark: in most situations the word probability refers to a proportion. For instance, the question \"what is the probability of this event?\" can usually be interpreted as \"what is the proportion of the possible configurations in which that event occurs?\".

\n
", "hasSummary": true, "hasTitle": true, "title": "Introductory remark"}, "classes": "l0", "position": {"x": -7804.796733874878, "y": 5776.548706847134}}, {"group": "nodes", "data": {"id": "rem:Ev", "name": "remark", "text": "
\n

Remark 3 (Events and \u03c3\\sigma-algebra).
\n

\n
\n
\n

Introduction

\n
\n


\nEarlier, events have been defined as subsets A\u2282\u03a9A\\subset \\Omega. When \u03a9\\Omega is not a finite set, some mathematical problems can appear, and not every subset A\u2282\u03a9A\\subset \\Omega is considered to be an event. For instance when \u03a9=\u211d\\Omega = \\mathbb{R}, events are subsets of \u211d\\mathbb{R} which can be constructed \u2019easily\u2019 from intervals. For instance A=\u22c3i\u2208\u2115[i2,i+12]A= \\bigcup_{i\\in \\mathbb{N}} \\left[\\frac{i}{2},\\frac{i+1}{2}\\right] is an event. Its probability is given by the sum of P([i2,i+12])P\\left([\\frac{i}{2},\\frac{i+1}{2}]\\right). On the other hand, it is possible to define complicated sets which cannot be expressed from intervals. Using the axioms of probability, we cannot compute the probability of such sets from the probabilities of intervals. Hence we exclude them from the events.
\n

\n
\n
\n

Definition

\n
\n


\nThe events are the subsets of \u03a9\\Omega on which we define probability values. The set of events is called the \u03c3\\sigma-algebra. In order to be consistent with the calculus of probabilities, the \u03c3\\sigma-algebra \ud835\udc9c\\mathcal{A} on \u03a9\\Omega is required to verify the following axioms:

\n\n
\n
\n

Examples

\n
\n


\n

\n\n
", "parent": "subsec:ConfProb", "rank": "0", "html_name": "rem:Ev", "summary": "
\n

Remark 3 (Events and \u03c3\\sigma-algebra). Strictly speaking, events are not arbitrary subsets of \u03a9\\Omega: they must belong to the \u03c3\\sigma-algebra of \u03a9\\Omega. (Beyond the scope of this course)

\n
", "hasSummary": true, "hasTitle": true, "title": "Events and \u03c3\\sigma-algebra"}, "classes": "l0", "position": {"x": -7794.151914964684, "y": 5212.373304606854}}, {"group": "nodes", "data": {"id": "rem:ProbAss", "name": "remark", "text": "
\n

Remark 4 (Assignment of probabilities).
\nTo model real life situations, we assign probabilities to events which reflect what we know about the situation. The probabilities on the configuration space \u03a9\\Omega are usually assigned according to one of the two following principles.

\n\n

Independently of how they are assigned, probability values should verify several rules.

\n
", "parent": "subsec:ConfProb", "rank": "0", "html_name": "rem:ProbAss", "summary": "
\n

Remark 4 (Assignment of probabilities). Probability values are usually assigned either based on an indistinguishability criterion, or on observed frequencies of a repeated experiment.

\n
", "hasSummary": true, "hasTitle": true, "title": "Assignment of probabilities"}, "classes": "l0", "position": {"x": -7826.086371695265, "y": 4701.421996917542}}, {"group": "nodes", "data": {"id": "ex:RemSet", "name": "exercise", "text": "
\n

Exercise 1 (Reminders of set theory).
\n
\n

\n\n
", "parent": "subsec:ConfProb", "rank": "0", "html_name": "ex:RemSet", "summary": "
\n

Exercise 1 (Reminders of set theory).

\n
", "hasSummary": false, "hasTitle": true, "title": "Reminders of set theory"}, "classes": "l0", "position": {"x": -8929.518068117426, "y": 5857.966875248107}}, {"group": "nodes", "data": {"id": "ex:RemCom", "name": "exercise", "text": "
\n

Exercise 2 (Reminders on combinatorics).
\n
\n

\n
\n
\n

Q1: permutations

\n
\n


\nLet \ud835\udcae\\mathcal{S} be a set of nn elements. We call \u03c3:\ud835\udcae\u2192\ud835\udcae\\sigma:\\mathcal{S}\\rightarrow \\mathcal{S} a permutation when it is a bijection. How many different permutations \u03c3\\sigma can we construct?
\n

\n
\n
\n

Q2: drawing objects from a box

\n
\n


\nAssume that the elements of \ud835\udcae\\mathcal{S} are nn physical objects contained in a box. Draw successively kk elements from the box, without looking in the box, and without putting an element back before taking the next one. If we remember the order in which we take the elements, how many different configurations can we obtain? And if we don\u2019t remember the order in which we took the different elements?

\n
", "parent": "subsec:ConfProb", "rank": "0", "html_name": "ex:RemCom", "summary": "
\n

Exercise 2 (Reminders on combinatorics).

\n
", "hasSummary": false, "hasTitle": true, "title": "Reminders on combinatorics"}, "classes": "l0", "position": {"x": -8928.41412074155, "y": 5434.540693704344}}, {"group": "nodes", "data": {"id": "ex:Gauss", "name": "exercise", "text": "
\n

Exercise 3 (Gaussian densities (1D1D)).
\n
\n

\n\n
", "parent": "subsec:ConfProb", "rank": "0", "html_name": "ex:Gauss", "summary": "
\n

Exercise 3 (Gaussian densities (1D1D)).

\n
", "hasSummary": false, "hasTitle": true, "title": "Gaussian densities (1D1D)"}, "classes": "l0", "position": {"x": -8921.777124083159, "y": 4104.367513229578}}, {"group": "nodes", "data": {"id": "def:RV", "name": "definition", "text": "
\n

Definition 5 (Random variables).
\n

\n
\n
\n

Introduction

\n
\n


\nLet us first introduce the idea behind the definition. Build a configuration space representing the possible ages of 22 persons. A natural choice is \u03a9=\u211d\u00d7\u211d\\Omega = \\mathbb{R} \\times \\mathbb{R} where the first coordinate represents the possible age of the first person, and the second the possible age of the second person (\u03a9=\u211d+\u00d7\u211d+\\Omega = \\mathbb{R}_+ \\times \\mathbb{R}_+ is also a natural choice).
\n

\n

The coordinate function age1:\u03a9\u2192\u211dage_1:\\Omega \\rightarrow \\mathbb{R},

\n

age1(\u03c9=(\u03c91,\u03c92))=\u03c91age_1\\left(\\omega=(\\omega_1,\\omega_2)\\right)=\\omega_1

\n

\u2019extracts\u2019 the age of the first person out of an elementary configuration \u03c9\\omega. Similarly, the coordinate function age2:\u03a9\u2192\u211dage_2:\\Omega \\rightarrow \\mathbb{R},

\n

age2(\u03c9=(\u03c91,\u03c92))=\u03c92age_2\\left(\\omega=(\\omega_1,\\omega_2)\\right)=\\omega_2

\n

\u2019extracts\u2019 the age of the second person. The functions age1age_1 and age2age_2 are called random variables.
\nThe coordinate functions age1age_1 and age2age_2 extract interesting quantities from the elementary configuration. However, there are other interesting quantities which are not coordinate functions. For instance, for an elementary configuration \u03c9\\omega, we can be interested in the average age of the 22 persons. We can define the function average:\u03a9\u2192\u211daverage:\\Omega \\rightarrow \\mathbb{R},

\n

average(\u03c9=(\u03c91,\u03c92))=\u03c91+\u03c922average\\left(\\omega=(\\omega_1,\\omega_2)\\right) = \\frac{\\omega_1+\\omega_2}{2}

\n

The function averageaverage is also called a random variable.
\n

\n
\n
\n

Definition

\n
\n


\nA random variable XX taking values in a space EE is a function X:\u03a9\u2192EX:\\Omega\\rightarrow E. In particular, integer random variables are functions X:\u03a9\u2192\u2115X:\\Omega\\rightarrow \\mathbb{N} and real random variables are functions X:\u03a9\u2192\u211dX:\\Omega\\rightarrow \\mathbb{R}.
\n

\n

Remark: this is a simplified version of the true definition which requires that random variables are \u2019measurable functions\u2019. When AA is in the \u03c3\\sigma-algebra of EE, X\u22121(A)X^{-1}(A) should be in the \u03c3\\sigma-algebra of \u03a9\\Omega.
\n

\n
\n
\n

Example 1

\n
\n


\nConsider the following dice roll game. If the number is even, the player gains 11 euro, and if the number is odd he loses 11 euro. The function Gain:\u03a9={1,2,3,4,5,6}\u2192{\u22121,1}Gain:\\Omega=\\{1,2,3,4,5,6\\}\\rightarrow \\{-1,1\\}, Gain(1)=Gain(3)=Gain(5)=\u22121Gain(1)=Gain(3)=Gain(5)=-1 Gain(2)=Gain(4)=Gain(6)=1Gain(2)=Gain(4)=Gain(6)=1 is a random variable.

\n
\n
\n

Example 2: coordinate random variables

\n
\n


\nAssume that we roll a dice nn times. The configuration space that describes all the possible results is \u03a9={1,2,3,4,5,6}n\\Omega = \\{1,2,3,4,5,6\\}^n. Let XiX_i be the ii-th coordinate function on \u03a9\\Omega. For instance,

\n

X1((3,2,3,...,6))=3.X_1\\left( (3,2,3,...,6) \\right) = 3. These random variables \u2019extract\u2019 the result of the ii-th roll out of the elementary configuration. We can also construct the random variables Gaini=Gain\u2218XiGain_i = Gain \\circ X_i: the gain at the ii-th roll.

\n
", "parent": "subsec:RV", "rank": "0", "html_name": "def:RV", "summary": "
\n

Definition 5 (Random variables). A random variable XX is a function defined on the configuration space \u03a9\\Omega. An integer random variable is valued in \u2115\\mathbb{N} or \u2124\\mathbb{Z} and a real random variable is valued in \u211d\\mathbb{R}.

\n
", "hasSummary": true, "hasTitle": true, "title": "Random variables"}, "classes": "l0", "position": {"x": -3993.986179509592, "y": 3935.377851810639}}, {"group": "nodes", "data": {"id": "def:RVLaw", "name": "definition", "text": "
\n

Definition 6 (Law of a random variables).
\n

\n
\n
\n

Introduction

\n
\n


\nLet \u03a9\\Omega be a configuration space with a probability PP, and XX be a random variable taking values in a set EE. Recall that PP is a function that takes an event of \u03a9\\Omega as input and returns its probability. The random variable transports the probability PP to the set EE. The probability of a set AA in EE is determined by the probability of the inverse image of AA in \u03a9\\Omega.
\n

\n
\n
\n

Definition

\n
\n


\nThe random variable X:\u03a9\u2192EX:\\Omega \\rightarrow E induces a probability PXP_X on EE defined by PX(A\u2282E)=P(X\u22121(A)).P_X(A \\subset E) = P\\left(X^{-1}(A)\\right). PXP_X is called the law of the random variable XX.
\nRemarks:

\n\n

In the previous game, the random variable Gain:\u03a9={1,2,3,4,5,6}\u2192{\u22121,1}Gain:\\Omega=\\{1,2,3,4,5,6\\}\\rightarrow \\{-1,1\\}, Gain(1)=Gain(3)=Gain(5)=\u22121Gain(1)=Gain(3)=Gain(5)=-1 Gain(2)=Gain(4)=Gain(6)=1Gain(2)=Gain(4)=Gain(6)=1 induces a probability distribution on {\u22121,1}\\{-1,1\\}. We have PGain({\u22121})=P(Gain\u22121({\u22121}))=P({1})+P({3})+P({5})P_{Gain}(\\{-1\\}) = P(Gain^{-1}(\\{-1\\}))=P(\\{1\\})+P(\\{3\\})+P(\\{5\\}) PGain({1})=P(Gain\u22121({1}))=P({2})+P({4})+P({6}).P_{Gain}(\\{1\\}) = P(Gain^{-1}(\\{1\\}))=P(\\{2\\})+P(\\{4\\})+P(\\{6\\}).
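
\n

This computation can be replayed in a few lines of Python (a sketch assuming the uniform law on the dice):

# Uniform probability on Omega = {1,...,6}.
Omega = [1, 2, 3, 4, 5, 6]
P = {w: 1 / 6 for w in Omega}

def gain(w):
    # +1 euro on an even number, -1 euro on an odd number.
    return 1 if w % 2 == 0 else -1

# Law of Gain obtained by pulling each value back to Omega: P_Gain({v}) = P(Gain^{-1}({v})).
P_gain = {v: sum(P[w] for w in Omega if gain(w) == v) for v in (-1, 1)}
print(P_gain)   # {-1: 0.5, 1: 0.5}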

\n
\n
\n

Definition: cumulative distribution

\n
\n


\nLet XX be a real random variable (E=\u211dE=\\mathbb{R}). The probability PXP_X can be represented by its cumulative distribution FX:\u211d\u2192\u211dF_X:\\mathbb{R}\\rightarrow \\mathbb{R}, defined by

\n

FX(a)=PX(]\u2212\u221e,a]).F_X(a) = P_X\\left(]-\\infty,a]\\right).

\n

All the information on PXP_X is contained in FXF_X. Since the cumulative distribution is a function \u211d\u2192\u211d\\mathbb{R}\\rightarrow \\mathbb{R}, sometimes it is conceptually simpler to manipulate FXF_X than the probability itself which is a function from the subsets of \u211d\\mathbb{R} to \u211d\\mathbb{R}.

\n\n
\n
\n

Definition: density of a real random variable

\n
\n


\nLet XX be a real random variable. Sometimes, the law of XX admits a density. When it exists, the density of XX is a function fXf_X such that

\n

PX([a,b])=\u222babfX(x)dx.P_X([a,b]) = \\int_a^b f_X(x)dx.

\n

Remark: we defined the cumulative distribution and density of a \u2019random variable\u2019 but note that they depend on XX only through the probability PXP_X. Hence the density can be defined for a probability PP on \u211d\\mathbb{R} even if it is not defined as the law of a random variable.

\n
\n
\n

Exercises

\n
\n


\nFor a real random variable XX,

\n\n
", "parent": "subsec:RV", "rank": "0", "html_name": "def:RVLaw", "summary": "
\n

Definition 6 (Law of a random variables). A RV X:\u03a9\u2192EX:\\Omega \\rightarrow E transports the probability on \u03a9\\Omega to a probability on EE. For AA an event in EE, define PX(A)=P(X\u22121(A)).P_X(A) = P\\left(X^{-1}(A)\\right). PXP_X is called the law of the random variable XX.

\n
", "hasSummary": true, "hasTitle": true, "title": "Law of a random variables"}, "classes": "l0", "position": {"x": -2761.1231158163064, "y": 3870.8379779883944}}, {"group": "nodes", "data": {"id": "def:JoinRV", "name": "definition", "text": "
\n

Definition 7 (Joint random variables).
\n

\n
\n
\n

Definition

\n
\n


\nConsider 22 random variables X:\u03a9\u2192EX:\\Omega \\rightarrow E and Y:\u03a9\u2192FY:\\Omega \\rightarrow F. The random variable Z:\u03a9\u2192E\u00d7FZ:\\Omega \\rightarrow E\\times F,

\n

Z=(X,Y)Z=(X,Y)

\n

is called the joint random variable. The law PZ=P(X,Y)P_{Z}=P_{(X,Y)} of ZZ is called the joint law (or joint probability distribution).
\n

\n
\n
\n

Examples

\n
\n


\nConsider the case of nn dice rolls. Let \u03a9={1,2,3,4,5,6}n\\Omega = \\{1,2,3,4,5,6\\}^n be a configuration space, with the uniform probability distribution PP. Let XiX_i be the ii-th coordinate function. Consider the joint random variable

\n

Z=(X1,..,Xn).Z = (X_1,..,X_n).

\n

What is the space of values of ZZ? What is the law of ZZ?

\n

Consider now Z:\u03a9\u2192{1,2,3,4,5,6}2,Z=(X1,X2).Z:\\Omega \\rightarrow \\{1,2,3,4,5,6\\}^2,\\quad Z=(X_1,X_2).

\n

What is the law of ZZ?

\n
", "parent": "subsec:RV", "rank": "0", "html_name": "def:JoinRV", "summary": "
\n

Definition 7 (Joint random variables). Consider 22 random variables X:\u03a9\u2192EX:\\Omega \\rightarrow E and Y:\u03a9\u2192FY:\\Omega \\rightarrow F. The random variable Z:\u03a9\u2192E\u00d7FZ:\\Omega \\rightarrow E\\times F, Z=(X,Y)Z=(X,Y) is called the joint random variable. The law PZ=P(X,Y)P_{Z}=P_{(X,Y)} of ZZ is called the joint law (or joint probability distribution).

\n
", "hasSummary": true, "hasTitle": true, "title": "Joint random variables"}, "classes": "l0", "position": {"x": -5158.705568517556, "y": 3888.000422320245}}, {"group": "nodes", "data": {"id": "ex:SpGM", "name": "exercise", "text": "
\n

Exercise 4 (Simple game model).
\n
\nConsider the following game. A player flips a coin twice. If both flips are heads, the player wins 11 euro. Otherwise he loses 11 euro. Make a probabilistic model which describes the game.

\n
", "parent": "subsec:RV", "rank": "0", "html_name": "ex:SpGM", "summary": "
\n

Exercise 4 (Simple game model).

\n
", "hasSummary": false, "hasTitle": true, "title": "Simple game model"}, "classes": "l0", "position": {"x": -2755.6188300699455, "y": 4768.891256628461}}, {"group": "nodes", "data": {"id": "ex:ImRV", "name": "exercise", "text": "
\n

Exercise 5 (Image as random variable).
\n
\nLet II be a 512\u00d7512512\\times 512 image valued in {0,1,...,255}\\{0,1,...,255\\}. II can be viewed as a random variable defined on the configuration space \u03a9={0,1,...,511}2\\Omega = \\{0,1,...,511\\}^2 and valued in {0,1,...,255}\\{0,1,...,255\\}. Assume that PP is a uniform probability distribution over \u03a9\\Omega. The law PIP_I of II corresponds to something that we already met in the image processing course. What is it? What is the formal difference?

\n
", "parent": "subsec:RV", "rank": "0", "html_name": "ex:ImRV", "summary": "
\n

Exercise 5 (Image as random variable).

\n
", "hasSummary": false, "hasTitle": true, "title": "Image as random variable"}, "classes": "l0", "position": {"x": -2763.4777060587226, "y": 4388.939429960873}}, {"group": "nodes", "data": {"id": "ex:SumEq01", "name": "exercise", "text": "
\n

Exercise 6 (Sums of equiprobable 0s and 1s).
\n
\n

\n
\n
\n

Theory

\n
\n


\nConsider the configuration space \u03a9={0,1}n\\Omega = \\{ 0,1 \\}^n (\u03a9\\Omega can be interpreted as the vertices of a hypercube), with the uniform probability distribution. What is the cardinality of \u03a9\\Omega?
\nConsider now the random variable X:\u03a9\u2192{0,..,n}X:\\Omega \\rightarrow \\{0,..,n\\}, X((\u03c91,...,\u03c9n))=\u2211i=1n\u03c9i,X((\\omega_1,...,\\omega_n)) = \\sum_{i=1}^n \\omega_i, which counts the number of coordinates equal to 11 in each nn-tuple. What is the law of the random variable XX?

\n
\n
\n

Application: probability of a binary image

\n
\n


\nPropose a very simple probabilistic model over the set of binary images of size 256\u00d7256256\\times 256. What can we say about the probability that

\n\n
", "parent": "subsec:RV", "rank": "0", "html_name": "ex:SumEq01", "summary": "
\n

Exercise 6 (Sums of equiprobable 0s and 1s).

\n
", "hasSummary": false, "hasTitle": true, "title": "Sums of equiprobable 0s and 1s"}, "classes": "l0", "position": {"x": -3966.066245820806, "y": 4379.791652955868}}, {"group": "nodes", "data": {"id": "ex:SimJoin", "name": "exercise", "text": "
\n

Exercise 7 (Simple joint law).
\n
\nConsider \u03a9={0,1}2\\Omega=\\{0,1\\}^2 with a uniform probability, and X1X_1 and X2X_2 the coordinate random variables. Let YY and ZZ be two random variables \u03a9\u2192{0,1,2}\\Omega \\rightarrow \\{0,1,2\\} defined by

\n

Y=X1+X2 and Z=X1X2.Y = X_1 + X_2 \\text{ and } Z = X_1X_2.

\n

Give the laws of YY, ZZ, and the joint law of (Y,Z)(Y,Z).

\n
", "parent": "subsec:RV", "rank": "0", "html_name": "ex:SimJoin", "summary": "
\n

Exercise 7 (Simple joint law).

\n
", "hasSummary": false, "hasTitle": true, "title": "Simple joint law"}, "classes": "l0", "position": {"x": -5140.737592553508, "y": 4382.769647635065}}, {"group": "nodes", "data": {"id": "def:Marg", "name": "definition", "text": "
\n

Definition 8 (Concept of marginal).
\n
\n

\n
\n
\n

Coordinate projection

\n
\n


\nBy definition, elements of E\u00d7FE\\times F are couples (x,y)(x,y) with x\u2208Ex\\in E and y\u2208Fy\\in F. Denote by \u03c0E\\pi_E and \u03c0F\\pi_F the maps which send (x,y)(x,y) to its first and second coordinates,

\n

\u03c0E((x,y))=x and \u03c0F((x,y))=y.\\pi_E((x,y))=x \\text{ and } \\pi_F((x,y))=y.

\n

\u03c0E\\pi_E and \u03c0F\\pi_F are called projections on EE and FF.

\n
\n
\n

Marginal: idea

\n
\n


\nSome mathematical objects (probabilities and random variables) defined on the Cartesian product E\u00d7FE\\times F naturally give rise to similar objects defined on EE or FF by projecting them on EE or FF, in a sense that will be made clear in subsequent nodes. The objects defined in this manner on EE and FF are called marginals of the object on E\u00d7FE\\times F.
\n

\n

By projecting on a coordinate, we \u2019forget\u2019 the other one. As we will see, \u2019forgetting\u2019 the other coordinate is opposed to conditioning, where the other coordinate is fixed to a particular value.

\n
", "parent": "subsec:Marg", "rank": "0", "html_name": "def:Marg", "summary": "
\n

Definition 8 (Concept of marginal). Some mathematical objects defined on the Cartesian product E\u00d7FE\\times F naturally give rise to similar objects defined on EE and FF by \u2019forgetting\u2019 the other coordinate. The objects defined in this manner on EE and FF are called marginals.

\n
", "hasSummary": true, "hasTitle": true, "title": "Concept of marginal"}, "classes": "l0", "position": {"x": -7666.146928631908, "y": 3152.463522697209}}, {"group": "nodes", "data": {"id": "def:MargProb", "name": "definition", "text": "
\n

Definition 9 (Marginal probability).
\n
\n

\n
\n
\n

Definition

\n
\n


\nLet PE\u00d7FP_{E\\times F} be a probability on the Cartesian product E\u00d7FE\\times F. The marginal probability on EE is defined by

\n

PE(A)=PE\u00d7F(\u03c0E\u22121(A))=PE\u00d7F(A,F)P_E(A) = P_{E\\times F}(\\pi_E^{-1}(A)) = P_{E\\times F}(A,F)

\n

and the marginal on FF by

\n

PF(B)=PE\u00d7F(\u03c0F\u22121(B))=PE\u00d7F(E,B).P_F(B) = P_{E\\times F}(\\pi_F^{-1}(B)) = P_{E\\times F}(E,B).
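
\n

A short sketch with a finite joint table (Python, numpy assumed, hypothetical numbers): the marginal on EE is obtained by summing the joint probabilities over FF, and conversely.

import numpy as np

# Hypothetical joint probability table on E x F (rows indexed by E, columns by F).
P_EF = np.array([[0.02, 0.08],
                 [0.48, 0.42]])

P_E = P_EF.sum(axis=1)   # marginal on E: P_E(A) = P_{ExF}(A, F)
P_F = P_EF.sum(axis=0)   # marginal on F: P_F(B) = P_{ExF}(E, B)

print(P_E)   # [0.1 0.9]
print(P_F)   # [0.5 0.5]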

\n
\n
\n

Particular cases

\n
\n


\n

\n\n
", "parent": "subsec:Marg", "rank": "0", "html_name": "def:MargProb", "summary": "
\n

Definition 9 (Marginal probability). Let PE\u00d7FP_{E\\times F} be a probability on the Cartesian product E\u00d7FE\\times F. The marginal probability on EE is defined by PE(A)=PE\u00d7F(A,F).P_E(A) = P_{E\\times F}(A,F).

\n
", "hasSummary": true, "hasTitle": true, "title": "Marginal probability"}, "classes": "l0", "position": {"x": -7574.774146852916, "y": 2525.6983691605765}}, {"group": "nodes", "data": {"id": "def:MargRV", "name": "definition", "text": "
\n

Definition 10 (Marginal random variables).
\n
\n

\n
\n
\n

Definition

\n
\n


\nConsider a random variable Z:\u03a9\u2192E\u00d7FZ:\\Omega \\rightarrow E \\times F. The marginal random variables XX and YY on EE and on FF are defined by

\n

X=\u03c0E\u2218Z and Y=\u03c0F\u2218Z.X = \\pi_E \\circ Z \\text{ and } Y = \\pi_F \\circ Z.

\n
\n
\n

The laws of marginal variables

\n
\n


\nThe laws of XX and YY are called the marginal laws. They are of course the marginals of the joint probability PZP_{Z} (prove it). The notations PEP_E and PFP_F introduced in the general case are replaced by PXP_X and PYP_Y. When the densities exist, they are noted fXf_X and fYf_Y.
\n

\n
\n
\n

Examples

\n
\n


\n

\n\n
", "parent": "subsec:Marg", "rank": "0", "html_name": "def:MargRV", "summary": "
\n

Definition 10 (Marginal random variables). Let Z:\u03a9\u2192E\u00d7FZ:\\Omega\\rightarrow E\\times F be a random variable, and let XX and YY be the random variables corresponding to each coordinate: Z=(X,Y).Z=(X,Y). XX is called the marginal random variable on EE.

\n
", "hasSummary": true, "hasTitle": true, "title": "Marginal random variables"}, "classes": "l0", "position": {"x": -8825.065439679365, "y": 2569.562259332014}}, {"group": "nodes", "data": {"id": "ex:SimPExa", "name": "exercise", "text": "
\n

Exercise 8 (A simple example).
\n
\nConsider the configuration space \u03a9={a,b}2\\Omega = \\{a,b\\}^2,

\n
\n\n\n\n\n\n\n\n\n\n\n\n\n\n
(a,a)(a,b)
(b,a)(b,b)
\n


\n

\n
\n

With probabilities
\n

\n
\n\n\n\n\n\n\n\n\n\n\n\n\n\n
P(a,a)=0.02P(a,a)=0.02P(a,b)=0.08P(a,b)=0.08
P(b,a)=0.48P(b,a)=0.48P(b,b)=0.42P(b,b)=0.42
\n
\n

Denote by X1X_1 and X2X_2 the coordinate random variables on \u03a9\\Omega.

\n\n
", "parent": "subsec:Marg", "rank": "0", "html_name": "ex:SimPExa", "summary": "
\n

Exercise 8 (A simple example).

\n
", "hasSummary": false, "hasTitle": true, "title": "A simple example"}, "classes": "l0", "position": {"x": -10021.854885608795, "y": 3276.801731567818}}, {"group": "nodes", "data": {"id": "ex:MargGauss", "name": "exercise", "text": "
\n

Exercise 9 (Marginals of 2D2D Gaussians).
\n
\nLet \u03a9=\u211d2\\Omega = \\mathbb{R}^2 and let PP be a probability with density

\n

f(x,y)=12\u03c0\u03c32e\u2212(x\u2212x0)2+(y\u2212y0)22\u03c32.f(x,y)=\\frac{1}{2\\pi \\sigma^2 }e^{-\\frac{(x-x_0)^2+(y-y_0)^2}{2\\sigma^2}}.

\n\n
", "parent": "subsec:Marg", "rank": "0", "html_name": "ex:MargGauss", "summary": "
\n

Exercise 9 (Marginals of 2D2D Gaussians).

\n
", "hasSummary": false, "hasTitle": true, "title": "Marginals of 2D2D Gaussians"}, "classes": "l0", "position": {"x": -10013.868622442598, "y": 2864.2827750796514}}, {"group": "nodes", "data": {"id": "ex:LandSp", "name": "exercise", "text": "
\n

Exercise 10 (Landing a spaceship).
\n
\n

\n
\n
\n

Error function

\n
\n


\nThe following function erf(z)=2\u03c0\u222b0ze\u2212x2dxerf(z)=\\frac{2}{\\sqrt{\\pi}} \\int_{0}^{z} e^{-x^2}dx

\n

is called the \u2019Gaussian error function\u2019. Express this integral,

\n

\u222b\u03bc\u2212\u03c3\u03bc+\u03c3f\u03bc,\u03c3(x)dx,\\int_{\\mu-\\sigma}^{\\mu+\\sigma} f_{\\mu,\\sigma}(x)dx,

\n

in terms of the erferf function. Ask Google for its value.

\n
\n
\n

Application

\n
\n


\nAssume that a spaceship wants to land on Earth at a certain location. Assume that in the region of the landing zone the Earth\u2019s surface can be assimilated to a plane. Given 22 coordinate axes, points on the surface of the Earth can be identified with points of \u211d2\\mathbb{R}^2. The spaceship aims at the point of coordinates (x0,y0)(x_0,y_0), but the wind introduces a perturbation on the landing point. In about 7070% of landings the perturbation on each coordinate is smaller than 11 meter.
\nPropose a reasonable probabilistic model of the landing problem in which \u03a9\\Omega is an abstract space, not entirely specified, and the landing position is a random variable.

\n
", "parent": "subsec:Marg", "rank": "0", "html_name": "ex:LandSp", "summary": "
\n

Exercise 10 (Landing a spaceship).

\n
", "hasSummary": false, "hasTitle": true, "title": "Landing a spaceship"}, "classes": "l0", "position": {"x": -10013.868622442598, "y": 2480.1339980091607}}, {"group": "nodes", "data": {"id": "th:Bayes", "name": "theorem", "text": "
\n

Theorem 1 (Bayes Theorem).
\n
\nThe definition of conditional probabilities can be rewritten as what is called the \"Bayes theorem\":

\n

P(B|A)P(A)=P(A|B)P(B)=P(A\u2229B)P(B|A)P(A) = P(A|B)P(B) = P(A\\cap B)
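
\n

A tiny numerical check of these identities on a hypothetical pair of events (Python):

# Hypothetical probabilities of two events A and B.
P_A = 0.4
P_B = 0.25
P_A_and_B = 0.1

P_B_given_A = P_A_and_B / P_A   # P(B|A)
P_A_given_B = P_A_and_B / P_B   # P(A|B)

# Both products recover the probability of the intersection.
print(P_B_given_A * P_A)        # 0.1
print(P_A_given_B * P_B)        # 0.1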

\n
", "parent": "subsec:Cond", "rank": "0", "html_name": "th:Bayes", "summary": "
\n

Theorem 1 (Bayes Theorem). P(B|A)P(A)=P(A|B)P(B)=P(A\u2229B)P(B|A)P(A) = P(A|B)P(B) = P(A\\cap B)

\n
", "hasSummary": true, "hasTitle": true, "title": "Bayes Theorem"}, "classes": "l0", "position": {"x": -3698.7342091408714, "y": 2127.4049655263498}}, {"group": "nodes", "data": {"id": "def:CondProb", "name": "definition", "text": "
\n

Definition 11 (Conditional probability).
\n
\n

\n
\n
\n

Idea

\n
\n


\nConsider a probability PP on the set of configurations \u03a9\\Omega. Given an event AA with P(A)>0P(A)>0, the idea of conditioning with respect to AA is to focus on the configurations \u03c9\u2208A\\omega \\in A and forget about the configurations \u03c9\u2209A\\omega \\notin A. We would like to define a new probability which respects the proportions of the events included in AA and gives zero probability to events which do not intersect AA.
\n

\n
\n
\n

Definition

\n
\n


\nLet AA be an event of \u03a9\\Omega, with P(A)>0P(A)>0. We define the probability conditional to the event AA by PA(B)=P(B\u2229A)P(A).P_A(B) = \\frac{P(B\\cap A)}{P(A)}. PA(B)P_A(B) is called the probability of BB given AA, or probability of BB knowing AA, and is usually noted P(B|A)P(B|A).
\nExercise: check that PAP_A is a probability on \u03a9\\Omega
\n

\n
\n
\n

Examples: cards

\n
\n


\nAssume that \u03a9\\Omega is the set of cards of a standard 52 card deck, \u03a9={1,2,3,4,5,6,7,8,9,10,jack,queen,king,ace}\u00d7{clubs,diamonds,hearts,spades}\\begin{aligned}\n \\Omega & = \\{1,2,3,4,5,6,7,8,9,10,jack,queen,king,ace\\}\\\\\n & \\times \\{clubs,diamonds,hearts,spades\\}\\\\\n \\end{aligned} and PP is the uniform distribution.

\n\n
\n
\n

Examples: dice

\n
\n


\nAssume that a dice is rolled twice and that we have a uniform distribution PP on \u03a9={1,..,6}2\\Omega=\\{1,..,6\\}^2.

\n\n
", "parent": "subsec:Cond", "rank": "0", "html_name": "def:CondProb", "summary": "
\n

Definition 11 (Conditional probability). We define the probability conditional to the event AA by PA(B)=P(B\u2229A)P(A).P_A(B) = \\frac{P(B\\cap A)}{P(A)}. PA(B)P_A(B) is called the probability of BB given AA, and is usually noted P(B|A)P(B|A).

\n
", "hasSummary": true, "hasTitle": true, "title": "Conditional probability"}, "classes": "l0", "position": {"x": -4871.671100695654, "y": 2254.936653287442}}, {"group": "nodes", "data": {"id": "def:CondCart", "name": "definition", "text": "
\n

Definition 12 (Conditional probabilities on a Cartesian product).
\n
\n

\n
\n
\n

Idea

\n
\n


\nLet PP be a probability on a Cartesian product E\u00d7FE\\times F. Assume that the marginal PF({y})P_F(\\{y\\}) is nonzero. We can condition PP by the event E\u00d7{y}E\\times \\{y\\}, \u2019the second coordinate is yy\u2019. This gives a new probability on E\u00d7FE\\times F, which can be interpreted as a probability on EE as follows.

\n
\n
\n

Definition

\n
\n


\nAssume that P(E\u00d7{y})>0P(E\\times \\{y\\})>0. For A\u2282EA\\subset E,

\n

P(A|E\u00d7{y})\u2190P(A\u00d7{y}|E\u00d7{y})=P((A\u00d7{y})\u2229(E\u00d7{y}))P(E\u00d7{y})=P((A\u2229E)\u00d7{y})PF({y})=P(A\u00d7{y})PF({y})\\begin{aligned}\nP(A | E\\times \\{y\\} ) &\\leftarrow& P(A\\times \\{y\\} | E\\times \\{y\\} )\\\\\n&=& \\frac{P((A\\times \\{y\\})\\cap (E\\times \\{y\\}))}{P(E\\times \\{y\\})}\\\\\n&=& \\frac{P((A\\cap E)\\times \\{y\\})}{P_F(\\{y\\})}\\\\\n&=& \\frac{P(A\\times \\{y\\})}{P_F(\\{y\\})}\\\\\\end{aligned}

\n

When all the densities exist, the conditional density at x\u2208Ex\\in E is f(x|E\u00d7{y})=f((x,y))fF(y),f(x|E\\times \\{y\\}) = \\frac{f((x,y))}{f_F(y)}, as long as the marginal density fF(y)>0f_F(y)>0.
\n

\n

Hence, in general, when conditioning a joint law by a coordinate value, we have

\n

conditional probability=joint probabilitymarginal probability.\\text{conditional probability} = \\frac{\\text{joint probability}}{ \\text{marginal probability}}.
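
\n

This rule can be sketched on a finite joint table (Python, numpy assumed, hypothetical numbers): conditioning on a value of the second coordinate amounts to selecting the corresponding column and renormalizing it by the marginal of that column.

import numpy as np

# Hypothetical joint table P on E x F (rows indexed by E, columns by F).
P = np.array([[0.10, 0.20],
              [0.30, 0.40]])

y = 1                          # condition on the second coordinate taking the value of column y
P_F = P.sum(axis=0)            # marginal probability of each column
P_given_y = P[:, y] / P_F[y]   # conditional probability = joint probability / marginal probability

print(P_given_y)               # [0.333... 0.666...], sums to 1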

\n
\n
\n

Conditional probabilities of a joint law

\n
\n


\nAssume now X:\u03a9\u2192EX:\\Omega \\rightarrow E and Y:\u03a9\u2192FY:\\Omega \\rightarrow F. The joint law P(X,Y)P_{(X,Y)} is a probability on E\u00d7FE\\times F. In this context we use particular notations to condition on some y\u2208Fy\\in F. When A\u2282EA\\subset E,

\n

P(X,Y)(A|E\u00d7{y}) is written P(X\u2208A|Y=y).P_{(X,Y)}(A|E\\times \\{y\\}) \\text{ is written } P(X\\in A |Y=y).

\n

Using shortened notation, we can write

\n

P(X\u2208A|Y=y)\u2192P(A|y)=P(A,y)P(y).P(X\\in A |Y=y) \\rightarrow P(A|y)= \\frac{P(A,y)}{P(y)}.

\n

And when it exists, the conditional density is noted f(x|Y=y)=f(X,Y)(x,y)fY(y).f(x|Y=y) = \\frac{f_{(X,Y)}(x,y)}{f_Y(y)}.

\n
", "parent": "subsec:Cond", "rank": "0", "html_name": "def:CondCart", "summary": "
\n

Definition 12 (Conditional probabilities on a Cartesian product). Let PP be a probability on E\u00d7FE\\times F. The conditional probability knowing {y}\u2282F\\{y\\}\\subset F is given by P(A|E\u00d7{y})=P(A\u00d7{y})PF({y}),P(A|E\\times \\{y\\}) = \\frac{P(A\\times \\{y\\})}{P_F(\\{y\\})}, where AA is an event of EE.

\n
", "hasSummary": true, "hasTitle": true, "title": "Conditional probabilities on a Cartesian product"}, "classes": "l0", "position": {"x": -4842.334127720824, "y": 1526.3454399792772}}, {"group": "nodes", "data": {"id": "rem:CondCart", "name": "remark", "text": "
\n

Remark 5 (Conditioning on a Cartesian product).
\n
\nEarlier we considered the case of 22 dice rolls, \u03a9={1,..,6}2\\Omega = \\{1,..,6\\}^2 with the uniform distribution PP.

\n\n
", "parent": "subsec:Cond", "rank": "0", "html_name": "rem:CondCart", "summary": "
\n

Remark 5 (Conditioning on a Cartesian product). Consider the case of 22 dice rolls with the uniform distribution. Conditioning by \u2019the first roll is a 11\u2019 leads to a probability distribution on the second roll.

\n
", "hasSummary": true, "hasTitle": true, "title": "Conditioning on a Cartesian product"}, "classes": "l0", "position": {"x": -3705.4533780122365, "y": 1581.3149748236115}}, {"group": "nodes", "data": {"id": "ex:RCRD", "name": "exercise", "text": "
\n

Exercise 11 (Random card in random deck).
\n
\nConsider that a box contains 1010 card decks. 33 decks contain 5252 cards (2 - Ace) and 77 decks contain 3232 cards (7 - Ace). Without looking, a deck is chosen from the box, and a card is chosen from the deck. Given that the card is a 1010, what is the probability that it comes from a 5252 card deck?

\n
", "parent": "subsec:Cond", "rank": "0", "html_name": "ex:RCRD", "summary": "
\n

Exercise 11 (Random card in random deck).

\n
", "hasSummary": false, "hasTitle": true, "title": "Random card in random deck"}, "classes": "l0", "position": {"x": -2522.109017729677, "y": 2345.4030045895497}}, {"group": "nodes", "data": {"id": "ex:DWRPM", "name": "exercise", "text": "
\n

Exercise 12 (Draws without replacement (probabilistic model)).
\n
\nThis exercise is similar to question 2 of Exercise\u00a02 \u2019Reminders on combinatorics\u2019, but we will re-express our previous reasoning in the language of probabilities and random variables.
\n

\n

Consider again that the elements of \ud835\udcae\\mathcal{S} are nn physical objects contained in a box. Assume that we draw successively kk elements from the box, without looking in the box, and without putting an element back before taking the next one. We now want to build a probabilistic model of the possible draws. We will model the experiment with the configuration space \u03a9=\ud835\udcaek\\Omega = \\mathcal{S}^k and call XiX_i the coordinate random variables.

\n\n
", "parent": "subsec:Cond", "rank": "0", "html_name": "ex:DWRPM", "summary": "
\n

Exercise 12 (Draws without replacement (probabilistic model)).

\n
", "hasSummary": false, "hasTitle": true, "title": "Draws without replacement (probabilistic model)"}, "classes": "l0", "position": {"x": -2532.2100549601205, "y": 1876.5173886765588}}, {"group": "nodes", "data": {"id": "ex:CondGauss", "name": "exercise", "text": "
\n

Exercise 13 (Conditionals of 2D2D Gaussians).
\n
\nAgain, let \u03a9=\u211d2\\Omega = \\mathbb{R}^2 and let PP be a probability with density

\n

f(x,y)=12\u03c0\u03c32e\u2212(x\u2212x0)2+(y\u2212y0)22\u03c32.f(x,y)=\\frac{1}{2\\pi \\sigma^2 }e^{-\\frac{(x-x_0)^2+(y-y_0)^2}{2\\sigma^2}}.

\n

Give the conditional density on the first coordinate given that the second coordinate has a fixed value yy.

\n
", "parent": "subsec:Cond", "rank": "0", "html_name": "ex:CondGauss", "summary": "
\n

Exercise 13 (Conditionals of 2D2D Gaussians).

\n
", "hasSummary": false, "hasTitle": true, "title": "Conditionals of 2D2D Gaussians"}, "classes": "l0", "position": {"x": -2536.920292020898, "y": 1410.897130613792}}, {"group": "nodes", "data": {"id": "th:JLIV", "name": "theorem", "text": "
\n

Theorem 2 (Joint law of independent variables).
\n
\n

\n
\n
\n

Theorem

\n
\n


\nLet X:\u03a9\u2192\u2124X:\\Omega \\rightarrow \\mathbb{Z} and Y:\u03a9\u2192\u2124Y:\\Omega \\rightarrow \\mathbb{Z} be independent RV.

\n\n

When XX and YY are real random variables the results hold for densities (when they exist):

\n

f(X,Y)((x,y))=fX(x)fY(y) and fX(x)=f(x|Y=y)f_{(X,Y)}((x,y)) = f_X(x)f_Y(y)\quad \text{ and }\quad f_X(x)=f(x|Y=y)

\n
\n
\n

Proofs of 1) and 2)

\n
\n


\n11) Call ZZ the joint variable Z=(X,Y)Z=(X,Y). We have

\n

Z\u22121({(i,j)})=X\u22121({i})\u2229Y\u22121({j}).Z^{-1}(\\{(i,j)\\})=X^{-1}(\\{i\\})\\cap Y^{-1}(\\{j\\}).

\n

Hence,

\n

P(X,Y)({(i,j)})=P(X\u22121({i})\u2229Y\u22121({j}))=P(X\u22121({i}))P(Y\u22121({j}))=PX({i})PY({j})\begin{aligned}
P_{(X,Y)}(\{(i,j)\})&=& P( X^{-1}(\{i\})\cap Y^{-1}(\{j\}))\\
&=& P(X^{-1}(\{i\}))P(Y^{-1}(\{j\}))\\
&=& P_X(\{i\})P_Y(\{j\})\\\end{aligned}

\n

22) When PY({j})>0P_Y(\{j\})>0, P(X=i|Y=j)=P(X,Y)({(i,j)})PY({j})=PX({i})PY({j})PY({j})=PX({i})\begin{aligned}
P(X=i | Y = j)&=&\frac{P_{(X,Y)}(\{(i,j)\})}{P_Y(\{j\}) }\\
&=&\frac{P_X(\{i\})P_Y(\{j\})}{P_Y(\{j\}) }\\
&=&P_X(\{i\}) \\\end{aligned}

\n
\n
\n

Example

\n
\n


\nLet \u03a9={a,b}2\\Omega = \\{a,b\\}^2 with

\n
\n\n\n\n\n\n\n\n\n\n\n\n\n\n
P({(a,a)})=0.02P(\\{(a,a)\\})=0.02P({(a,b)})=0.08P(\\{(a,b)\\})=0.08
P({(b,a)})=0.48P(\\{(b,a)\\})=0.48P({(b,b)})=0.42P(\\{(b,b)\\})=0.42.
\n
\n

Call X1X_1 and X2X_2 the coordinate random variables. Are X1X_1 and X2X_2 independent?
\n

\n
", "parent": "subsec:Ind", "rank": "0", "html_name": "th:JLIV", "summary": "
\n

Theorem 2 (Joint law of independent variables). P(X,Y)(A\u00d7B)=PX(A)PY(B)P_{(X,Y)}(A\\times B)=P_X(A)P_Y(B) P(X\u2208A|Y=y)=PX(A),P(X\\in A | Y = y) = P_X(A),

\n
", "hasSummary": true, "hasTitle": true, "title": "Joint law of independent variables"}, "classes": "l0", "position": {"x": -7044.081789564841, "y": 288.5955547839242}}, {"group": "nodes", "data": {"id": "def:IndEv", "name": "definition", "text": "
\n

Definition 13 (Independent events).
\n
\n

\n
\n
\n

Motivation

\n
\n


\nConsider again 22 dice rolls with a uniform probability distribution PP. Consider the two events

\n\n

Remember that conditional probabilities are given by

\n

P(B|A)=P(A\u2229B)P(A)=P({(6,6)})P({6}\u00d7{1,..,6})=16P(B|A) = \frac{P(A\cap B)}{P(A)} = \frac{P(\{(6,6)\})}{P(\{6\}\times \{1,..,6\})}=\frac{1}{6}

\n

Hence P(B)=P(B|A)P(B)=P(B|A): the probability of getting 66 on the second roll is the same as the probability of getting 66 on the second roll knowing that the first roll was a 66. We say in that case that the event BB does not depend on the event AA. Reversing the conditioning in the formula shows that AA is also independent of BB.
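
\n

The computation can be checked exhaustively in a few lines of Python (uniform law on the 36 configurations assumed):

from itertools import product

# All 36 equally likely configurations of two dice rolls.
Omega = list(product(range(1, 7), repeat=2))
P = lambda event: len(event) / len(Omega)

A = {w for w in Omega if w[0] == 6}   # the first roll is a 6
B = {w for w in Omega if w[1] == 6}   # the second roll is a 6

print(P(A & B))       # 1/36
print(P(A) * P(B))    # 1/36 as well: A and B are independent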

\n
\n
\n

Definition

\n
\n


\nTwo events AA and BB are called independent when

\n

P(A\u2229B)=P(A)P(B)P(A\\cap B) =P(A)P(B)

\n
\n
\n

Interpretation

\n
\n


\nWhen P(A)>0P(A)>0: the proportion of B\u2229AB\\cap A in AA is the same as the proportion of BB in \u03a9\\Omega

\n

P(B\u2229A)P(A)=P(B)P(\u03a9)=P(B).\\frac{P(B\\cap A)}{P(A)} = \\frac{P(B)}{P(\\Omega)}=P(B).

\n

In other words, conditioning probabilities to the event AA does not change the probability of BB.
\n

\n

When P(B)>0P(B)>0, the same can be said about the proportion of A\u2229BA\\cap B in BB.

\n
\n
\n

Example

\n
\n


\nLet \u03a9\\Omega be the 5252 card deck with uniform distribution PP. Show that the events \u2019the card is a diamond\u2019 and \u2019the card is an ace\u2019 are independent.

\n\n

and

\n

14\u00d7113=152,\\frac{1}{4} \\times \\frac{1}{13} = \\frac{1}{52},

\n

they are independent.

\n
", "parent": "subsec:Ind", "rank": "0", "html_name": "def:IndEv", "summary": "
\n

Definition 13 (Independent events). Two events AA and BB are called independent when P(A\u2229B)=P(A)P(B).P(A\cap B) = P(A)P(B).

\n
", "hasSummary": true, "hasTitle": true, "title": "Independent events"}, "classes": "l0", "position": {"x": -7096.069015477114, "y": 1542.1955190811266}}, {"group": "nodes", "data": {"id": "def:IndRV", "name": "definition", "text": "
\n

Definition 14 (Independent random variables).
\n
\n

\n
\n
\n

Motivation

\n
\n


\nThe notion of independence of two random variables XX and YY is based on the notion of independence of events.
\n

\n

Consider two consecutive dice rolls. Call XX and YY the two coordinate random variables on the configuration space \u03a9={1,2,3,4,5,6}2\\Omega =\\{1,2,3,4,5,6 \\}^2 endowed with the uniform distribution PP.
\n

\n

Check that events X\u22121({i})\u2282\u03a9X^{-1}(\\{i\\})\\subset \\Omega and Y\u22121({j})\u2282\u03a9Y^{-1}(\\{j\\})\\subset \\Omega are independent for all ii and jj. Rephrased in common language: obtaining an ii for the first roll is independent of obtaining a jj for the second roll.
\n

\n

More generally we will require the independence of all events in \u03a9\\Omega described by a constraint on the values of XX and all events described by a constraint on the values of YY. In that case, information about one of them does not affect the other.

\n
\n
\n

Definition

\n
\n


\nTwo random variables X:\u03a9\u2192EX:\\Omega \\rightarrow E and Y:\u03a9\u2192FY:\\Omega \\rightarrow F are independent when

\n

\u2200A\u2282E,B\u2282F,X\u22121(A) and Y\u22121(B) are independent events in \u03a9.\\forall A\\subset E,B\\subset F,\\quad X^{-1}(A) \\text{ and } Y^{-1}(B) \\text{ are independent events in } \\Omega.

\n
\n
\n

Remarks

\n
\n


\n

\n\n
", "parent": "subsec:Ind", "rank": "0", "html_name": "def:IndRV", "summary": "
\n

Definition 14 (Independent random variables). Two random variables X:\u03a9\u2192EX:\\Omega \\rightarrow E and Y:\u03a9\u2192FY:\\Omega \\rightarrow F are independent when : \u2200A\u2282E,B\u2282F,X\u22121(A) and Y\u22121(B) are independent events in \u03a9.\\forall A\\subset E,B\\subset F,\\quad X^{-1}(A) \\text{ and } Y^{-1}(B) \\text{ are independent events in } \\Omega.

\n
", "hasSummary": true, "hasTitle": true, "title": "Independent random variables"}, "classes": "l0", "position": {"x": -6798.704393552654, "y": 1020.20838151672}}, {"group": "nodes", "data": {"id": "def:iid", "name": "definition", "text": "
\n

Definition 15 (i.i.d variables).
\n
\n

\n
\n
\n

Definition

\n
\n


\nNN random variables XiX_i defined on a configuration space \u03a9\\Omega are independent and identically distributed when they are independent and have the same law.
\nKnowing the law PP of one of the variables determines the joint law. In the discrete case the joint law \ud835\udc0f\\bm P is given by

\n

\ud835\udc0f({(x1,...,xn)})=P({x1})P({x2})..P({xn}).\\bm P(\\{(x_1,...,x_n)\\}) = P(\\{x_1\\})P(\\{x_2\\}) .. P(\\{x_n\\}).

\n

If PP has a density ff, the joint density \ud835\udc1f\\bm f is

\n

\ud835\udc1f((x1,...,xn))=f(x1)f(x2)..f(xn).\\bm f((x_1,...,x_n)) = f(x_1)f(x_2) .. f(x_n).

\n
\n
\n

Example

\n
\n


\nIn the modelling of the repetition of nn dice rolls, we have always used the uniform probability over all possible configurations. From this, we deduced the independence of the different rolls.
\nThe reasoning usually goes the other way. Let \u03a9={1,2,3,4,5,6}n\\Omega=\\{1,2,3,4,5,6\\}^n and the XiX_i be the coordinate random variables. It is usually reasonable to assume that the XiX_i are independent, and that their law is uniform on {1,2,3,4,5,6}\\{1,2,3,4,5,6\\}. They are hence i.i.d. variables. This determines the probability on \u03a9\\Omega: P({\u03c9=(\u03c91,...,\u03c9n)})=P(X1,...,Xn)({\u03c9=(\u03c91,...,\u03c9n)})=\u220fPXi({\u03c9i})=16n.\\begin{aligned}\nP(\\{\\omega=(\\omega_1,...,\\omega_n)\\}) &=& P_{(X_1,...,X_n)}(\\{\\omega=(\\omega_1,...,\\omega_n)\\})\\\\\n&=& \\prod P_{X_i}(\\{\\omega_i \\}) = \\frac{1}{6^n}.\\\\\\end{aligned}

\n

Hence PP is uniform.
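
\n

A quick numerical illustration of this product formula (Python sketch): the product of the coordinate laws gives back the uniform probability 1/6^n of every configuration.

from itertools import product

n = 3
P_roll = {i: 1 / 6 for i in range(1, 7)}   # law of a single roll: uniform on {1,...,6}

# Joint law of n i.i.d. rolls: product of the individual laws.
for omega in list(product(range(1, 7), repeat=n))[:3]:
    p = 1.0
    for w in omega:
        p *= P_roll[w]
    print(omega, p, 1 / 6 ** n)   # the product equals 1/6**n for every configuration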

\n
", "parent": "subsec:Ind", "rank": "0", "html_name": "def:iid", "summary": "
\n

Definition 15 (i.i.d variables). NN random variables XiX_i defined on a configuration space \u03a9\\Omega are independent and identically distributed when they are independent and have the same law.
\n

\n
", "hasSummary": true, "hasTitle": true, "title": "i.i.d variables"}, "classes": "l0", "position": {"x": -7011.649719946043, "y": -244.86656227072086}}, {"group": "nodes", "data": {"id": "ex:Bin", "name": "exercise", "text": "
\n

Exercise 14 (Binomial).
\n
\nLet X1,...,XnX_1,...,X_n be independent Bernoulli variables of parameter pp. A Binomial variable is a sum of independent Bernoulli variables of the same parameter. The law of a Binomial variable is noted B(p,n)B(p,n), where pp is the parameter of the Bernoulli variables, and nn is the number of Bernoulli variables in the sum.

\n\n
", "parent": "subsec:Ind", "rank": "0", "html_name": "ex:Bin", "summary": "
\n

Exercise 14 (Binomial).

\n
", "hasSummary": false, "hasTitle": true, "title": "Binomial"}, "classes": "l0", "position": {"x": -7024.526699601474, "y": -661.9589284740644}}, {"group": "nodes", "data": {"id": "def:IndExp", "name": "theorem", "text": "
\n

Theorem 3 (Independence and expectation).
\n
\n

\n
\n
\n

Statement

\n
\n


\nWhen XX and YY are two independent variables, we have that \ud835\udd3c(XY)=\ud835\udd3c(X)\ud835\udd3c(Y).\\mathbb{E}(XY) = \\mathbb{E}(X)\\mathbb{E}(Y).

\n
\n
\n

Proof

\n
\n


\nWhen the laws have densities, the result is given by the following computation: \ud835\udd3c(XY)=\u222b\u222bxyf(X,Y)(x,y)dxdy=\u222b\u222bxyfX(x)fY(y)dxdy=\u222bxfX(x)dx\u222byfY(y)dy\begin{aligned}
\mathbb{E}(XY) &=& \int \int xy f_{(X,Y)}(x,y)dxdy \\
&=& \int \int xy f_X(x)f_Y(y)dxdy\\
&=& \int xf_X(x)dx \int yf_Y(y)dy\end{aligned}
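
\n

A Monte Carlo sanity check of the statement (Python sketch, numpy assumed), with two variables sampled independently:

import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Two independent random variables.
X = rng.normal(2.0, 1.0, n)
Y = rng.uniform(0.0, 1.0, n)

print(np.mean(X * Y))            # estimate of E(XY)
print(np.mean(X) * np.mean(Y))   # estimate of E(X)E(Y); both are close to 2 * 0.5 = 1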

\n
", "parent": "subsec:MomRV", "rank": "0", "html_name": "def:IndExp", "summary": "
\n

Theorem 3 (Independence and expectation). When XX and YY are two independent variables, we have that \ud835\udd3c(XY)=\ud835\udd3c(X)\ud835\udd3c(Y).\\mathbb{E}(XY) = \\mathbb{E}(X)\\mathbb{E}(Y).

\n
", "hasSummary": true, "hasTitle": true, "title": "Independence and expectation"}, "classes": "l0", "position": {"x": -1284.4874785919005, "y": 1204.6149245783558}}, {"group": "nodes", "data": {"id": "th:IndCov", "name": "theorem", "text": "
\n

Theorem 4 (Independence and covariance).
\n
\nLet XX and YY be independent variables. It can be checked that XcX_c and YcY_c are also independent. We have

\n

X and Y are independent \u21d2cov(X,Y)=\ud835\udd3c(XcYc)=\ud835\udd3c(Xc)\ud835\udd3c(Yc)=0.X \\text{ and } Y \\text{ are independent } \\Rightarrow cov(X,Y)=\\mathbb{E}(X_cY_c)=\\mathbb{E}(X_c)\\mathbb{E}(Y_c)=0.

\n

Hence, independence and covariance are related notions. If two variables XX and YY have nonzero covariance, knowing one gives information on the other. Note however that the converse is not true: zero covariance does not imply independence.
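
\n

A classical counterexample (not part of the original notes, added here as an illustration) can be checked numerically: with XX uniform on {-1,0,1} and Y=X2Y=X^2, the covariance is zero although YY is a deterministic function of XX.

import numpy as np

rng = np.random.default_rng(0)
X = rng.choice([-1, 0, 1], size=1_000_000)   # uniform on {-1, 0, 1}
Y = X ** 2                                   # fully determined by X, hence not independent of X

cov = np.mean(X * Y) - np.mean(X) * np.mean(Y)
print(cov)   # close to 0: zero covariance does not imply independence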

\n
", "parent": "subsec:MomRV", "rank": "0", "html_name": "th:IndCov", "summary": "
\n

Theorem 4 (Independence and covariance). X and Y are independent \u21d2cov(X,Y)=0.X \\text{ and } Y \\text{ are independent } \\Rightarrow cov(X,Y)=0.

\n
", "hasSummary": true, "hasTitle": true, "title": "Independence and covariance"}, "classes": "l0", "position": {"x": -1275.363933513836, "y": 708.3709925139169}}, {"group": "nodes", "data": {"id": "def:StdProp", "name": "theorem", "text": "
\n

Theorem 5 (Properties of variance).
\n
\n

\n
\n
\n

Properties

\n
\n


\n

\n\n
\n
\n

Proofs

\n
\n


\n1)1)
\nv(X+b)=\u2225X+b\u2212\ud835\udd3c(X+b)\u22252=\u2225X+b\u2212\ud835\udd3c(X)\u2212b\u22252=v(X)v(X+b)= \|X+b-\mathbb{E}(X+b)\|^2=\|X+b-\mathbb{E}(X)-b\|^2=v(X)
\nv(aX)=\u2225aX\u2212\ud835\udd3c(aX)\u22252=\u2225a(X\u2212\ud835\udd3c(X))\u22252=a2\u2225X\u2212\ud835\udd3c(X)\u22252=a2v(X)v(aX)= \|aX-\mathbb{E}(aX)\|^2=\|a(X-\mathbb{E}(X))\|^2=a^2\|X-\mathbb{E}(X)\|^2=a^2v(X)

\n

2)2)
\n\begin{aligned}\nv(X+Y) &=& \langle (X+Y)-\mathbb{E}(X+Y),(X+Y)-\mathbb{E}(X+Y) \rangle \\\n&=& \langle X_c+Y_c,X_c+Y_c \rangle \\\n&=& \|X_c\|^2+\|Y_c\|^2+2\langle X_c,Y_c\rangle\\\n&=& v(X)+v(Y),\end{aligned} since \langle X_c,Y_c\rangle=cov(X,Y)=0 when XX and YY are independent (Theorem 4). This is simply the Pythagorean theorem: the centered variables XcX_c and YcY_c are orthogonal. Alternatively we can write,

\n

\begin{aligned}\nv(X+Y) &=& \mathbb{E}\left( (X+Y - \mathbb{E}(X+Y))^2\right) \\\n&=& \mathbb{E}\left( ((X-\mathbb{E}(X)) + (Y- \mathbb{E}(Y)))^2\right) \\\n&=& \mathbb{E}\left( (X-\mathbb{E}(X))^2+(Y- \mathbb{E}(Y))^2 + 2(X-\mathbb{E}(X))(Y- \mathbb{E}(Y))\right) \\\n&=&v(X)+v(Y)+2\mathbb{E}(XY-X\mathbb{E}(Y)-\mathbb{E}(X)Y+\mathbb{E}(X)\mathbb{E}(Y))\\\n&=&v(X)+v(Y)+2(\mathbb{E}(XY)-\mathbb{E}(X)\mathbb{E}(Y))\\\n&=&v(X)+v(Y),\end{aligned} where the last equality uses \mathbb{E}(XY)=\mathbb{E}(X)\mathbb{E}(Y), which holds because XX and YY are independent.

\n
\n
\n

Important consequence

\n
\n


\nHence, when X1,...,XnX_1,...,X_n are independent variables with identical distributions, v(1n\u2211iXi)=1n2v(\u2211iXi)=1n2\u2211iv(Xi)=1nv(X1)\u2192n\u2192\u221e0v\\left(\\frac{1}{n}\\sum_i X_i\\right)=\\frac{1}{n^2}v\\left(\\sum_i X_i\\right) = \\frac{1}{n^2}\\sum_i v(X_i)=\\frac{1}{n}v(X_1)\\xrightarrow[n\\rightarrow \\infty]{} 0

\n

This is a very important result: averaging nn independent and identically distributed (i.i.d.) variables divides the variance by nn and the standard deviation by n\sqrt{n}.

\n
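\nTo make the 1/n decay concrete, the following small simulation (an illustration only, assuming Python with NumPy) estimates the variance of the empirical mean of nn i.i.d. uniform variables for a few values of nn.

```python
import numpy as np

rng = np.random.default_rng(3)
repeats = 20_000                  # number of independent empirical means per value of n

for n in [1, 10, 100, 1000]:
    xbar = rng.random((repeats, n)).mean(axis=1)   # 'repeats' realisations of the mean of n uniforms
    # The variance of Uniform(0, 1) is 1/12, so v(xbar) should be close to 1/(12 n).
    print(n, xbar.var(), 1 / (12 * n))
```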
", "parent": "subsec:MomRV", "rank": "0", "html_name": "def:StdProp", "summary": "
\n

Theorem 5 (Properties of variance). v(X)=\ud835\udd3c(X2)\u2212\ud835\udd3c(X)2v(X)=\\mathbb{E}(X^2)-\\mathbb{E}(X)^2 v(aX+b)=a2v(X)v(aX+b)=a^2v(X) when XX and YY are independent, v(X+Y)=v(X)+v(Y).v(X+Y)=v(X)+v(Y).

\n
", "hasSummary": true, "hasTitle": true, "title": "Properties of variance"}, "classes": "l0", "position": {"x": -1264.554042025985, "y": 179.87513751583037}}, {"group": "nodes", "data": {"id": "def:Expe", "name": "definition", "text": "
\n

Definition 16 (Expectation).
\n
\n

\n
\n
\n

Idea

\n
\n


\nLet \u03a9={1,2}\\Omega=\\{1,2\\} with P({1})=110P(\\{1\\})=\\frac{1}{10} and P({2})=910P(\\{2\\})=\\frac{9}{10}. Let XX be a real random variable, X:\u03a9\u2192\u211dX:\\Omega \\rightarrow \\mathbb{R}. For each elementary configuration \u03c9\\omega, XX takes the value X(\u03c9)X(\\omega). The \u2019expectation\u2019 of XX is the \u2019average\u2019 value of the X(\u03c9)X(\\omega) with respect to the probabilities of the {\u03c9}\\{\\omega\\}. In our example:

\n

Taking for instance X(1)=1 and X(2)=2, \mathbb{E}(X) = X(1).P(\{1\}) + X(2).P(\{2\}) = 1\times \frac{1}{10} + 2\times \frac{9}{10} = \frac{19}{10}.

\n
\n
\n

Definition

\n
\n


\nIf \u03a9=\u2124\\Omega=\\mathbb{Z}, the average of a random variable X:\u03a9\u2192\u211dX:\\Omega \\rightarrow \\mathbb{R} becomes:

\n

\ud835\udd3c(X)=\u2211i\u2208\u03a9X(i)\u00d7P({i}).\\mathbb{E}(X) = \\sum_{i\\in\\Omega} X(i)\\times P(\\{i\\}).

\n

Now consider the case \u03a9=\u211d\\Omega=\\mathbb{R}, with a probability PP of density ff. The formula for the expectation becomes (when the integral exists)

\n

\ud835\udd3c(X)=\u222b\u2212\u221e+\u221eX(\u03c9)f(\u03c9)d\u03c9.\\mathbb{E}(X) = \\int_{-\\infty}^{+\\infty} X(\\omega) f(\\omega)d\\omega.

\n

There is a definition of the expectation which does not depend on the nature of \u03a9\\Omega and PP. The expectation (average / mean) of XX is defined by

\n

\ud835\udd3c(X)=\u222b\u03a9XdP.\\mathbb{E}(X) = \\int_{\\Omega} XdP.

\n

Here the integral is a \u2019Lebesgue integral\u2019 (as opposed to a Riemann integral). When \u03a9\\Omega is discrete, the Lebesgue integral becomes a sum over \u03a9\\Omega, while when PP has a density, dPdP can be replaced by f(\u03c9)d\u03c9f(\\omega)d\\omega. Lebesgue integrals will not be used in this course, but it is useful to be familiar with this notation.
\n

\n
\n
\n

\ud835\udd3c(X)\\mathbb{E}(X) from the law PXP_X

\n
\n


\nIn the discrete case, the previous definition is based on the sum of the X(\u03c9)X(\\omega) over all the configurations \u03c9\u2208\u03a9\\omega \\in \\Omega, weighted by the probabilities P({\u03c9})P(\\{\\omega\\}). The same result can be obtained by summing all the possible values x\u2208Ex\\in E that X(\u03c9)X(\\omega) can take, weighted by their probabilities PX({x})P_X(\\{x\\}).
\nThis approach leads to the following important equalities:

\n\\mathbb{E}(X) = \\sum_{\\omega\\in\\Omega} X(\\omega)P(\\{\\omega\\}) = \\sum_{x\\in E} x\\,P_X(\\{x\\}), \\qquad \\mathbb{E}(X) = \\int_{-\\infty}^{+\\infty} X(\\omega) f(\\omega)d\\omega=\\int_{-\\infty}^{+\\infty} xf_X(x)dx \\text{ (when the laws have densities).}\n

Exercise: Prove the result when X:\u2124\u2192\u2124X:\\mathbb{Z}\\rightarrow \\mathbb{Z}.

\n
\n
\n

Linearity

\n
\n


\nThe set of random variables X:\u03a9\u2192\u211dX:\\Omega \\rightarrow \\mathbb{R} is a vector space ((X+Y)(\u03c9)=X(\u03c9)+Y(\u03c9)(X+Y)(\\omega)=X(\\omega)+Y(\\omega)). The set of random variables such that \ud835\udd3c(X)=\u222b\u03a9XdP\\mathbb{E}(X)=\\int_{\\Omega} XdP exists is again a vector space. Since integrals are linear, the expectation \ud835\udd3c:L1(\u03a9)\u2192\u211d\\mathbb{E}:L^1(\\Omega)\\rightarrow \\mathbb{R}

\n

X\u21a6\ud835\udd3c(X)=\u222b\u03a9XdPX \\mapsto \\mathbb{E}(X)=\\int_{\\Omega} XdP

\n

is a linear map valued in \u211d\\mathbb{R} (i.e. a linear form). In other words, we have the fundamental properties:

\n

\ud835\udd3c(\u03b1X)=\u03b1\ud835\udd3c(X) and \ud835\udd3c(X+Y)=\ud835\udd3c(X)+\ud835\udd3c(Y).\\mathbb{E}(\\alpha X) = \\alpha \\mathbb{E}(X) \\quad \\text{ and } \\quad \\mathbb{E}(X+Y) = \\mathbb{E}(X) + \\mathbb{E}(Y).

\n
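\nTo illustrate the two equivalent ways of computing an expectation (summing over configurations versus summing over values weighted by the law), here is a small discrete example; it is only an illustration, assuming Python, and not part of the course text.

```python
# Configuration space Omega = {1, 2} with P({1}) = 0.1, P({2}) = 0.9,
# and the random variable X(omega) = omega from the 'Idea' paragraph.
P = {1: 0.1, 2: 0.9}
X = lambda omega: omega

# Sum over configurations: E(X) = sum over omega of X(omega) * P({omega}).
e_config = sum(X(w) * P[w] for w in P)

# Sum over values: build the law P_X, then E(X) = sum over x of x * P_X({x}).
law_X = {}
for w, prob in P.items():
    law_X[X(w)] = law_X.get(X(w), 0.0) + prob
e_law = sum(x * px for x, px in law_X.items())

print(e_config, e_law)   # both equal 1.9
```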
", "parent": "subsec:MomRV", "rank": "0", "html_name": "def:Expe", "summary": "
\n

Definition 16 (Expectation). The general definition of the expectation of a random variable X:\u03a9\u2192\u211dX:\\Omega\\rightarrow \\mathbb{R} is \ud835\udd3c(X)=\u222b\u03a9X(\u03c9)dP(\u03c9).\\mathbb{E}(X) = \\int_{\\Omega} X(\\omega)dP(\\omega). in particular, when \u03a9={1,..,N}\\Omega = \\{1,..,N\\}, \ud835\udd3c(X)=\u2211i\u2208\u03a9X(i)\u00d7P({i})=\u2211i\u2208\u2124i.PX({i}),\\mathbb{E}(X) = \\sum_{i\\in\\Omega} X(i)\\times P(\\{i\\})=\\sum_{i\\in \\mathbb{Z}} i.P_X(\\{i\\}), and when \u03a9=\u211d\\Omega = \\mathbb{R}, \ud835\udd3c(X)=\u222b\u2212\u221e+\u221eX(\u03c9)f(\u03c9)d\u03c9=\u222b\u2212\u221e+\u221exfX(x)dx.\\mathbb{E}(X) = \\int_{-\\infty}^{+\\infty} X(\\omega) f(\\omega)d\\omega=\\int_{-\\infty}^{+\\infty} xf_X(x)dx.

\n
", "hasSummary": true, "hasTitle": true, "title": "Expectation"}, "classes": "l0", "position": {"x": -1256.2681048153534, "y": 2246.8829423217985}}, {"group": "nodes", "data": {"id": "def:ScalRV", "name": "definition", "text": "
\n

Definition 17 (Inner products on random variables).
\n
\nThe expectation makes it possible to define an important inner product on random variables. It provides a norm and a notion of angle between random variables. Let X:\u03a9\u2192\u211dX:\\Omega \\rightarrow \\mathbb{R} and Y:\u03a9\u2192\u211dY:\\Omega \\rightarrow \\mathbb{R} be two random variables. When it exists, we can define

\n

\u27e8X,Y\u27e9=\ud835\udd3c(XY),\\langle X,Y \\rangle = \\mathbb{E}(XY),

\n

where XYXY is understood as the function \u03c9\u21a6X(\u03c9)Y(\u03c9)\\omega \\mapsto X(\\omega)Y(\\omega).
\n

\n

Why is it an inner product?

\nThe map (X,Y) \\mapsto \\mathbb{E}(XY) is symmetric, and it is bilinear because the expectation is linear. Moreover \\langle X,X\\rangle=\\mathbb{E}(X^2)\\geq 0, and \\mathbb{E}(X^2)=0 only when XX is zero (almost surely).\n

Hence \u27e8X,Y\u27e9\\langle X,Y \\rangle is an inner product.

\n
", "parent": "subsec:MomRV", "rank": "0", "html_name": "def:ScalRV", "summary": "
\n

Definition 17 (Inner products on random variables). \u27e8X,Y\u27e9=\ud835\udd3c(XY)=\u222b\u03a9X(\u03c9)Y(\u03c9)dP(\u03c9),\\langle X,Y \\rangle = \\mathbb{E}(XY)= \\int_{\\Omega} X(\\omega)Y(\\omega)dP(\\omega),

\n
", "hasSummary": true, "hasTitle": true, "title": "Inner products on random variables"}, "classes": "l0", "position": {"x": 8.002538748805819, "y": 1434.73204758651}}, {"group": "nodes", "data": {"id": "def:Cov", "name": "definition", "text": "
\n

Definition 18 (Covariance).
\n
\nGiven a random variable XX, write Xc=X\u2212\ud835\udd3c(X)X_{c} = X-\\mathbb{E}(X) for the \u2019centered\u2019 variable. The covariance of two variables XX and YY is the inner product of their centered versions: the covariance of X:\u03a9\u2192\u211dX:\\Omega\\rightarrow \\mathbb{R} and Y:\u03a9\u2192\u211dY:\\Omega\\rightarrow \\mathbb{R} is defined by

\n

cov(X,Y)=\u27e8Xc,Yc\u27e9=\ud835\udd3c((X\u2212\ud835\udd3c(X))(Y\u2212\ud835\udd3c(Y))),cov(X,Y)=\\langle X_{c},Y_{c}\\rangle = \\mathbb{E}\\left( (X-\\mathbb{E}(X))(Y-\\mathbb{E}(Y)) \\right),

\n

when it exists.

\n
", "parent": "subsec:MomRV", "rank": "0", "html_name": "def:Cov", "summary": "
\n

Definition 18 (Covariance). cov(X,Y)=\u27e8Xc,Yc\u27e9=\ud835\udd3c((X\u2212\ud835\udd3c(X))(Y\u2212\ud835\udd3c(Y))).cov(X,Y)=\\langle X_{c},Y_{c}\\rangle = \\mathbb{E}\\left( (X-\\mathbb{E}(X))(Y-\\mathbb{E}(Y)) \\right).

\n
", "hasSummary": true, "hasTitle": true, "title": "Covariance"}, "classes": "l0", "position": {"x": -117.1408484185086, "y": 765.9280894870162}}, {"group": "nodes", "data": {"id": "def:Std", "name": "definition", "text": "
\n

Definition 19 (Variance /Standard deviation).
\n
\n

\n
\n
\n

Definition

\n
\n


\nThe variance and standard deviation measure how a random variable varies around its mean. The deviation from the mean is given by the Euclidean norm of the centered variable Xc=X\u2212\ud835\udd3c(X)X_{c} = X-\\mathbb{E}(X). When it exists, the variance is defined by,

\n

v(X)=\ud835\udd3c((X\u2212\ud835\udd3c(X))2)=cov(X,X)=\u2225Xc\u22252,v(X)= \\mathbb{E}\\left((X-\\mathbb{E}(X))^2\\right)=cov(X,X)=\\|X_c\\|^2,

\n

and the standard deviation by

\n

\u03c3(X)=v(X)=\u2225Xc\u2225.\\sigma(X)=\\sqrt{v(X)} =\\|X_c\\|.

\n
\n
\n

Alternative formula

\n
\n


\nWe have the important equality v(X)=\ud835\udd3c(X2)\u2212\ud835\udd3c(X)2.v(X)=\\mathbb{E}(X^2)-\\mathbb{E}(X)^2.

\n

Proof: \\begin{aligned}\nv(X)&=&\\mathbb{E}\\left((X-\\mathbb{E}(X))^2\\right)\\\\\n&=&\\mathbb{E}\\left(X^2-2\\mathbb{E}(X)X+\\mathbb{E}(X)^2\\right)\\\\\n&=&\\mathbb{E}(X^2)-2\\mathbb{E}(\\mathbb{E}(X)X)+\\mathbb{E}(\\mathbb{E}(X)^2)\\\\\n&=&\\mathbb{E}(X^2)-2\\mathbb{E}(X)\\mathbb{E}(X)+\\mathbb{E}(X)^2\\\\\n&=&\\mathbb{E}(X^2)-\\mathbb{E}(X)^2, \\\\\\end{aligned} using that \\mathbb{E}(X) is a constant, so that \\mathbb{E}(\\mathbb{E}(X)X)=\\mathbb{E}(X)\\mathbb{E}(X) and \\mathbb{E}(\\mathbb{E}(X)^2)=\\mathbb{E}(X)^2.

\n
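\nAs a quick numerical illustration of the alternative formula (a sketch only, assuming Python with NumPy, not part of the course text):

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.exponential(2.0, 1_000_000)       # any variable with a finite variance

v_direct = np.mean((x - x.mean())**2)     # E((X - E(X))^2)
v_alt = np.mean(x**2) - np.mean(x)**2     # E(X^2) - E(X)^2
print(v_direct, v_alt)                    # both close to 4.0 here
```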
", "parent": "subsec:MomRV", "rank": "0", "html_name": "def:Std", "summary": "
\n

Definition 19 (Variance /Standard deviation). v(X)=\ud835\udd3c((X\u2212\ud835\udd3c(X))2),v(X)= \\mathbb{E}\\left((X-\\mathbb{E}(X))^2\\right), \u03c3(X)=v(X)=\ud835\udd3c((X\u2212\ud835\udd3c(X))2).\\sigma(X)=\\sqrt{v(X)} = \\sqrt{ \\mathbb{E}\\left((X-\\mathbb{E}(X))^2\\right)}.

\n
", "hasSummary": true, "hasTitle": true, "title": "Variance /Standard deviation"}, "classes": "l0", "position": {"x": -76.99749540434607, "y": 216.19300869132803}}, {"group": "nodes", "data": {"id": "def:CovM", "name": "definition", "text": "
\n

Definition 20 (Covariance matrix).
\n
\n

\n
\n
\n

Definition

\n
\n


\nConsider nn random variables Xi:\u03a9\u2192\u211dX_i:\\Omega \\rightarrow \\mathbb{R}. The variables can be put in a column vector \ud835\udc17=(X1,..,Xn)T\\bm X=(X_1,..,X_n)^T. \ud835\udc17\\bm X is then a random variable \ud835\udc17:\u03a9\u2192\u211dn\\bm X:\\Omega \\rightarrow \\mathbb{R}^n. Such random variables are often called random vectors.
\nFor a random column vector \ud835\udc17:\u03a9\u2192\u211dn\\bm X:\\Omega \\rightarrow \\mathbb{R}^n, when it exists the covariance matrix is defined by cov(\ud835\udc17)=\ud835\udd3c(\ud835\udc17c\ud835\udc17cT)=\ud835\udd3c((\ud835\udc17\u2212\ud835\udd3c(\ud835\udc17))(\ud835\udc17\u2212\ud835\udd3c(\ud835\udc17))T).cov(\\bm X)=\\mathbb{E}(\\bm X_{c} \\bm X_{c}^T)= \\mathbb{E}\\left((\\bm X-\\mathbb{E}(\\bm X))(\\bm X-\\mathbb{E}(\\bm X))^T\\right).

\n

When \ud835\udc17\\bm X is a row vector, the definition becomes cov(\ud835\udc17)=\ud835\udd3c(\ud835\udc17cT\ud835\udc17c).cov(\\bm X)=\\mathbb{E}(\\bm X_{c}^T\\bm X_{c}).

\n
\n
\n

Entries of the matrix cov(\ud835\udc17)cov(\\bm X)

\n
\n


\n\ud835\udd3c((X1\u2212\ud835\udd3c(X1)X2\u2212\ud835\udd3c(X2)..Xn\u2212\ud835\udd3c(Xn))(X1\u2212\ud835\udd3c(X1)X2\u2212\ud835\udd3c(X2)..Xn\u2212\ud835\udd3c(Xn)))\\mathbb{E}\\left( \\begin{pmatrix} X_1-\\mathbb{E}(X_1)\\\\ X_2 -\\mathbb{E}(X_2)\\\\ . \\\\ . \\\\ X_n-\\mathbb{E}(X_n) \\end{pmatrix}\\begin{pmatrix} X_1-\\mathbb{E}(X_1)& X_2 -\\mathbb{E}(X_2)& . & . & X_n-\\mathbb{E}(X_n) \\end{pmatrix} \\right) == (\ud835\udd3c((X1\u2212\ud835\udd3c(X1))(X1\u2212\ud835\udd3c(X1)))..\ud835\udd3c((X1\u2212\ud835\udd3c(X1))(Xn\u2212\ud835\udd3c(Xn)))....\ud835\udd3c((Xn\u2212\ud835\udd3c(Xn))(X1\u2212\ud835\udd3c(X1)))..\ud835\udd3c((Xn\u2212\ud835\udd3c(Xn))(Xn\u2212\ud835\udd3c(Xn))))\\begin{pmatrix} \n\\mathbb{E}\\left( ( X_1-\\mathbb{E}(X_1) )( X_1-\\mathbb{E}(X_1) )\n\\right) &.&.& \\mathbb{E}\\left( ( X_1-\\mathbb{E}(X_1) )( X_n-\\mathbb{E}(X_n) ) \\right) \\\\\n.&&&.\\\\\n.&&&.\\\\\n\\mathbb{E}\\left( ( X_n-\\mathbb{E}(X_n) )( X_1-\\mathbb{E}(X_1) )\n\\right) &.&.& \\mathbb{E}\\left( ( X_n-\\mathbb{E}(X_n) )( X_n-\\mathbb{E}(X_n) ) \\right) \\\\\n\\end{pmatrix}

\n

Hence we can see that cov(X)ij=cov(Xi,Xj)=\u27e8Xi,c,Xj,c\u27e9cov(X)_{ij}=cov(X_{i},X_{j})=\\langle X_{i,c},X_{j,c} \\rangle.
\n

\n
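\nA small sketch (an illustration only, assuming Python with NumPy) that builds the covariance matrix from the definition \mathbb{E}(X_c X_c^T) on simulated samples and compares it with NumPy's estimator:

```python
import numpy as np

rng = np.random.default_rng(5)
n_samples = 200_000

# A random vector X = (X1, X2) with correlated coordinates.
x1 = rng.standard_normal(n_samples)
x2 = 0.5 * x1 + rng.standard_normal(n_samples)
X = np.stack([x1, x2])                        # shape (2, n_samples): one column per sample

Xc = X - X.mean(axis=1, keepdims=True)        # centered variables
cov = (Xc @ Xc.T) / n_samples                 # estimate of E(Xc Xc^T)

print(cov)
print(np.cov(X, bias=True))                   # NumPy's estimator gives the same matrix
```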
", "parent": "subsec:MomRV", "rank": "0", "html_name": "def:CovM", "summary": "
\n

Definition 20 (Covariance matrix). Consider nn random variables Xi:\u03a9\u2192\u211dX_i:\\Omega \\rightarrow \\mathbb{R}. The variables can be put in a column vector \ud835\udc17=(X1,..,Xn)\\bm X=(X_1,..,X_n)

\n
", "hasSummary": true, "hasTitle": true, "title": "Covariance matrix"}, "classes": "l0", "position": {"x": 1120.0630833685873, "y": 223.501005840664}}, {"group": "nodes", "data": {"id": "ex:Bern", "name": "exercise", "text": "
\n

Exercise 15 (Bernoulli).
\n
\nA Bernoulli random variable XX is a random variable valued in {0,1}\\{0,1\\}. The law PXP_X is determined by PX({1})P_X(\\{ 1 \\}), denoted pp.

\n\n
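\nThe exercise's questions are not reproduced above; as a standalone illustration (assuming Python with NumPy), the sketch below simulates Bernoulli(p) draws and checks the standard facts that their mean is pp and their variance is p(1-p).

```python
import numpy as np

rng = np.random.default_rng(6)
p = 0.3
x = (rng.random(1_000_000) < p).astype(float)   # Bernoulli(p) samples

print(x.mean(), p)               # empirical mean, close to p
print(x.var(), p * (1 - p))      # empirical variance, close to p(1-p) = 0.21
```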
", "parent": "subsec:MomRV", "rank": "0", "html_name": "ex:Bern", "summary": "
\n

Exercise 15 (Bernoulli).

\n
", "hasSummary": false, "hasTitle": true, "title": "Bernoulli"}, "classes": "l0", "position": {"x": -82.9913469130289, "y": 2489.8505672414767}}, {"group": "nodes", "data": {"id": "ex:ExpSum", "name": "exercise", "text": "
\n

Exercise 16 (Expected sum).
\n
\nConsider 33 rolls of a fair die. What are the configuration space and the probability describing the rolls? Call SS the random variable \u2019sum of the 33 rolls\u2019. Compute \ud835\udd3c(S)\\mathbb{E}(S).

\n
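\nAs a way to check an answer (an illustration only, assuming Python), the expectation can be computed by brute-force enumeration of the configuration space {1,...,6}^3 with the uniform probability.

```python
from itertools import product

# Configuration space: all triples of faces, each with probability 1/6**3.
omega = list(product(range(1, 7), repeat=3))
p = 1 / len(omega)

# E(S) as a sum over configurations of S(w) * P({w}).
expectation = sum(sum(w) * p for w in omega)
print(expectation)    # 10.5, which is 3 * 3.5 as linearity predicts
```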
", "parent": "subsec:MomRV", "rank": "0", "html_name": "ex:ExpSum", "summary": "
\n

Exercise 16 (Expected sum).

\n
", "hasSummary": false, "hasTitle": true, "title": "Expected sum"}, "classes": "l0", "position": {"x": 1094.3155288304629, "y": 2507.4109660435925}}, {"group": "nodes", "data": {"id": "ex:GaussMean", "name": "exercise", "text": "
\n

Exercise 17 (Mean of a Gaussian RV).
\n
\n

\n\n
", "parent": "subsec:MomRV", "rank": "0", "html_name": "ex:GaussMean", "summary": "
\n

Exercise 17 (Mean of a Gaussian RV).

\n
", "hasSummary": false, "hasTitle": true, "title": "Mean of a Gaussian RV"}, "classes": "l0", "position": {"x": -96.57582900774878, "y": 2114.7793235130284}}, {"group": "nodes", "data": {"id": "ex:GaussVar", "name": "exercise", "text": "
\n

Exercise 18 (Variance on a uni-dimensional Gaussian).
\n
\nLet \u03a9=\u211d\\Omega = \\mathbb{R} and let PP be a probability with Gaussian density

\n

f\u03c3(x)=12\u03c0\u03c32e\u2212x22\u03c32.f_{\\sigma}(x)=\\frac{1}{\\sqrt{2\\pi \\sigma^2}}e^{\\frac{-x^2}{2\\sigma^2}}.

\n

Knowing that

\n

\u222b\u2212\u221e+\u221e12\u03c0x2e\u2212x22dx=1,\\int_{-\\infty}^{+\\infty} \\frac{1}{\\sqrt{2\\pi}}x^2e^{-\\frac{x^2}{2}}dx=1,

\n

compute the variance of the probability distribution.
\n

\n
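\nFor checking a result numerically (a sketch only, assuming Python with NumPy; it does not replace the computation asked for), one can integrate x^2 f_\sigma(x) over a fine grid:

```python
import numpy as np

sigma = 2.0
x = np.linspace(-12 * sigma, 12 * sigma, 200_001)    # fine grid covering essentially all the mass
f = np.exp(-x**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

variance = np.trapz(x**2 * f, x)    # E(X^2); the mean is 0 by symmetry of the density
print(variance, sigma**2)           # both close to 4.0
```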
", "parent": "subsec:MomRV", "rank": "0", "html_name": "ex:GaussVar", "summary": "
\n

Exercise 18 (Variance on a uni-dimensional Gaussian).

\n
", "hasSummary": false, "hasTitle": true, "title": "Variance on a uni-dimensional Gaussian"}, "classes": "l0", "position": {"x": 2250.5009496629027, "y": 731.5549544819082}}, {"group": "nodes", "data": {"id": "ex:GaussCov", "name": "exercise", "text": "
\n

Exercise 19 (Covariance of a bi-dimensional Gaussian).
\n
\nLet X:\u03a9\u2192\u211d2X:\\Omega \\rightarrow \\mathbb{R}^2 be a random variable whose law has the following density

\n

f\u03c31,\u03c32(x)=12\u03c0\u03c31\u03c32e\u221212(x12\u03c312+x22\u03c322),f_{\\sigma_1,\\sigma_2}(x)=\\frac{1}{2\\pi \\sigma_1\\sigma_2 }e^{-\\frac{1}{2}\\left(\\frac{x_1^2}{\\sigma_1^2}+\\frac{x_2^2}{\\sigma_2^2}\\right)},

\n

where x=(x1x2)x=\\begin{pmatrix} x_1\\\\x_2 \\end{pmatrix}.

\n\n

Let R=(cos(\u03b8)\u2212sin(\u03b8)sin(\u03b8)cos(\u03b8))R=\\begin{pmatrix} cos(\\theta)&-sin(\\theta)\\\\sin(\\theta)&cos(\\theta) \\end{pmatrix} be a rotation matrix, and let Y=RXY=RX be another random vector.

\n\n

Express the general relation between the covariance of a Gaussian density and the term in the exponential.

\n
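\nAs a numerical companion to this exercise (an illustration only, assuming Python with NumPy), one can sample XX with independent coordinates of standard deviations \sigma_1 and \sigma_2, form Y=RXY=RX, and compare the empirical covariance of YY with R\,cov(X)\,R^T.

```python
import numpy as np

rng = np.random.default_rng(7)
sigma1, sigma2, theta = 1.0, 3.0, np.pi / 6
n = 500_000

# X has independent coordinates, so cov(X) is the diagonal matrix diag(sigma1^2, sigma2^2).
X = np.stack([sigma1 * rng.standard_normal(n), sigma2 * rng.standard_normal(n)])
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
Y = R @ X

print(np.cov(Y, bias=True))                           # empirical covariance of Y
print(R @ np.diag([sigma1**2, sigma2**2]) @ R.T)      # R cov(X) R^T, close to the line above
```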
", "parent": "subsec:MomRV", "rank": "0", "html_name": "ex:GaussCov", "summary": "
\n

Exercise 19 (Covariance of a bi-dimensional Gaussian).

\n
", "hasSummary": false, "hasTitle": true, "title": "Covariance of a bi-dimensional Gaussian"}, "classes": "l0", "position": {"x": 2243.324799893895, "y": 204.58760177480008}}, {"group": "nodes", "data": {"id": "def:WLLN", "name": "theorem", "text": "
\n

Theorem 6 ((Weak) Law of large numbers).
\n
\n

\n
\n
\n

Statement of the theorem

\n
\n


\nWe are now ready to state an important result of probability theory, relating empirical means and expectations.
\n

\n

Let X1,...,Xn,...X_1,...,X_n,... be an infinite sequence of i.i.d. real random variables of mean \u03bc\\mu. Let X\u203en\\bar X_n be the empirical mean \\bar X_n = \\frac{X_1+..+X_n}{n}. For all \u03f5>0\\epsilon>0, we have P(|\\bar X_n-\\mu |>\\epsilon)=P(\\{\\omega \\in \\Omega \\mid |\\bar X_n(\\omega) - \\mu |>\\epsilon\\})\\xrightarrow[n\\rightarrow \\infty]{} 0.

\n

In simple words: when nn is large the values of X\u203en\\bar X_n are almost always close to \u03bc\\mu.
\n

\n
\n
\n

Idea of the proof

\n
\n


\nWe will not prove this result, but it can be intuitively understood in a simple way when the variables have a variance. First, note that \\mathbb{E}(\\bar X_n)=\\mathbb{E}(X_1)=\\mu. Then, remember that v\\left(\\bar X_n\\right)=v\\left(\\frac{1}{n}\\sum_i X_i\\right)\\xrightarrow[n\\rightarrow \\infty]{} 0. Hence the law of the empirical mean X\u203en\\bar X_n is more and more concentrated around its expectation \u03bc\\mu, which means that the probability P(|X\u203en\u2212\u03bc|>\u03f5)P(|\\bar X_n-\\mu |>\\epsilon) should be smaller and smaller.

\n
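\nA small simulation illustrating the statement (a sketch only, assuming Python with NumPy): for i.i.d. die rolls of mean \mu=3.5, the proportion of runs whose empirical mean lands far from \mu shrinks as nn grows.

```python
import numpy as np

rng = np.random.default_rng(8)
mu, eps, runs = 3.5, 0.1, 2_000

for n in [10, 100, 10_000]:
    rolls = rng.integers(1, 7, size=(runs, n))      # 'runs' independent sequences of n die rolls
    xbar = rolls.mean(axis=1)                       # the corresponding empirical means
    print(n, np.mean(np.abs(xbar - mu) > eps))      # estimate of P(|mean - mu| > eps), shrinking with n
```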
", "parent": "subsec:LLN", "rank": "0", "html_name": "def:WLLN", "summary": "
\n

Theorem 6 ((Weak) Law of large numbers). Let X1,...,Xn,...X_1,...,X_n,... be i.i.d. X\u203en=X1+..+Xnn.\\bar X_n = \\frac{X_1+..+X_n}{n}. For all \u03f5>0\\epsilon>0, we have P(|X\u203en\u2212\u03bc|>\u03f5)\u2192n\u2192\u221e0.P(|\\bar X_n-\\mu |>\\epsilon)\\xrightarrow[n\\rightarrow \\infty]{} 0.

\n
", "hasSummary": true, "hasTitle": true, "title": "(Weak) Law of large numbers"}, "classes": "l0", "position": {"x": -3002.3955704605337, "y": -886.4056204438808}}, {"group": "nodes", "data": {"id": "def:EmpM", "name": "definition", "text": "
\n

Definition 21 (Empirical mean).
\n
\n

\n
\n
\n

Definition

\n
\n


\nIn general, the adjective \u2019empirical\u2019 is understood as \u2019coming from observations\u2019, as opposed to a computation made on the configuration space \u03a9\\Omega. Assume that performing a certain experiment led to the observation of nn numbers xix_i, modeled by random variables XiX_i. The mean of a particular set of observations is x1+...+xnn.\\frac{x_1+...+x_n}{n}.

\n

which is described by the random variable

\n

X\u203en=X1+...+Xnn.\\bar X_n=\\frac{X_1+...+X_n}{n}.

\n

X\u203en\\bar X_n is called the \u2019empirical mean\u2019.
\n

\n

By linearity, the expectation of X\u203en\\bar X_n is given by \ud835\udd3c(X\u203en)=\ud835\udd3c(X1)+...+\ud835\udd3c(Xn)n.\\mathbb{E}(\\bar X_n)=\\frac{\\mathbb{E}(X_1)+...+\\mathbb{E}(X_n)}{n}.

\n
\n
\n

Link with expectation

\n
\n


\nThe notion of empirical mean is related to, but different from, the notion of expectation, which is a mean over the configurations of \u03a9\\Omega. Remember that when \u03a9\\Omega has a uniform probability over a finite number of elements, \ud835\udd3c(X)=\u2211i=1|\u03a9|X(\u03c9i)|\u03a9|.\\mathbb{E}(X) = \\frac{\\sum_{i=1}^{|\\Omega|}X(\\omega_i)}{|\\Omega|}.

\n

Hence,

\n\n
", "parent": "subsec:LLN", "rank": "0", "html_name": "def:EmpM", "summary": "
\n

Definition 21 (Empirical mean). Let X1,...,XnX_1,...,X_n be RV. Their \u2019empirical mean\u2019 is the following RV, X\u203en=X1+...+Xnn\\bar X_n=\\frac{X_1+...+X_n}{n}

\n
", "hasSummary": true, "hasTitle": true, "title": "Empirical mean"}, "classes": "l0", "position": {"x": -3008.780282009238, "y": -210.97656654692923}}, {"group": "nodes", "data": {"id": "sec:Prob", "name": "section", "text": "", "parent": "", "rank": "0", "html_name": "sec:Prob", "hasSummary": false, "hasTitle": false}, "classes": "l0", "position": {"x": -3885.676967972946, "y": 2710.2816977060456}}, {"group": "nodes", "data": {"id": "titlesec:Prob", "name": "sectionTitle", "text": "

Probabilities

", "parent": "sec:Prob", "rank": "0", "html_name": "sec:Prob", "hasSummary": false, "hasTitle": false}, "classes": "l0", "position": {"x": -3942.777393921063, "y": 6409.4690158559715}}, {"group": "nodes", "data": {"id": "subsec:ConfProb", "name": "subsection", "text": "", "parent": "sec:Prob", "rank": "0", "html_name": "subsec:ConfProb", "hasSummary": false, "hasTitle": false}, "classes": "l0", "position": {"x": -7749.757904204901, "y": 5121.845864314333}}, {"group": "nodes", "data": {"id": "titlesubsec:ConfProb", "name": "subsectionTitle", "text": "

Configurations and probabilities

", "parent": "subsec:ConfProb", "rank": "0", "html_name": "subsec:ConfProb", "hasSummary": false, "hasTitle": false}, "classes": "l0", "position": {"x": -7814.212974665837, "y": 6300.700516884763}}, {"group": "nodes", "data": {"id": "subsec:RV", "name": "subsection", "text": "", "parent": "sec:Prob", "rank": "0", "html_name": "subsec:RV", "hasSummary": false, "hasTitle": false}, "classes": "l0", "position": {"x": -3957.1621992937507, "y": 4274.1448945349875}}, {"group": "nodes", "data": {"id": "titlesubsec:RV", "name": "subsectionTitle", "text": "

Random Variables

", "parent": "subsec:RV", "rank": "0", "html_name": "subsec:RV", "hasSummary": false, "hasTitle": false}, "classes": "l0", "position": {"x": -4063.0302596209003, "y": 4845.789366749729}}, {"group": "nodes", "data": {"id": "subsec:Marg", "name": "subsection", "text": "", "parent": "sec:Prob", "rank": "0", "html_name": "subsec:Marg", "hasSummary": false, "hasTitle": false}, "classes": "l0", "position": {"x": -8798.314516230856, "y": 2897.418165008762}}, {"group": "nodes", "data": {"id": "titlesubsec:Marg", "name": "subsectionTitle", "text": "

Marginals

", "parent": "subsec:Marg", "rank": "0", "html_name": "subsec:Marg", "hasSummary": false, "hasTitle": false}, "classes": "l0", "position": {"x": -8780.535452312293, "y": 3384.2740706855107}}, {"group": "nodes", "data": {"id": "subsec:Cond", "name": "subsection", "text": "", "parent": "sec:Prob", "rank": "0", "html_name": "subsec:Cond", "hasSummary": false, "hasTitle": false}, "classes": "l0", "position": {"x": -3696.8900592126656, "y": 1908.1243223472811}}, {"group": "nodes", "data": {"id": "titlesubsec:Cond", "name": "subsectionTitle", "text": "

Conditioning

", "parent": "subsec:Cond", "rank": "0", "html_name": "subsec:Cond", "hasSummary": false, "hasTitle": false}, "classes": "l0", "position": {"x": -3696.9254900614355, "y": 2476.8515140807704}}, {"group": "nodes", "data": {"id": "subsec:Ind", "name": "subsection", "text": "", "parent": "sec:Prob", "rank": "0", "html_name": "subsec:Ind", "hasSummary": false, "hasTitle": false}, "classes": "l0", "position": {"x": -6905.508163704335, "y": 599.5812383016362}}, {"group": "nodes", "data": {"id": "titlesubsec:Ind", "name": "subsectionTitle", "text": "

Independence

", "parent": "subsec:Ind", "rank": "0", "html_name": "subsec:Ind", "hasSummary": false, "hasTitle": false}, "classes": "l0", "position": {"x": -7131.811933856016, "y": 1877.6214050773367}}, {"group": "nodes", "data": {"id": "subsec:MomRV", "name": "subsection", "text": "", "parent": "sec:Prob", "rank": "0", "html_name": "subsec:MomRV", "hasSummary": false, "hasTitle": false}, "classes": "l0", "position": {"x": 483.0067355355011, "y": 1423.3745759249136}}, {"group": "nodes", "data": {"id": "titlesubsec:MomRV", "name": "subsectionTitle", "text": "

Moments of random variables

", "parent": "subsec:MomRV", "rank": "0", "html_name": "subsec:MomRV", "hasSummary": false, "hasTitle": false}, "classes": "l0", "position": {"x": 359.56828538700586, "y": 2809.374014333997}}, {"group": "nodes", "data": {"id": "subsec:LLN", "name": "subsection", "text": "", "parent": "sec:Prob", "rank": "0", "html_name": "subsec:LLN", "hasSummary": false, "hasTitle": false}, "classes": "l0", "position": {"x": -3242.911850592072, "y": -419.59974154820395}}, {"group": "nodes", "data": {"id": "titlesubsec:LLN", "name": "subsectionTitle", "text": "

Laws of large numbers

", "parent": "subsec:LLN", "rank": "0", "html_name": "subsec:LLN", "hasSummary": false, "hasTitle": false}, "classes": "l0", "position": {"x": -3485.42813072361, "y": 211.70613734747283}}, {"data": {"id": "rem:AbsConfsubsec:ConfProb", "source": "subsec:ConfProb", "target": "rem:AbsConf", "type": "strong", "visibility": 1}}, {"data": {"id": "rem:AbsConfsubsec:RV", "source": "subsec:RV", "target": "rem:AbsConf", "type": "strong", "visibility": 1}}, {"data": {"id": "def:ConfSprem:IntProb", "source": "rem:IntProb", "target": "def:ConfSp", "type": "strong", "visibility": 1}}, {"data": {"id": "def:Evdef:ConfSp", "source": "def:ConfSp", "target": "def:Ev", "type": "strong", "visibility": 1}}, {"data": {"id": "def:Probdef:Ev", "source": "def:Ev", "target": "def:Prob", "type": "strong", "visibility": 1}}, {"data": {"id": "def:Probrem:ProbAss", "source": "rem:ProbAss", "target": "def:Prob", "type": "strong", "visibility": 1}}, {"data": {"id": "def:ProbDendef:Prob", "source": "def:Prob", "target": "def:ProbDen", "type": "strong", "visibility": 1}}, {"data": {"id": "rem:Evdef:Ev", "source": "def:Ev", "target": "rem:Ev", "type": "strong", "visibility": 1}}, {"data": {"id": "rem:ProbAssdef:Ev", "source": "def:Ev", "target": "rem:ProbAss", "type": "strong", "visibility": 1}}, {"data": {"id": "def:RVLawdef:RV", "source": "def:RV", "target": "def:RVLaw", "type": "strong", "visibility": 1}}, {"data": {"id": "def:JoinRVdef:RV", "source": "def:RV", "target": "def:JoinRV", "type": "strong", "visibility": 1}}, {"data": {"id": "def:JoinRVdef:RVLaw", "source": "def:RVLaw", "target": "def:JoinRV", "type": "weak"}}, {"data": {"id": "def:MargProbdef:Marg", "source": "def:Marg", "target": "def:MargProb", "type": "strong", "visibility": 1}}, {"data": {"id": "def:MargRVdef:Marg", "source": "def:Marg", "target": "def:MargRV", "type": "strong", "visibility": 1}}, {"data": {"id": "def:MargRVdef:MargProb", "source": "def:MargProb", "target": "def:MargRV", "type": "weak"}}, {"data": {"id": "def:MargRVdef:RVLaw", "source": "def:RVLaw", "target": "def:MargRV", "type": "weak"}}, {"data": {"id": "ex:MargGaussex:Gauss", "source": "ex:Gauss", "target": "ex:MargGauss", "type": "weak"}}, {"data": {"id": "th:Bayesdef:CondProb", "source": "def:CondProb", "target": "th:Bayes", "type": "strong", "visibility": 1}}, {"data": {"id": "def:CondCartrem:CondCart", "source": "rem:CondCart", "target": "def:CondCart", "type": "strong", "visibility": 1}}, {"data": {"id": "def:CondCartdef:CondProb", "source": "def:CondProb", "target": "def:CondCart", "type": "strong", "visibility": 1}}, {"data": {"id": "def:CondCartdef:MargProb", "source": "def:MargProb", "target": "def:CondCart", "type": "strong", "visibility": 1}}, {"data": {"id": "def:CondCartdef:JoinRV", "source": "def:JoinRV", "target": "def:CondCart", "type": "weak"}}, {"data": {"id": "def:CondCartdef:MargRV", "source": "def:MargRV", "target": "def:CondCart", "type": "weak"}}, {"data": {"id": "rem:CondCartdef:CondProb", "source": "def:CondProb", "target": "rem:CondCart", "type": "strong", "visibility": 1}}, {"data": {"id": "ex:DWRPMex:RemCom", "source": "ex:RemCom", "target": "ex:DWRPM", "type": "weak"}}, {"data": {"id": "th:JLIVdef:IndRV", "source": "def:IndRV", "target": "th:JLIV", "type": "strong", "visibility": 1}}, {"data": {"id": "th:JLIVdef:MargRV", "source": "def:MargRV", "target": "th:JLIV", "type": "strong", "visibility": 1}}, {"data": {"id": "th:JLIVdef:CondCart", "source": "def:CondCart", "target": "th:JLIV", "type": "strong", "visibility": 1}}, {"data": {"id": 
"def:IndRVsubsec:RV", "source": "subsec:RV", "target": "def:IndRV", "type": "strong", "visibility": 1}}, {"data": {"id": "def:IndRVdef:IndEv", "source": "def:IndEv", "target": "def:IndRV", "type": "strong", "visibility": 1}}, {"data": {"id": "def:iidth:JLIV", "source": "th:JLIV", "target": "def:iid", "type": "strong", "visibility": 1}}, {"data": {"id": "ex:Binex:Bern", "source": "ex:Bern", "target": "ex:Bin", "type": "weak"}}, {"data": {"id": "ex:Binex:SumEq01", "source": "ex:SumEq01", "target": "ex:Bin", "type": "weak"}}, {"data": {"id": "def:IndExpdef:Expe", "source": "def:Expe", "target": "def:IndExp", "type": "strong", "visibility": 1}}, {"data": {"id": "def:IndExpdef:IndRV", "source": "def:IndRV", "target": "def:IndExp", "type": "strong", "visibility": 1}}, {"data": {"id": "th:IndCovdef:IndExp", "source": "def:IndExp", "target": "th:IndCov", "type": "strong", "visibility": 1}}, {"data": {"id": "th:IndCovdef:Cov", "source": "def:Cov", "target": "th:IndCov", "type": "strong", "visibility": 1}}, {"data": {"id": "def:StdPropdef:Std", "source": "def:Std", "target": "def:StdProp", "type": "strong", "visibility": 1}}, {"data": {"id": "def:StdPropth:IndCov", "source": "th:IndCov", "target": "def:StdProp", "type": "strong", "visibility": 1}}, {"data": {"id": "def:Expedef:RV", "source": "def:RV", "target": "def:Expe", "type": "strong", "visibility": 1}}, {"data": {"id": "def:Expedef:RVLaw", "source": "def:RVLaw", "target": "def:Expe", "type": "strong", "visibility": 1}}, {"data": {"id": "def:ScalRVdef:Expe", "source": "def:Expe", "target": "def:ScalRV", "type": "strong", "visibility": 1}}, {"data": {"id": "def:ScalRVdef:IndExp", "source": "def:IndExp", "target": "def:ScalRV", "type": "weak"}}, {"data": {"id": "def:Covdef:ScalRV", "source": "def:ScalRV", "target": "def:Cov", "type": "strong", "visibility": 1}}, {"data": {"id": "def:Stddef:Cov", "source": "def:Cov", "target": "def:Std", "type": "strong", "visibility": 1}}, {"data": {"id": "def:CovMdef:Cov", "source": "def:Cov", "target": "def:CovM", "type": "strong", "visibility": 1}}, {"data": {"id": "def:WLLNdef:iid", "source": "def:iid", "target": "def:WLLN", "type": "strong", "visibility": 1}}, {"data": {"id": "def:WLLNdef:EmpM", "source": "def:EmpM", "target": "def:WLLN", "type": "strong", "visibility": 1}}, {"data": {"id": "def:WLLNdef:StdProp", "source": "def:StdProp", "target": "def:WLLN", "type": "strong", "visibility": 1}}, {"data": {"id": "def:EmpMdef:Expe", "source": "def:Expe", "target": "def:EmpM", "type": "strong", "visibility": 1}}, {"data": {"id": "subsec:RVsubsec:ConfProb", "source": "subsec:ConfProb", "target": "subsec:RV", "type": "strong", "visibility": 1}}, {"data": {"id": "subsec:Margsubsec:ConfProb", "source": "subsec:ConfProb", "target": "subsec:Marg", "type": "strong", "visibility": 1}}, {"data": {"id": "subsec:Margdef:JoinRV", "source": "def:JoinRV", "target": "subsec:Marg", "type": "strong", "visibility": 1}}, {"data": {"id": "subsec:Margdef:RVLaw", "source": "def:RVLaw", "target": "subsec:Marg", "type": "weak"}}, {"data": {"id": "subsec:Condsubsec:ConfProb", "source": "subsec:ConfProb", "target": "subsec:Cond", "type": "strong", "visibility": 1}}, {"data": {"id": "subsec:Inddef:Prob", "source": "def:Prob", "target": "subsec:Ind", "type": "strong", "visibility": 1}}, {"data": {"id": "subsec:Inddef:CondProb", "source": "def:CondProb", "target": "subsec:Ind", "type": "weak"}}];