Mathematics for Programmers: Probability Theory

Ivan Kamyshan

Some programmers, after years of building ordinary commercial applications, start thinking about learning machine learning and becoming data analysts. They often don't understand why particular methods work, and most machine learning techniques look like magic. In fact, machine learning rests on mathematical statistics, which in turn rests on probability theory. That is why this article focuses on the basic concepts of probability theory: we will cover the definitions of probability and of a distribution and work through several simple examples.

You may know that probability theory is conventionally divided into two parts. Discrete probability theory studies phenomena that can be described by a distribution with a finite (or countable) number of possible outcomes (rolling dice, tossing coins). Continuous probability theory studies phenomena whose outcomes are spread over a continuum, for example an interval or a disk.

The subject of probability theory is easy to illustrate with a simple example. Imagine you are developing a shooter. Shooting mechanics are an integral part of any game in this genre, and a shooter in which every weapon is perfectly accurate will hold little interest for players. So you need to add spread to your weapons. But simply randomizing the impact points does not allow fine tuning, which makes balancing the game difficult. Random variables and their distributions, on the other hand, let you analyze how a weapon will behave with a given spread and make the necessary adjustments.

Space of elementary outcomes

Suppose that from some random experiment that can be repeated many times (for example, tossing a coin) we can extract formalized information (the coin came up heads or tails). This information is called an elementary outcome, and it is useful to consider the set of all elementary outcomes, usually denoted by the letter Ω (omega).
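A discrete space of elementary outcomes can be modelled directly as a set. The minimal Python sketch below repeats the coin-toss experiment many times and counts how often each elementary outcome occurs. The names OMEGA_COIN and toss, the fixed seed, and the number of repetitions are illustrative choices, not taken from the article.

```python
import random
from collections import Counter

# The space of elementary outcomes for one coin toss.
OMEGA_COIN = {"heads", "tails"}

def toss(rng: random.Random) -> str:
    """Perform one random experiment and return its elementary outcome."""
    return rng.choice(sorted(OMEGA_COIN))  # choice() needs a sequence, not a set

rng = random.Random(42)  # fixed seed so the run is reproducible
counts = Counter(toss(rng) for _ in range(10_000))
print(counts)  # each outcome should appear roughly 5000 times
```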

The structure of this space depends entirely on the nature of the experiment. For example, if we consider shooting at a sufficiently large circular target, the space of elementary outcomes will be a disk (a filled circle), which for convenience we center at the origin, and an outcome will be a point of this disk.
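For the target, one elementary outcome is a point of the disk. The Python sketch below draws such a point by rejection sampling: it picks a random point in the bounding square and keeps it only if it lies inside the disk. It assumes every point of the disk is equally likely, which is the assumption the article introduces a little later. The function name sample_hit and the radius used are hypothetical.

```python
import random

def sample_hit(R: float, rng: random.Random) -> tuple[float, float]:
    """Draw one elementary outcome: a point uniformly distributed in the disk
    of radius R centered at the origin (rejection sampling from the square)."""
    while True:
        x = rng.uniform(-R, R)
        y = rng.uniform(-R, R)
        if x * x + y * y <= R * R:  # keep only points that lie inside the disk
            return x, y

rng = random.Random(0)
points = [sample_hit(1.0, rng) for _ in range(5)]
print(points)
```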

In addition, we consider sets of elementary outcomes, called events (for example, hitting the ten is a small disk concentric with the target). In the discrete case everything is quite simple: we can form any event just by choosing which elementary outcomes to include. In the continuous case things are much more complicated: we need a sufficiently well-behaved family of sets to work with, called an algebra (more precisely, a σ-algebra), by analogy with ordinary real numbers, which can be added, subtracted, multiplied, and divided. Sets in an algebra can be intersected, united, and complemented, and the result of such an operation stays in the algebra. This closure property is essential for the mathematics behind all these concepts. The smallest such family consists of only two sets: the empty set and the whole space of elementary outcomes.
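In the discrete case the family of events can simply be all subsets of Ω. The Python sketch below builds that family and checks the closure property described above: unions, intersections, and complements of events are again events. It uses a six-sided die as Ω, which is an illustrative choice rather than an example from the article. It also checks the minimal family {∅, Ω} and a family that fails the test.

```python
from itertools import chain, combinations

OMEGA = frozenset({1, 2, 3, 4, 5, 6})  # outcomes of one die roll

def power_set(s: frozenset) -> set[frozenset]:
    """All subsets of s; in the discrete case every subset can serve as an event."""
    items = list(s)
    subsets = chain.from_iterable(combinations(items, k) for k in range(len(items) + 1))
    return {frozenset(c) for c in subsets}

def is_closed(family: set[frozenset], omega: frozenset) -> bool:
    """Check that the family is closed under complement, union and intersection."""
    for a in family:
        if omega - a not in family:
            return False
        for b in family:
            if a | b not in family or a & b not in family:
                return False
    return True

events = power_set(OMEGA)
minimal = {frozenset(), OMEGA}
broken = {frozenset(), frozenset({1}), OMEGA}  # the complement of {1} is missing
print(len(events), is_closed(events, OMEGA), is_closed(minimal, OMEGA), is_closed(broken, OMEGA))
```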

Measure and probability

Probability is a way to reason about the behavior of very complex objects without understanding how they work internally. Formally, probability is a function defined on events (members of that well-behaved family of sets) that returns a number characterizing how often the event occurs in practice. For definiteness, mathematicians agreed that this number lies between zero and one. The function must also satisfy several requirements: the probability of the impossible event (the empty set) is zero, the probability of the entire set of outcomes is one, and the probability of the union of two mutually exclusive events (disjoint sets) equals the sum of their probabilities. Another name for probability is probability measure. In geometric problems, probability is usually built from the Lebesgue measure, which generalizes length, area, and volume to any number of dimensions (n-dimensional volume) and therefore applies to a very wide class of sets.
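The Python sketch below makes these requirements concrete in a discrete setting. It assumes a fair six-sided die with equally likely outcomes, which is not an example from the article, and the function name P is illustrative. It checks the requirements on a few events and shows why additivity requires the events to be disjoint.

```python
from fractions import Fraction

OMEGA = frozenset({1, 2, 3, 4, 5, 6})  # elementary outcomes of one roll of a fair die

def P(event: frozenset) -> Fraction:
    """Classical probability: the share of equally likely outcomes in the event."""
    return Fraction(len(event & OMEGA), len(OMEGA))

even = frozenset({2, 4, 6})
one = frozenset({1})
two = frozenset({2})

assert P(frozenset()) == 0                  # the impossible event has probability zero
assert P(OMEGA) == 1                        # the whole space has probability one
assert P(even | one) == P(even) + P(one)    # disjoint events: probabilities add up
assert P(even | two) != P(even) + P(two)    # overlapping events: plain addition fails
print(P(even), P(even | one))
```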

Together, the set of elementary outcomes, the family of events, and the probability measure form a probability space. Let's see how to construct a probability space for the example of shooting at a target.

Consider shooting at a large round target of radius R that is impossible to miss. As the set of elementary outcomes we take the disk of radius R centered at the origin. Since we are going to use area (the Lebesgue measure of two-dimensional sets) to describe the probability of an event, we take as events the family of measurable sets (those for which this measure is defined).

Note. This is actually a technical point, and in simple problems the choice of the measure and the family of sets plays no special role. But it is important to know that these two objects exist, because in many books on probability theory theorems begin with the words: “Let (Ω, Σ, P) be a probability space...”.

As mentioned above, the probability of the entire space of elementary outcomes must equal one. By the well-known school formula, the area of a disk of radius R is $\pi R^2$. Here area is the two-dimensional Lebesgue measure, which we denote $\lambda_2(A)$ for an event A. So we can define the probability as $P(A) = \lambda_2(A) / (\pi R^2)$, and this value lies between 0 and 1 for any event A.

If we assume that every point of the target is equally likely to be hit, then finding the probability that the shooter hits some region of the target reduces to finding the area of that region. It follows that the probability of hitting any particular point is zero, because a point has zero area.
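Under the uniform-hit assumption, every probability is an area divided by $\pi R^2$. The Python sketch below applies this to a few regions whose areas are known in closed form: half of the target, a 90-degree sector, and a single point. The function name prob_from_area, the radius, and the chosen regions are illustrative and do not come from the article.

```python
import math

def prob_from_area(area: float, R: float) -> float:
    """Uniform-hit model: P(A) = area(A) / (pi * R^2) for an event A inside the target."""
    return area / (math.pi * R ** 2)

R = 2.0
half_target = math.pi * R ** 2 / 2       # upper half of the target
sector_90 = (math.pi / 2) / 2 * R ** 2   # sector of angle theta has area theta * R^2 / 2
single_point = 0.0                       # a point has zero area

print(prob_from_area(half_target, R))
print(prob_from_area(sector_90, R))
print(prob_from_area(single_point, R))
```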

For example, suppose we want the probability that the shooter hits the “ten” (event A: the shooter hits the desired set). In our model, the “ten” is a disk of radius r centered at the origin. Then the probability of hitting this disk is $P(A) = \lambda_2(A) / (\pi R^2) = \pi r^2 / (\pi R^2) = (r/R)^2$.
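The formula $(r/R)^2$ can be checked by simulation. The Python sketch below samples uniform hits on the target by rejection sampling and counts the share that lands in the “ten”. It is a Monte Carlo check under the article's uniform-hit assumption. The function name, the seed, and the values r = 1 and R = 5 are hypothetical.

```python
import random

def estimate_p_ten(r: float, R: float, n: int, seed: int = 1) -> float:
    """Monte Carlo estimate of P(hit the ten) for uniform hits on a disk of radius R."""
    rng = random.Random(seed)
    inside_ten = 0
    accepted = 0
    while accepted < n:
        x, y = rng.uniform(-R, R), rng.uniform(-R, R)
        if x * x + y * y > R * R:
            continue                  # the point is outside the target disk: resample
        accepted += 1
        if x * x + y * y < r * r:     # the "ten" is the disk of radius r
            inside_ten += 1
    return inside_ten / n

r, R = 1.0, 5.0
print(estimate_p_ten(r, R, 100_000), (r / R) ** 2)  # the estimate should be close to 0.04
```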

This is one of the simplest “geometric probability” problems; most problems of this kind come down to finding an area.

Random variables

A random variable is a function that maps elementary outcomes to real numbers. For example, in the problem above we can introduce the random variable ρ(ω): the distance from the point of impact to the center of the target. Because our model is so simple, we can describe the space of elementary outcomes explicitly: $\Omega = \{\omega = (x, y) : x^2 + y^2 \le R^2\}$. The random variable is then $\rho(\omega) = \rho(x, y) = \sqrt{x^2 + y^2}$.
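In code, a random variable is simply a function of the elementary outcome. The Python sketch below defines the space Ω of the target example as a membership test and ρ as the Euclidean distance to the center. The radius R = 10 and the function names are illustrative assumptions.

```python
import math

R = 10.0  # hypothetical target radius

def in_omega(x: float, y: float) -> bool:
    """Membership test for Omega = {(x, y) : x^2 + y^2 <= R^2}."""
    return x * x + y * y <= R * R

def rho(omega: tuple[float, float]) -> float:
    """Random variable rho: distance from the point of impact to the target center."""
    x, y = omega
    return math.hypot(x, y)  # sqrt(x^2 + y^2)

print(rho((3.0, 4.0)), in_omega(6.0, 8.0), in_omega(8.0, 8.0))
```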

Abstracting away from the probability space. Distribution function and density

It is convenient when the structure of the space is well known, but in practice this is not always the case, and even a known space may be complicated. To describe a random variable whose explicit expression is unknown, we use the distribution function, denoted $F_\xi(x) = P(\xi < x)$ (the subscript ξ indicates the random variable). It is the probability of the set of all elementary outcomes at which the value of the random variable ξ is less than the given parameter x.
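When only observed values of ξ are available, the distribution function can be estimated from data. The Python sketch below builds the empirical distribution function, an estimator the article does not discuss. It counts the share of observations strictly less than x, which matches the convention $F_\xi(x) = P(\xi < x)$ used here. The name empirical_cdf and the sample values are hypothetical.

```python
import bisect

def empirical_cdf(samples):
    """Return F_hat, where F_hat(x) is the share of observed values strictly less than x."""
    data = sorted(samples)
    n = len(data)

    def F_hat(x: float) -> float:
        # bisect_left returns how many sorted values are strictly less than x
        return bisect.bisect_left(data, x) / n

    return F_hat

F_hat = empirical_cdf([1, 2, 2, 3, 5])  # hypothetical observations of xi
print(F_hat(0), F_hat(2), F_hat(2.5), F_hat(6))
```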

The distribution function has several properties:

  1. First, its values lie between 0 and 1.
  2. Second, it does not decrease as its argument x increases.
  3. Third, it approaches 0 as x → -∞ and approaches 1 as x → +∞.
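The three properties can be checked on a distribution function that is known exactly. The Python sketch below uses one roll of a fair six-sided die, an illustrative example that is not in the article, with the convention $F(x) = P(X < x)$. It evaluates F on a grid of points and asserts each property. Large values of |x| stand in for the limits at ±∞.

```python
from fractions import Fraction

def F_die(x: float) -> Fraction:
    """F(x) = P(X < x) for one roll X of a fair six-sided die."""
    favourable = sum(1 for k in range(1, 7) if k < x)
    return Fraction(favourable, 6)

grid = [-100, 0, 1, 1.5, 2, 3.7, 6, 6.01, 100]
values = [F_die(x) for x in grid]
assert all(0 <= v <= 1 for v in values)                 # property 1: between 0 and 1
assert all(a <= b for a, b in zip(values, values[1:]))  # property 2: non-decreasing
assert F_die(-100) == 0 and F_die(100) == 1             # property 3: limits at -inf and +inf
print([str(v) for v in values])
```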

The meaning of this construction is probably not very clear on first reading. One useful property is that the distribution function lets you find the probability that the random variable takes a value in an interval: $P(a \le \xi < b) = F_\xi(b) - F_\xi(a)$. Using this equality, we can study how this probability behaves when the endpoints a and b of the interval are close to each other.
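The interval formula can be checked against a direct count. The Python sketch below again assumes a fair die (an illustrative choice). It computes $P(2 \le X < 5)$ once as $F(5) - F(2)$ and once by counting the favourable outcomes 2, 3, and 4. The helper names are hypothetical.

```python
from fractions import Fraction

def F_die(x: float) -> Fraction:
    """F(x) = P(X < x) for one roll X of a fair six-sided die."""
    return Fraction(sum(1 for k in range(1, 7) if k < x), 6)

def prob_interval(F, a: float, b: float) -> Fraction:
    """P(a <= xi < b) = F(b) - F(a) when F(x) = P(xi < x)."""
    return F(b) - F(a)

direct = Fraction(sum(1 for k in range(1, 7) if 2 <= k < 5), 6)  # outcomes 2, 3, 4
print(prob_interval(F_die, 2, 5), direct)
```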

Let d = b - a, so b = a + d. Then $F_\xi(b) - F_\xi(a) = F_\xi(a+d) - F_\xi(a)$. For small d this difference is also small (if the distribution function is continuous). It therefore makes sense to consider the ratio $p_\xi(a, d) = (F_\xi(a+d) - F_\xi(a))/d$. Suppose that, for sufficiently small d, this ratio differs little from some constant $p_\xi(a)$ that does not depend on d. Then the random variable has a density at this point, equal to $p_\xi(a)$.
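The ratio $p_\xi(a, d)$ can be computed numerically for shrinking d. The Python sketch below assumes the distribution function $F(t) = t^2/R^2$ on [0, R], which the article derives later for the distance to the target center, with an illustrative R = 2. At a = 1 the ratio equals 0.5 + d/4 exactly, so it settles near $2a/R^2 = 0.5$ as d shrinks.

```python
R = 2.0

def F(t: float) -> float:
    """Distribution function t^2 / R^2 on [0, R], equal to 0 below and 1 above."""
    if t <= 0:
        return 0.0
    if t >= R:
        return 1.0
    return t * t / (R * R)

def ratio(a: float, d: float) -> float:
    """p(a, d) = (F(a + d) - F(a)) / d from the text."""
    return (F(a + d) - F(a)) / d

a = 1.0
for d in (0.5, 0.1, 0.01, 0.001):
    print(d, ratio(a, d))  # approaches 2a / R^2 = 0.5 as d shrinks
```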

Note. Readers who have encountered derivatives before may notice that $p_\xi(a)$ is the derivative of the function $F_\xi(x)$ at the point a. If the concept is new to you, you can read about derivatives in an article on the Mathprofi website.

Now the meaning of the distribution function can be stated as follows. Its derivative at a point a (the density $p_\xi$ defined above) describes how often the random variable falls into a small interval centered at a (a neighborhood of a), compared with neighborhoods of other points. In other words, the faster the distribution function grows near a point, the more often values near that point appear in a random experiment.
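This interpretation can be observed in a simulation. The Python sketch below samples many distances ρ for uniform hits on a target of hypothetical radius R = 2. For two points a, it divides the share of samples that fall into a small neighborhood of a by the neighborhood's width. The results should be close to the density $2a/R^2$ derived later in the article, so values near 1.5 appear about three times as often as values near 0.5. The seed, sample size, and neighborhood width are arbitrary choices.

```python
import math
import random

def sample_rho(R: float, rng: random.Random) -> float:
    """Distance to the center for a uniform hit on the disk of radius R (rejection sampling)."""
    while True:
        x, y = rng.uniform(-R, R), rng.uniform(-R, R)
        if x * x + y * y <= R * R:
            return math.hypot(x, y)

def local_frequency(samples, a: float, h: float) -> float:
    """Share of samples in the neighborhood (a - h/2, a + h/2), divided by its width h."""
    inside = sum(1 for s in samples if abs(s - a) < h / 2)
    return inside / (len(samples) * h)

R = 2.0
rng = random.Random(7)
samples = [sample_rho(R, rng) for _ in range(200_000)]
for a in (0.5, 1.5):
    print(a, local_frequency(samples, a, 0.05), 2 * a / R ** 2)
```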

Let's return to the example. We can compute the distribution function of the random variable $\rho(\omega) = \rho(x, y) = \sqrt{x^2 + y^2}$, the distance from the center to the random point of impact on the target. By definition, $F_\rho(t) = P(\rho(x, y) < t)$. The set $\{\rho(x, y) < t\}$ consists of the points (x, y) whose distance from the origin is less than t. We already computed the probability of such an event when we found the probability of hitting the “ten”: it equals $t^2/R^2$. Thus $F_\rho(t) = P(\rho(x, y) < t) = t^2/R^2$ for $0 \le t \le R$. For $t \le 0$ the distribution function is 0, and for $t > R$ it is 1.
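The piecewise formula translates directly into a small function. The Python sketch below assumes uniform hits on a target of radius R, as in the article. The name F_rho is illustrative.

```python
def F_rho(t: float, R: float) -> float:
    """Distribution function of the distance to the center for uniform hits on a target of radius R."""
    if t <= 0:
        return 0.0          # the distance is never negative
    if t > R:
        return 1.0          # every hit lies within distance R of the center
    return (t / R) ** 2     # probability of landing in the disk of radius t

print(F_rho(-1.0, 2.0), F_rho(1.0, 2.0), F_rho(2.0, 2.0), F_rho(3.0, 2.0))
```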

We can now find the density $p_\rho$ of this random variable. Note right away that outside the interval [0, R] the density is zero, because the distribution function is constant there. At the endpoints of the interval we leave the density undefined; its value at individual points does not affect any probability. Inside the interval, the density can be found using a table of derivatives (for example, from the Mathprofi website) and the elementary rules of differentiation: the derivative of $t^2/R^2$ is $2t/R^2$. So we have found the density on the entire real line, except at the two endpoints.
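As a sanity check, the density formula can be compared with a numerical derivative of the distribution function. The Python sketch below uses a central difference with a small step h at several points inside and outside [0, R]. The radius R = 2, the step size, and the function names are illustrative, and at the endpoints p_rho simply returns 0 by choice.

```python
def F_rho(t: float, R: float) -> float:
    """Distribution function of the distance to the center (uniform hits, radius R)."""
    if t <= 0:
        return 0.0
    if t > R:
        return 1.0
    return (t / R) ** 2

def p_rho(t: float, R: float) -> float:
    """Density 2t / R^2 inside (0, R), zero elsewhere (endpoint values chosen as 0)."""
    if 0 < t < R:
        return 2 * t / R ** 2
    return 0.0

R = 2.0
h = 1e-6
for t in (-1.0, 0.5, 1.0, 1.9, 3.0):
    numeric = (F_rho(t + h, R) - F_rho(t - h, R)) / (2 * h)  # central difference
    print(t, p_rho(t, R), numeric)
```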

Another useful property of the density is that the probability that a random variable takes a value in an interval equals the integral of the density over this interval. You can read about proper, improper, and indefinite integrals in articles on the Mathprofi website.
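The link between the density and interval probabilities can be checked numerically. The Python sketch below integrates the density $2t/R^2$ with the midpoint rule (a standard quadrature choice, not one the article prescribes) over [0.5, 1.5] and compares the result with $F_\rho(1.5) - F_\rho(0.5) = (1.5^2 - 0.5^2)/R^2$. Integrating over [0, R] should give 1. R = 2 and the number of subintervals are illustrative.

```python
def integrate_midpoint(f, a: float, b: float, n: int = 10_000) -> float:
    """Approximate the integral of f over [a, b] with the midpoint rule on n subintervals."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

R = 2.0

def density(t: float) -> float:
    """Density of the distance to the center: 2t / R^2 inside (0, R), zero elsewhere."""
    return 2 * t / R ** 2 if 0 < t < R else 0.0

a, b = 0.5, 1.5
print(integrate_midpoint(density, a, b), (b * b - a * a) / R ** 2)  # both about 0.5
print(integrate_midpoint(density, 0.0, R))                          # total probability, about 1
```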

On first reading, the integral of a nonnegative function f(x) over an interval [a, b] can be thought of as the area of a curvilinear trapezoid. Its sides are:

  1. The segment [a, b] of the Ox axis (the horizontal coordinate axis).
  2. The vertical segments connecting the points (a, f(a)) and (b, f(b)) on the curve with the points (a, 0) and (b, 0) on the Ox axis.
  3. The piece of the graph of f from (a, f(a)) to (b, f(b)).

We can also talk about the integral over the interval (-∞, b]. If, for sufficiently large negative a, the integral over [a, b] changes negligibly as a decreases further, its limiting value is taken as the integral over (-∞, b]. Integrals over [a, +∞) and (-∞, +∞) are defined in a similar way.
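The “curvilinear trapezoid” picture suggests approximating the area with many thin trapezoids, which is the composite trapezoidal rule. The Python sketch below applies it to the density of the distance ρ (hypothetical R = 2) over [a, 1.5] and moves a further and further to the left. Once a drops below 0 the value stops changing, which is exactly the stabilization that defines the integral over (-∞, b]. The limit equals $F_\rho(1.5) = (1.5/R)^2$.

```python
def trapezoid(f, a: float, b: float, n: int = 100_000) -> float:
    """Composite trapezoidal rule: total area of n thin trapezoids under the graph of f on [a, b]."""
    h = (b - a) / n
    total = (f(a) + f(b)) / 2 + sum(f(a + i * h) for i in range(1, n))
    return total * h

R = 2.0

def density(t: float) -> float:
    """Density of the distance to the center: 2t / R^2 inside (0, R), zero elsewhere."""
    return 2 * t / R ** 2 if 0 < t < R else 0.0

b = 1.5
for a in (1.0, 0.0, -1.0, -10.0, -100.0):
    print(a, trapezoid(density, a, b))  # stops changing once a < 0; limit is (b / R) ** 2
```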