[K-MOOC]  Introductory Mathematics for Artificial Intelligence

         그림입니다.
원본 그림의 이름: cover-new-1.jpg
원본 그림의 크기: 가로 3585pixel, 세로 4985pixel

                Translated by

  Sang-Gu LEE with Youngju NIELSEN, Yoonmee HAM

         from the original Korean text written by

      Sang-Gu LEE with Jae Hwa LEE, Yoonmee HAM, Kyung-Eun PARK


    Part Ⅳ. AI and Statistics

10. Permutation, combination, probability, random variable, probability distribution,  Bayes' theorem

  10.1 Permutations and combinations

  10.2 Probability

  10.3 Conditional probability

  10.4 Bayes' theorem

  10.5 Random variable

  10.6 Discrete probability distributions

  10.7 Continuous probability distributions


10.1 Permutations and combinations

Two methods of counting are permutations and combinations. First, a permutation refers to the number of cases in which order is considered.

Let’s take a look at the following example. There are five cards with 1,2,3,4 and 5 written. You will choose three cards out of those five cards,

but you care about the order. The possible outcomes are as follows.


                               

             그림입니다.
원본 그림의 이름: CLP00002fb40003.bmp
원본 그림의 크기: 가로 412pixel, 세로 122pixel

      (One of five cards) × (one of four remaining cards) × (one of three remaining cards)


Here is the reason. You will have three cards arranged in order as above. The number that can appear on the first card ① is 1,2,3,4 and 5.

Then the number that can appear on the second card ② is one number out of four since one card already appears on the first card ①.

We excluded one number that already appeared on the first card ① from the possible numbers on the  the second card ②.

Likewise, only three numbers can appear on the 3rd  card ③ excluding the two numbers that appeared on the 1st cards ① and 2nd cards ②.

Now, you will multiply all possible number of cases and have .

We can express permutations this way. When you choose k out of n with the consideration of the order, (We call it permutations),

you can write    , and the formula is as follows.      

                           ()

Especially when , it is = . We read this as " factorial". 

  


사각형입니다.  You want to make a three-digit number by choosing three different numbers from 1 to 9. Find the number of natural numbers

that you can make.

Solution.                                               



A combination refers to the number of cases selected in any order. For example, the number of cases in which three cards are selected

from five cards with 1, 2, 3, 4, 5 written on it is as follows.

                                

Let’s imagine we have 1,2,3 appeared on three cards. If you consider the order, you will have (1,2,3), (1,3,2) ,(2,1,3),(2,3,1),(3,1,2) and (3,2,1).

There are six cases in permutations. In combinations, it is considered as all the same and counts as 1. Therefore, you should divide the number

of cases from permutations by 6 = 3!.

Now, let’s take a look at our calculation. The numerator is   and the denominator is 3!. You can write the numerator of as 5!/2!.

Remember 5! is   and 2! is . So if you divide 5! by 2!, you will leave with . So we can rewrite our combination calculation  .


In general. when we choose out of without considering  the order, we write the total number as which is called a combination

and the formula is as follows.

                     ()


사각형입니다.  What is the number of ways to choose 5 ties from 500 ties?       .



You can also calculate it using the Sage built in command as follows.



So far, we have introduced a case without repetition. If a repetition is allowed, we can calculate permutations and combinations

with repetition as follows.

(1) Permutations with repetitions(repeated permutation)

 When you choose k out of n in order, it is permutations. When you do permutations with repetitions, you can calculate using the following formula.

                              

(2) Combinations with repetitions(repeated combination)

When you choose out of without order, it is combinations. When you do combinations with repetitions, you can calculate using the following formula.

                            


사각형입니다.  Select three of the numbers 1, 2, 3, 4, and 5, and arrange them in a row to make three-digit natural numbers with repetitions.

Find the number of cases where the three-digit natural number is a multiple of 5.

Solution.  If we fix the one digit of the three-digit natural number □□□ as 5, the rest you need to do is to have the first two-digit numbers

among 1, 2, 3, 4, and 5 with repetitions. In other words, it is the same as choosing two numbers out of five numbers with repetitions ( and ):

                                                  


사각형입니다.  When four people vote anonymously for one of A, B, or C, find the number of possible outcomes.

Solution.  Since anonymous four votes are AAAA, AAAB, ..., BCCC, and CCCC, it is the same as the number of combinations with repetitions

in which you choose 3 out of 4 with repetitions ( and ). In other words, it is
                    .                   


10.2 Probability

The probability of a specific event will occur is anywhere between 0 and 1. For example, when you flip a coin, the probability of getting a head is 1/2.

It is because you will always have a head or a tail. The possible cases are 2 and having a head is 1 case out of 2 cases. So, the probability of 0

means that the event can never happen, and 1 means that the event must happen.

To analyze the probability of an event mathematically, we must know all kinds of possible events. For example, possible events can be represented as {head, tail}

in the case of a coin toss, and it can be represented as {1, 2, 3, 4, 5, 6} in the case of a cubed dice. The set of events is called a sample space.


Now let's formally define the probability. If the number of all the possible occurrences is and the number of occurrences of a specific event is ,

the probability of occurrence of an event is as follows.

 = the number of event/the number of all possible event

                 

In geometry. when   is the entire area and , then the probability can be expressed as follows.

                             

In actual experiments. we will be able to guess the probability after we do many times of trials.  When a certain trial is repeated times and the number of times

a specific event occurs is times, the probability of this event is . This is a relative frequency. It is also called the statistical probability of the event.

The relative frequency approaches a certain  value of probability . This is called 'mathematical probability'. As the number of trials becomes sufficiently large,

the statistical probability becomes equal to the mathematical probability. We call this property ‘Law of large number’,  .


The probability satisfies the following properties:

Let be the probability of an event .


  ① for arbitrary event in the sample space .

  ② For , .

  ③ for the empty set.

  ④ If the two events A and B do not occur at the same time,

                           

  ⑤ If the is the event that does not occur, then


사각형입니다. Suppose that we have 3 black balls, 2 white balls, and 1 red ball in a pocket. What is the probability of choosing two balls

of the same color when we take out two balls at the same time from the pocket?

Solution. First, find the total number of cases.

The number of cases that we take  two out of the six balls is . Then count the number of cases with the same color. There are only two cases

of choosing two black balls and two white balls. Choose 2 out of 3 and choose 2 out of 2. And add them to have 4 cases. .

So the probability  is .



사각형입니다.  Flip a coin 10 times using the R command below, and find the number of 'Tail' and the number of 'Head'.

Let's practice to flip the same coin 100 times in a similar way.

Solution.  We may use <R code> now. We choose R instead of Python or Sage in the lower right corner of the Sage cell. 

Just click the ‘Evaluate’ button in http://matrix.skku.ac.kr/KOFAC/ after copy and paste the following R-code. Then we will see the output.



If we use a new line of code, ‘table(coin)’ in the R environment instead of ‘coin’, we will have a simple table with the number of events for each case.



http://matrix.skku.ac.kr/2018-album/R-Sage-Stat-Lab-2.html 


Suppose the number of trials is sufficiently large. In that case, we will see that the probability of each event converges to  ,

which was the mathematical probability as the Law of large numbers told.


사각형입니다. There are 3 defective products out of 1000 products. When 10 of these products are purchased, Find the following two probabilities.

(1) There is no defective product among the purchased products.

(2) There is at least one defective product among the purchased products.

Solution. The number of cases where 10 products are selected out of 1000 products is . If there is no defective product,

then all 10 items should be selected from 997 normal products, and none of the three defectives are selected. So the number of cases is .

Calculation using the Sage code as follows.



If we want to use R-code. refer to the below.



(2) If there will be at least one defective item, we just subtract the above probability (with no defective products) from 1.

 will be the probability of having at least one defective product. If we check that with the 'Sage Code', it is given as follows.



We can use R code as follows.



10.3 Conditional probability

Conditional probability is an essential  concept in data analysis, which is the probability that an  event occurs under the condition that an  event occurred.

This conditional Probability is expressed as

    (where )


      그림입니다.
원본 그림의 이름: 조건부확률.png
원본 그림의 크기: 가로 254pixel, 세로 198pixel

          [Source]  https://blog.naver.com/alwaysneoi/100148922781


We have the following rule from the definition of the conditional probability and product rule. 

            

We can generalize it for multiple events ,

    

       

       


사각형입니다. For two events , with , , Find .

Solution.  We can use the formula to have the following conditional probability.

                                        

10.4 Bayes' theorem   

Bayes' theorem describes the probability of an event based on prior knowledge of conditions related to the event.

It is crucial when dealing with decision-making problems mathematically under uncertainty. It is instrumental when calculating the value of

invisible and intangible assets, such as information.


First, we will start by introducing some frequently used terms. A < prior probability> of an event is the probability of the event computed

before the collection of new data. is the prior probability for . And a <posterior probability> is the revised or updated probability

of an event occurring after taking into consideration new information. Conditional probability is used for explaining the situation in which a specific event

occurred but no one knows why the event occurred. In , represents an event that has already occurred, and is considered as

the posterior probability, meaning that the probability of the event after the

occurrence of is observed.

The Bayesian Theorem is a theory that extracts posterior probabilities using data obtained from prior probabilities and events.

It computes the relationship between prior probabilities and posterior probabilities using conditional probabilities. The following diagram explains it well.

When this sample space has a partition . This means and .

Then the following holds for any event .


                       그림입니다.
원본 그림의 이름: CLP000047dc0001.bmp
원본 그림의 크기: 가로 601pixel, 세로 416pixel


where is mutually exclusive. And

             

So is equal to the sum of probabilities of each . Then from the Multiplication theorem on probability,

we can get the <Law of Total Probability>, which means the following formula, :

        

 Now, if we use the definition of conditional probability and <the law of total probability>,

we can get the following formula. We call it the Bayes' theorem.

 

         


In Bayes' theorem, is called the prior probability of an event and is called the posterior probability of an event .


사각형입니다. Suppose that three machines produce the of products in this factory respectively.

And the defective rates of machines  are , respectively. Find the probability that the purchased product is defective

when we randomly select one product. Find the probability that this purchased defective product was produced by the machine .

Solution.  Suppose that an event in which the purchased product is produced by a machine is denoted by , and

suppose that an event for a defective product was produced is denoted by . Then we have the following relations.
           , .
The event with a defective product:
The probabilities of having  a defective product :
            
Using 'the law of total probability', the probability of defective products is obtained as follows.

         

                   

According to Bayes' theorem, the probability is 4/33.

        

                               


◩ Open Problem 1 

Discuss the Monty Hall problem of conditional probabilities. It is one of the examples in which Bayes' theorem can be applied.

https://destrudo.tistory.com/5 


 [Lecture]   Permutation, combination, probability: https://youtu.be/KQXO-XbJauU

 [Lecture]   Bayes’ theorem: https://youtu.be/VAGLigLt2Hw

                     

10.5. Random variables

In an experiment of flipping two coins, there are four possible outcomes:


{Head, Head}, {Head, Tail}, {Tail, Head}, {Tail, Tail}


The probability of each outcome is . If we set the number of tails in 2 tosses as , then each outcome corresponds to the 0, 1, and 2. 

For example, we can match with (Head, Head) to 0, (Head, Tail) and (Tail, Head) to 1, and (Tail, Tail) to 2, as shown in the following figure.

X is called a random variable.


          그림입니다. 


A random variable is like a variable in computer programming. It is a variable whose value is determined probabilistically.

So it is sometimes called a ‘random variable’ or simply a ‘variable'.

A random variable matches all samples (or you can also think of all possible events) in sample space with a real value.

Therefore, if a random variable is used instead of a probabilistic event, then it enables us to compute and analyze random events using numbers.

Random variables are written in uppercase letters, and lowercase letters are used for each value that the variable can take.


10.6 Discrete probability distributions

A discrete variable is a variable that can only take a countable number of values . And a discrete (probability) distribution describes

the probability of occurrence of each value of a discrete random variable .

It can be expressed as a table as follows.


Sum

Probability

1


For example, in an experiment of tossing two coins simultaneously, the probability distribution of the random variable defined by the number of tails

is shown in the figure below.


     그림입니다. 


The function that matches the probability when the discrete

random variable takes a value is called

'the probability mass function(pmf)' of the random variable.

                   

The probability mass function satisfies three properties as follows.


  ①   

  ②

  ③ For discrete random variable , the probability of is

     


10.7 Continuous probability distributions

A continuous random variable is a random variable where the data can take infinitely many values. In the case of a continuous random variable,

the probability of taking a specific value is always 0. It makes no sense to use 'the probability mass function'

to represent the probability distribution. Instead, we define 'a probability density function(pdf)' to represent the continuous probability distribution.


For a continuous random variable , if the function satisfies the following properties, then is called the probability density function of .

We can consider it as the concept that corresponds to the probability mass function of a discrete probability distribution.

  ①  For all real ,

  ②

  ③

                  

Look at the graph of a function below. We can see that the probability of  between and is following graph's shaded area.

It does not matter whether it includes the endpoints and .


                그림입니다.


☞ Note  How did the word “density” come to be used here? If we think of the probability as mass and the length of the interval as volume,

then [probability/length of the interval] is [Mass/volume]. [Mass/volume] is a ‘density’. from the simple relationship of

  [probability/length of the interval] * [length of the interval] = [probability],

it becomes a density multiply the length of the interval equals probability. As a result, the term 'probability density function' is created.


 [Lecture on] Discrete probability distributions https://youtu.be/Fq7D7bGG_cE

 [Lecture on] Continuous probability distributions https://youtu.be/4wx1raETI8o

 [Lecture on] Integral https://youtu.be/62OxYG7VMsE  


Copyright @ 2021 SKKU Matrix Lab. All rights reserved.
Made by Manager: Prof. Sang-Gu Lee and Dr. Jae Hwa Lee