Thursday, 16 April 2020

Naive Bayes Classifier

Hi there!

Welcome to my blog. Today I would like to talk about the Naive Bayes classifier.

What is a classifier?


Let us understand this with an example.
The above image is a dataset recorded by a computer store owner to predict whether the next customer who comes in will buy a computer or not.

Here, the Age attribute takes the values youth, middle-aged and senior, denoted by [y, m, s] respectively. The Income attribute takes the values low, medium and high, denoted by [l, m, h] respectively. The Students attribute takes the values yes and no (whether the customer is a student or not), denoted by [y, n] respectively.

This dataset is quite small. In real scenarios, the dataset needs to be much larger so that the probabilities estimated from it are reliable or, in simple words,
so that our model is better at predicting.

A classifier is trained on this data using an algorithm such as
Naive Bayes. Once trained, it can classify the target variable (buys/does not buy a computer) as yes or no.

What Technologies Do I Need to Know? 


  1. Basic knowledge of probability.
  2. Basic knowledge of Python, including its libraries, mainly Pandas.


What is Bayes' Theorem?


Bayes' theorem describes the probability of an event based on knowledge of conditions related to that event. In our case, that knowledge is the set of attributes of a person who came to the store. These attributes are Income, Age and Students (is he a student?).
Now let us understand this mathematically.

H--> Our hypothesis, for example that the customer will buy a computer

X--> The known attributes of the customer (say Income and Age)

P(H/X)--> Probability of H given X, that is, the probability that a customer with attributes X will buy a computer

P(X/H)--> Probability of X given H, that is, the probability that a customer who buys a computer has a certain Income and Age,
or in other words a certain value of X

P(X)--> Probability that a person has the attribute values X.
Example: Probability that a person from our set of customers is 35 years old and earns 40000

P(H)--> Probability that a person will buy a computer regardless of his attributes

Now, Bayes' theorem becomes: P(H/X) = ( P ( X / H ) * P ( H ) ) / P ( X )
Therefore, our aim here is to find the probability that a customer will buy a computer or not, given his attributes as known information.
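
To make the formula concrete, here is a minimal Python sketch of Bayes' theorem. The function name posterior and the three input numbers are assumptions picked only for illustration; they are not taken from the store dataset.

```python
from fractions import Fraction


def posterior(likelihood, prior, evidence):
    """Bayes' theorem: P(H/X) = P(X/H) * P(H) / P(X)."""
    return likelihood * prior / evidence


# Hypothetical numbers, chosen only to illustrate the formula
p_x_given_h = Fraction(1, 2)  # P(X/H)
p_h = Fraction(3, 5)          # P(H)
p_x = Fraction(2, 5)          # P(X)

p_h_given_x = posterior(p_x_given_h, p_h, p_x)
print(p_h_given_x)  # 3/4
```

Using Fraction keeps the result exact, which makes it easy to compare with a calculation done by hand.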

Naive Bayes 


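Naive Bayes makes one simplifying ("naive") assumption: given the class, the attributes are independent of each other. So P(X/H) is simply the product of the probabilities of the individual attribute values given H, for example P(Income/H) * P(Age/H).
To compare classes we only need the numerator of Bayes' theorem: P(H/X) is proportional to P ( X / H ) * P ( H )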
How many classes do we have?

Classes are yes or no, that is, whether he buys a computer or not.
The result is the class Ci with max( P(Ci/X) ), where Ci is either yes or no.

That is, we will work out the probability of Yes and the probability of No
and take the one with the maximum probability as our prediction.

P(Ci/X) is proportional to P( X / Ci ) * P ( Ci ). Here we leave out P(X) because it is the same for every class, so it does not change which class comes out on top.
So, our goal here is to maximize P( X / Ci ) * P ( Ci ).<---( 1 )
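
A small sketch can show why leaving out P(X) is safe: dividing every class score by the same number does not change which score is largest. The scores below are hypothetical placeholders and best_class is a helper name made up for this illustration.

```python
def best_class(scores):
    """Return the class label with the largest score."""
    return max(scores, key=scores.get)


# Hypothetical values of P(X/Ci) * P(Ci) for the two classes
unnormalized = {"yes": 0.03, "no": 0.05}

# P(X) is the same for every class. When the classes cover all cases,
# it equals the sum of the unnormalized scores.
p_x = sum(unnormalized.values())
normalized = {label: score / p_x for label, score in unnormalized.items()}

# The winner is the same with or without the division by P(X)
print(best_class(unnormalized), best_class(normalized))
```

Both calls pick the same class, so the division can be skipped when only the prediction is needed.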

Example


Now, let us understand it better with an example:
Consider a new data point that has come in for prediction, where
 income = medium, student = yes, age = youth.
Now, let us go through the algorithm in steps.

Step 1

 

CALCULATING P(Ci)

Probability(event) =
number of times the event occurred / total number of outcomes

 P(buys_computer=yes)--> 9/14  

 P(buys_computer=no) --> 5/14
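
Step 1 can be sketched in a few lines of Python. The counts (9 yes, 5 no) come from the example above; the variable names are only illustrative.

```python
from fractions import Fraction

# Class counts from the worked example: 9 "yes" and 5 "no" out of 14 rows
class_counts = {"yes": 9, "no": 5}
total = sum(class_counts.values())

# Prior of each class = count of that class / total number of rows
priors = {label: Fraction(count, total) for label, count in class_counts.items()}

for label, p in priors.items():
    print(f"P(buys_computer={label}) = {p} = {float(p):.4f}")
```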

Step 2


CALCULATING P(X/Ci) 

P( income = medium / buys_computer = yes ) --> 4/9
P( income = medium / buys_computer = no ) --> 2/5
P( student = yes / buys_computer = yes ) --> 4/9
P( student = yes / buys_computer = no ) --> 3/5
P( age = youth / buys_computer = yes ) --> 2/9
P( age = youth / buys_computer = no ) --> 3/5
P( X / yes ) = (4/9) * (4/9) * (2/9) --> 0.0439
P( X / no ) = (2/5) * (3/5) * (3/5) --> 0.144

Explanation


For P(income=medium/buys_computer=yes), we first count the rows where buys_computer is yes, which gives 9. Among those rows, we count how many times income was medium, which is 4 in our case. Then we divide that count by the total number of yes rows.

Therefore, we get 4/9.
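
The counting idea can be sketched as below. Note that the rows here are a tiny made-up toy set, NOT the table from this post, and conditional is a hypothetical helper name.

```python
from fractions import Fraction

# Hypothetical toy rows (income, buys_computer); NOT the table from this post
rows = [
    ("medium", "yes"),
    ("high", "yes"),
    ("medium", "no"),
    ("low", "yes"),
    ("medium", "yes"),
]


def conditional(rows, value, label):
    """Estimate P(attribute = value / class = label) by counting rows."""
    # Keep only the attribute values of rows that belong to the class
    in_class = [attr for attr, cls in rows if cls == label]
    # Fraction of those rows where the attribute has the requested value
    return Fraction(in_class.count(value), len(in_class))


p = conditional(rows, "medium", "yes")
print(p)  # 1/2 (2 of the 4 "yes" rows have income = medium)
```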

Why are we calculating probability just for income=medium?

Income can indeed take other values. But our new test example has income = medium, so we only need the probabilities of yes and no for that value.

Similarly, the probability is calculated for other attributes.

After calculating all these probabilities, we multiply together all the probabilities corresponding to YES. Similarly, all the probabilities corresponding to NO are multiplied together.

Therefore, with this step, we have calculated P( X / yes ) and P( X / no ),
or in other words P( X / Ci ).
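
The multiplication in Step 2 can be checked with a short sketch. The six fractions are copied from Step 2; multiplying them relies on the Naive Bayes assumption that the attributes are independent given the class. The dictionary layout is just one possible way to organize them.

```python
from fractions import Fraction
from math import prod

# Conditional probabilities from Step 2, in the order income, student, age
cond = {
    "yes": [Fraction(4, 9), Fraction(4, 9), Fraction(2, 9)],
    "no": [Fraction(2, 5), Fraction(3, 5), Fraction(3, 5)],
}

# P(X/Ci) is the product of the per-attribute probabilities for class Ci
likelihood = {label: prod(probs) for label, probs in cond.items()}

for label, value in likelihood.items():
    print(f"P(X/{label}) = {value} = {float(value):.4f}")
```

The exact values are 32/729 for yes and 18/125 for no, which round to 0.0439 and 0.1440.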

Step 3


CALCULATING P(Ci/X)

From (1) we get,
P( yes / X ) is proportional to 0.0439*(9/14) => 0.0282
P( no / X ) is proportional to 0.144*(5/14) => 0.0514 <--- WE HAVE A WINNER

THEREFORE, HE WILL NOT BUY THE COMPUTER.
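
Step 3 can be sketched as follows, using the priors from Step 1 and the exact products from Step 2 (32/729 and 18/125). The last part, which divides by the sum of the scores to turn them into probabilities that add up to 1, is an optional extra that the walkthrough above does not need for the prediction.

```python
from fractions import Fraction

# Values from Step 1 and Step 2 of the worked example
priors = {"yes": Fraction(9, 14), "no": Fraction(5, 14)}
likelihood = {"yes": Fraction(32, 729), "no": Fraction(18, 125)}

# Score of each class: P(X/Ci) * P(Ci), as in (1)
scores = {label: likelihood[label] * priors[label] for label in priors}

# The prediction is the class with the largest score
prediction = max(scores, key=scores.get)

# Optional: normalize the scores so that they sum to 1
total = sum(scores.values())
posteriors = {label: scores[label] / total for label in scores}

for label in scores:
    print(f"{label}: score = {float(scores[label]):.4f}, "
          f"posterior = {float(posteriors[label]):.4f}")
print("prediction:", prediction)
```

The scores come out as about 0.0282 for yes and 0.0514 for no, so the prediction is no.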

Please refer to the code below for further understanding.