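Introduction: The naïve Bayesian classifier, or simple Bayesian classifier, works as follows. Given a tuple X = (x1, x2, ..., xn) and m classes C1, C2, ..., Cm, the classifier predicts that X belongs to the class having the highest posterior probability conditioned on X. That is, X is assigned to class Ci if and only if
P(Ci|X) > P(Cj|X) for 1 ≤ j ≤ m, j ≠ i.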
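Thus we maximize P(Ci|X). The class Ci for which P(Ci|X) is maximized is called the maximum posteriori hypothesis. By Bayes’ theorem (Equation (6.10)),
P(Ci|X) = P(X|Ci)P(Ci) / P(X).
Because P(X) is the same for every class, only the numerator P(X|Ci)P(Ci) needs to be maximized.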
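To make P(X|Ci) tractable, the classifier makes the naïve assumption of class conditional independence: given the class label, the attribute values are independent of one another, so that
P(X|Ci) = P(x1|Ci) × P(x2|Ci) × ... × P(xn|Ci).
We can easily estimate the probabilities P(x1|Ci), P(x2|Ci), ..., P(xn|Ci) from the training tuples. Recall that here xk refers to the value of attribute Ak for tuple X. For each attribute, we look at whether the attribute is categorical or continuous-valued. To compute P(X|Ci), we consider the following: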
(a) If Ak is categorical, then P(xk|Ci) is the number of tuples of class Ci in D having the value xk for Ak, divided by |Ci,D|, the number of tuples of class Ci in D.
(b) If Ak is continuous-valued, then we need to do a bit more work, but the calculation is straightforward. A continuous-valued attribute is typically assumed to have a Gaussian distribution with mean μ and standard deviation σ, defined by
g(x, μ, σ) = (1 / (√(2π) σ)) exp(−(x − μ)² / (2σ²)), (6.13)
so that
P(xk|Ci) = g(xk, μCi, σCi).
These equations may appear daunting, but hold on! We need only compute μCi and σCi, which are the mean (i.e., average) and standard deviation, respectively, of the values of attribute Ak for training tuples of class Ci. We then plug these two quantities into Equation (6.13), together with xk, in order to estimate P(xk|Ci).
For example, let X = (35, $40,000), where A1 and A2 are the attributes age and income, respectively. Let the class label attribute be buys_computer. The associated class label for X is yes (i.e., buys_computer = yes). Let’s suppose that age has not been discretized and therefore exists as a continuous-valued attribute. Suppose that from the training set, we find that customers in D who buy a computer are 38 ± 12 years of age. In other words, for attribute age and this class, we have μ = 38 years and σ = 12. We can plug these quantities, along with x1 = 35 for our tuple X, into Equation (6.13) in order to estimate P(age = 35|buys_computer = yes).
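The age example can be checked numerically. The following is a minimal sketch that assumes the standard Gaussian density for a continuous-valued attribute; the function name `gaussian_density` and the variable `p_age_given_yes` are illustrative and do not come from the text.

```python
import math

def gaussian_density(x, mu, sigma):
    """Gaussian density with mean mu and standard deviation sigma, evaluated at x."""
    coeff = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coeff * math.exp(exponent)

# Values from the example: mu = 38, sigma = 12, x1 = 35.
p_age_given_yes = gaussian_density(35, mu=38, sigma=12)
print(f"{p_age_given_yes:.4f}")  # approximately 0.0322
```

Strictly speaking, the result is a density value rather than a probability, but it serves as the estimate of P(age = 35|buys_computer = yes) when class scores are compared.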
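To predict the class label of X, P(X|Ci)P(Ci) is evaluated for each class Ci. The classifier predicts that the class label of tuple X is Ci if and only if
P(X|Ci)P(Ci) > P(X|Cj)P(Cj) for 1 ≤ j ≤ m, j ≠ i.
In other words, the predicted class label is the class Ci for which P(X|Ci)P(Ci) is the maximum.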
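The whole procedure can be sketched in a few lines of Python. This is an illustrative sketch, not a reference implementation. It assumes P(Ci) is estimated as the fraction of training tuples in class Ci, uses relative counts for categorical attributes, and uses a Gaussian with the population standard deviation for continuous ones. The toy tuples (age, student) and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def gaussian(x, mu, sigma):
    """Gaussian density with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def fit(tuples, labels, categorical):
    """Collect per-class statistics needed for P(Ci) and each P(xk|Ci)."""
    by_class = defaultdict(list)
    for x, c in zip(tuples, labels):
        by_class[c].append(x)
    model = {}
    for c, rows in by_class.items():
        stats = []
        for k in range(len(rows[0])):
            values = [r[k] for r in rows]
            if k in categorical:
                stats.append(Counter(values))  # count of each value of Ak within class c
            else:
                mu = sum(values) / len(values)
                var = sum((v - mu) ** 2 for v in values) / len(values)
                stats.append((mu, math.sqrt(var)))  # assumes var > 0
        model[c] = (len(rows), stats)
    return model, len(tuples)

def predict(model, n, x, categorical):
    """Return the class maximizing P(X|Ci)P(Ci), plus all class scores."""
    scores = {}
    for c, (count, stats) in model.items():
        score = count / n  # P(Ci)
        for k, xk in enumerate(x):
            if k in categorical:
                score *= stats[k][xk] / count  # case (a): relative frequency
            else:
                mu, sigma = stats[k]
                score *= gaussian(xk, mu, sigma)  # case (b): Gaussian estimate
        scores[c] = score
    return max(scores, key=scores.get), scores

# Hypothetical toy training set: (age, student) with class label buys_computer.
tuples = [(25, "yes"), (30, "yes"), (35, "no"), (40, "yes"),
          (45, "no"), (50, "no"), (55, "yes"), (60, "no")]
labels = ["yes", "yes", "yes", "yes", "no", "no", "no", "no"]
categorical = {1}  # index of the categorical attribute

model, n = fit(tuples, labels, categorical)
label, scores = predict(model, n, (33, "yes"), categorical)
print(label)  # yes
```

In this sketch a categorical value never seen with a class yields a zero count and therefore a zero score for that class.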