Naive Bayes Classifier


1. Bag-of-Words


"I really like studying artificial intelligence"


I = [1, 0, 0, 0, 0, 0]
really = [0, 1, 0, 0, 0, 0]
like = [0, 0, 1, 0, 0, 0]
studying = [0, 0, 0, 1, 0, 0]
artificial = [0, 0, 0, 0, 1, 0]
intelligence = [0, 0, 0, 0, 0, 1]

Bag-of-words vector (sum of the one-hot vectors of the words in the sentence)

"I really really like studying artificial intelligence" 
[1, 2, 1, 1, 1, 1]
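
The encoding above can be sketched in Python. This is a minimal illustration, not part of the original notes: the helper names `build_vocabulary`, `one_hot`, and `bag_of_words` are hypothetical, tokenization is assumed to be a plain whitespace split, and words missing from the vocabulary are simply ignored.

```python
from collections import Counter

def build_vocabulary(sentence):
    """Map each distinct word to an index, in order of first appearance."""
    vocab = {}
    for word in sentence.split():
        if word not in vocab:
            vocab[word] = len(vocab)
    return vocab

def one_hot(word, vocab):
    """Vector with a single 1 at the index of `word`."""
    vec = [0] * len(vocab)
    vec[vocab[word]] = 1
    return vec

def bag_of_words(sentence, vocab):
    """Count how often each vocabulary word occurs in `sentence`."""
    counts = Counter(sentence.split())
    vec = [0] * len(vocab)
    for word, idx in vocab.items():
        vec[idx] = counts[word]  # Counter returns 0 for absent words
    return vec

vocab = build_vocabulary("I really like studying artificial intelligence")
print(one_hot("like", vocab))
# [0, 0, 1, 0, 0, 0]
print(bag_of_words("I really really like studying artificial intelligence", vocab))
# [1, 2, 1, 1, 1, 1]
```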

2. Naive Bayes Classifier

document d, class c

mapped class C
= argmax_c P(c | d)
= argmax_c P(d | c) P(c) / P(d)   *Bayes' rule
= argmax_c P(d | c) P(c)          *P(d) is the same for every class c, so drop the denominator
= argmax_c P(c) P(w1, w2, ..., wn | c)   *d is the sequence of its words w1, ..., wn

*by the conditional independence assumption (words are independent given the class)
= argmax_c P(c) {P(w1|c) P(w2|c) ... P(wn|c)}
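
The final decision rule can be sketched as follows. This is only an illustration under stated assumptions: the function name `classify`, the class labels, and every probability value are made up for the example, and each word is assumed to have a nonzero probability under each class. The sum of logarithms replaces the product to avoid numerical underflow; since log is monotonic, the argmax is unchanged.

```python
import math

def classify(words, prior, likelihood):
    """Return argmax over c of P(c) * P(w1|c) * ... * P(wn|c), in log space."""
    scores = {}
    for c in prior:
        score = math.log(prior[c])
        for w in words:
            score += math.log(likelihood[c][w])
        scores[c] = score
    return max(scores, key=scores.get)

# Hypothetical probabilities, chosen only for illustration.
prior = {"pos": 0.5, "neg": 0.5}
likelihood = {
    "pos": {"like": 0.6, "hate": 0.1, "studying": 0.3},
    "neg": {"like": 0.1, "hate": 0.6, "studying": 0.3},
}

print(classify(["like", "studying"], prior, likelihood))
# pos   (0.5 * 0.6 * 0.3 = 0.09 beats 0.5 * 0.1 * 0.3 = 0.015)
```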

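*P(c) and P(w_i | c) are estimated from the training dataset by counting. P(d) does not need to be estimated, because the denominator was dropped.

*These counts are the maximum likelihood estimates of the parameters P(c) and P(w_i | c)
likelihood = product over all training pairs (d, c) of P(c) P(w1|c) P(w2|c) ... P(wn|c)
find the P(c) and P(w_i | c) that maximize this total product, which gives
P(c) = (number of training documents in class c) / (total number of training documents)
P(w_i | c) = (count of w_i in documents of class c) / (total count of all words in documents of class c)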
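
For this model, maximizing the likelihood reduces to taking relative frequencies in the training data. The sketch below assumes that documents are given as (word list, class label) pairs; the function name `fit` and the three training documents are hypothetical. No smoothing is applied, so a word never seen in a class has no entry for that class.

```python
from collections import Counter, defaultdict

def fit(documents):
    """Estimate P(c) and P(w | c) by relative frequency.

    documents: list of (list_of_words, class_label) pairs.
    """
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    for words, c in documents:
        class_counts[c] += 1
        word_counts[c].update(words)

    n_docs = len(documents)
    # P(c) = documents in class c / all documents
    prior = {c: n / n_docs for c, n in class_counts.items()}

    # P(w | c) = count of w in class c / total word count in class c
    likelihood = {}
    for c, counts in word_counts.items():
        total = sum(counts.values())
        likelihood[c] = {w: n / total for w, n in counts.items()}
    return prior, likelihood

# Hypothetical toy training set.
train = [
    (["like", "studying"], "pos"),
    (["really", "like", "like"], "pos"),
    (["hate", "studying"], "neg"),
]
prior, likelihood = fit(train)
print(prior["pos"])               # 2 of 3 documents are "pos"
print(likelihood["pos"]["like"])  # 3 of 5 words in "pos" documents are "like"
```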



