Spam Filter 

How do you identify a given email as spam without wrongly rejecting good email?
One technique is to use Bayes' Theorem. Take a large number of known spam messages and a large number of known good messages ("ham"). For each word that occurs, measure how often it appears in spam and how often it appears in ham. That gives, for each word, a probability that an email containing it is spam.
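A minimal sketch of that training step in Python might look like the following. The function names, the whitespace tokenizer, the add-one smoothing, and the assumption that spam and ham are equally likely beforehand are all illustrative choices, not part of the description above.

```python
from collections import Counter

def tokenize(text):
    # Hypothetical tokenizer: lowercase, split on whitespace,
    # and count each word at most once per message.
    return set(text.lower().split())

def word_spam_probabilities(spam_msgs, ham_msgs):
    """Estimate P(spam | word) for every word seen in training."""
    spam_counts = Counter(w for m in spam_msgs for w in tokenize(m))
    ham_counts = Counter(w for m in ham_msgs for w in tokenize(m))
    probs = {}
    for word in set(spam_counts) | set(ham_counts):
        # Fraction of spam (ham) messages containing the word.
        # Add-one smoothing (an assumption) keeps results away from 0 and 1.
        p_word_given_spam = (spam_counts[word] + 1) / (len(spam_msgs) + 2)
        p_word_given_ham = (ham_counts[word] + 1) / (len(ham_msgs) + 2)
        # Bayes' Theorem with equal priors, P(spam) = P(ham) = 0.5 (an assumption).
        probs[word] = p_word_given_spam / (p_word_given_spam + p_word_given_ham)
    return probs

spam = ["buy cheap pills", "cheap pills now"]
ham = ["lunch at noon", "meeting now"]
probs = word_spam_probabilities(spam, ham)
print(probs["cheap"], probs["lunch"], probs["now"])
```

With this toy data, "cheap" scores above 0.5, "lunch" below 0.5, and "now" exactly 0.5, since it appears equally often on both sides. A real filter would need far more training messages and a better tokenizer.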
So given a single email, look at the words in it and find those that are the best predictors, meaning those whose probabilities are furthest from 50% in either direction. Bayes' Theorem then lets us combine their probabilities into an estimate of the probability that the email is spam.
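As a rough sketch of that combining step, the Python below assumes the words are independent of one another (the "naive" Bayes assumption) and that spam and ham are equally likely beforehand. The function name, the `top_n` cutoff, the neutral 0.5 score for unseen words, and the example probabilities are all made up for illustration.

```python
import math

def spam_probability(message, word_probs, top_n=15, unknown=0.5):
    """Combine per-word spam probabilities into one estimate for a message.

    word_probs maps each word to P(spam | word), with values strictly
    between 0 and 1. top_n and unknown are illustrative defaults.
    """
    words = set(message.lower().split())
    # Words never seen in training get a neutral score (an assumption).
    probs = [word_probs.get(w, unknown) for w in words]
    # The best predictors are the probabilities furthest from 0.5.
    probs.sort(key=lambda p: abs(p - 0.5), reverse=True)
    best = probs[:top_n]
    # Naive Bayes combination with equal priors:
    # P(spam) = prod(p) / (prod(p) + prod(1 - p))
    spam_like = math.prod(best)
    ham_like = math.prod(1 - p for p in best)
    return spam_like / (spam_like + ham_like)

# Hypothetical per-word probabilities, as if learned from training mail.
word_probs = {"cheap": 0.9, "pills": 0.95, "meeting": 0.1, "lunch": 0.05}
print(spam_probability("cheap pills today", word_probs))
print(spam_probability("lunch meeting today", word_probs))
```

With these made-up numbers the first message scores above 0.99 and the second below 0.01. For long messages, multiplying many small probabilities can underflow, so a sturdier version would sum logarithms instead.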
Another technique is to use Markov chains, which look at sequences of words rather than individual words in isolation. I'm not sure of the details, but a web search should turn something up.