Spam Filter


A practical, modern application of some areas of mathematics is spam filtering.

How do you identify a given email as spam without wrongly rejecting good email by accident?

One technique is to use Bayes' Theorem. Take a large number of known spam messages and a large number of known good messages (often called "ham"). For each word that occurs, measure whether it is more common in spam or more common in ham. That gives each word a probability: how strongly its presence predicts that the email containing it is spam rather than ham.
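The training step can be sketched in a few lines. This is a minimal sketch under several assumptions that are not part of the page: messages are split into words on whitespace and lower-cased, a word is counted once per message it appears in, spam and ham are assumed equally likely beforehand, and add-one (Laplace) smoothing keeps every probability strictly between 0 and 1. The function name `word_spam_probabilities` is hypothetical.

```python
from collections import Counter


def word_spam_probabilities(spam_messages, ham_messages):
    """Estimate, for each word, the probability that a message containing it is spam.

    Hypothetical helper: assumes equal prior odds of spam and ham and uses
    add-one (Laplace) smoothing so no word gets a probability of exactly 0 or 1.
    """
    # Count how many messages each word appears in (presence, not frequency).
    spam_counts = Counter(w for msg in spam_messages for w in set(msg.lower().split()))
    ham_counts = Counter(w for msg in ham_messages for w in set(msg.lower().split()))
    n_spam, n_ham = len(spam_messages), len(ham_messages)

    probs = {}
    for word in set(spam_counts) | set(ham_counts):
        # Smoothed fraction of spam (resp. ham) messages that contain the word.
        p_word_given_spam = (spam_counts[word] + 1) / (n_spam + 2)
        p_word_given_ham = (ham_counts[word] + 1) / (n_ham + 2)
        # Bayes' Theorem with equal priors: P(spam | word).
        probs[word] = p_word_given_spam / (p_word_given_spam + p_word_given_ham)
    return probs


spam = ["win money now", "cheap money"]
ham = ["meeting at noon", "lunch money"]
probs = word_spam_probabilities(spam, ham)
print(round(probs["money"], 3), round(probs["win"], 3), round(probs["meeting"], 3))
```

Here "money" appears in both spam messages but also in one ham message, so it leans only mildly towards spam (0.6), while "meeting" leans towards ham. Real filters train on far larger collections and usually ignore words seen only a handful of times.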

So, given a single email, look at the words in it and pick out those that are the best predictors, that is, the words whose probabilities are closest to 0 or 1. Bayes' Theorem then lets us combine their probabilities into an estimate of the probability that the email is spam.
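The combining step can be sketched as follows. This uses the "naive Bayes" combination, which additionally assumes the chosen words occur independently of one another, so the estimate is P = prod(p) / (prod(p) + prod(1 - p)) over the selected word probabilities p. The function name `spam_probability`, the limit of 15 words, the neutral value 0.5 for unseen words, and the clamping to [0.01, 0.99] are all illustrative assumptions, not taken from the page.

```python
import math


def spam_probability(message, word_probs, max_words=15, unknown=0.5):
    """Combine per-word probabilities into an estimate that `message` is spam.

    Hypothetical helper: `word_probs` maps each word to P(spam | word).
    Assumes equal prior odds and independent words (naive Bayes).
    """
    words = set(message.lower().split())
    # Words never seen in training get a neutral probability (an assumption);
    # clamping avoids log(0) for words seen only in spam or only in ham.
    probs = [min(max(word_probs.get(w, unknown), 0.01), 0.99) for w in words]
    # Keep the best predictors: those furthest from the neutral value 0.5.
    probs.sort(key=lambda p: abs(p - 0.5), reverse=True)
    best = probs[:max_words]
    # P = prod(p) / (prod(p) + prod(1 - p)), computed in log space to avoid underflow.
    log_spam = sum(math.log(p) for p in best)
    log_ham = sum(math.log(1 - p) for p in best)
    return 1 / (1 + math.exp(log_ham - log_spam))


word_probs = {"money": 0.9, "win": 0.8, "meeting": 0.1}
print(spam_probability("win money", word_probs))  # about 0.973
print(spam_probability("meeting", word_probs))    # about 0.1
print(spam_probability("hello", word_probs))      # 0.5: no evidence either way
```

Two moderately spammy words reinforce each other (0.9 and 0.8 combine to about 0.97), which is why a handful of strong predictors is usually enough. A filter would then reject the email only if this estimate exceeds some high threshold, trading missed spam against wrongly rejected good email.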

Another technique is to use Markov chains. I'm not sure how that works, but I'm sure a Google search would find something.
