Bayesian filtering question.

azfar
Captain
Posts: 598
Joined: Tue Mar 23, 2004 1:16 am
Location: Karachi

Post by azfar »

I am trying to find out exactly how Bayesian filtering works, but I am not clear on it.

My question is how Bayesian filtering actually works, i.e. does it collect words from the whole message (headers, subject and body), from just the subject and body, or from the body alone?

In my setup, users forward me the ham messages that were incorrectly marked as spam, and the spam messages that were incorrectly let through as ham.

My problem with this approach is that the original messages are modified by forwarding: extra headers and signatures are added to them, and I am worried that this will train the filter incorrectly.
Azfar Hashmi
Email : azfarhashmi@hotmail.com
lambda
Major General
Posts: 3452
Joined: Tue May 27, 2003 7:04 pm
Location: Lahore

Post by lambda »

I am trying to find out exactly how Bayesian filtering works, but I am not clear on it.
read:

http://www.paulgraham.com/spam.html
http://www.paulgraham.com/better.html
In my setup, users forward me the ham messages that were incorrectly marked as spam, and the spam messages that were incorrectly let through as ham.
a better way to solve this problem, if your users use imap, is to create two folders for each user called "for-spam" and "for-ham"; have them copy the incorrectly marked messages into the appropriate folder. then, every night, run a script that trains your filter on the messages in those folders.
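A minimal sketch of the collection half of such a nightly script. It assumes the IMAP server stores mail as Maildir++ (one directory per user, with folders named `.for-spam` and `.for-ham`, each holding `cur` and `new` subdirectories). That layout, the `collect_training_messages` name and the `FOLDERS` mapping are assumptions for illustration; the thread does not say which mail server or filter is in use.

```python
from pathlib import Path

# Hypothetical folder names, following the "for-spam" / "for-ham" suggestion.
# The leading dot is the Maildir++ convention for subfolders (an assumption).
FOLDERS = {"spam": ".for-spam", "ham": ".for-ham"}


def collect_training_messages(mail_root):
    """Return (user, label, path) for every message in the correction folders."""
    found = []
    for user_dir in sorted(Path(mail_root).iterdir()):
        if not user_dir.is_dir():
            continue
        for label, folder in FOLDERS.items():
            # Maildir keeps read mail in "cur" and unread mail in "new".
            for sub in ("cur", "new"):
                box = user_dir / folder / sub
                if not box.is_dir():
                    continue
                for path in sorted(box.iterdir()):
                    if path.is_file():
                        found.append((user_dir.name, label, path))
    return found
```

Each returned path can then be handed to whatever training interface the filter provides. With SpamAssassin, for example, that would be `sa-learn --spam` or `sa-learn --ham`. Because the messages are copied rather than forwarded, the filter sees them unmodified.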
My problem with this approach is that the original messages are modified by forwarding: extra headers and signatures are added to them, and I am worried that this will train the filter incorrectly.
it's not something you need to worry about if you train on lots of spam and ham messages. the extra headers and signatures show up in both kinds of message, so they should carry little weight either way.
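If the forwarding noise is still a concern, one option is to train on the original message rather than on the forward. This sketch assumes users forward "as attachment", so the original arrives as a `message/rfc822` part inside the forward; inline forwards cannot be recovered this way. The `extract_original` name is hypothetical.

```python
from email import message_from_string


def extract_original(raw_forward):
    """Return the first attached original message, or None if there is none."""
    outer = message_from_string(raw_forward)
    for part in outer.walk():
        if part.get_content_type() == "message/rfc822":
            # For message/rfc822 parts the payload is a one-element list
            # holding the embedded message object.
            payload = part.get_payload()
            if isinstance(payload, list) and payload:
                return payload[0]
    return None
```

The returned message keeps its own headers and body, without the forwarding user's added headers or signature.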
Watch out for the Manners Taliban!
Isn't it amazing how so many people can type "linuxpakistan.net" into their browsers but not "google.com"?
azfar

Post by azfar »

Thanks for the information.

Here is what I found, and it is very simple.

What Bayesian Spam Filters Look At?
When doing their scans and evaluations, Bayesian spam filters look at many parts of an email. Here is what they examine:

Words in the body of the message 
Headers of the message (including the senders and message paths) 
Aspects of the HTML code (such as the colors, for example) 
Word pairs and phrases (ones that are commonly used by spammers are searched for) 
Meta information (where a specific phrase appears, for instance) 
When an email arrives, it is scanned by the Bayesian spam filter. All of these characteristics are looked at, and the probability of the message being spam is calculated.
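A rough sketch of that calculation, loosely following the approach in the Paul Graham articles linked above. It is a simplified naive Bayes with equal priors and add-one smoothing, not the exact formula from those articles or from any particular filter. Header words are prefixed with the header name so that a word in the Subject line counts separately from the same word in the body. The `tokenize` and `BayesFilter` names are made up for illustration.

```python
import math
import re
from collections import Counter
from email import message_from_string

TOKEN_RE = re.compile(r"[a-z0-9$'-]+")


def tokenize(raw_message):
    """Return tokens from both the headers and the body of a raw message."""
    msg = message_from_string(raw_message)
    tokens = []
    for name, value in msg.items():
        for word in TOKEN_RE.findall(str(value).lower()):
            tokens.append(f"{name.lower()}*{word}")  # e.g. "subject*free"
    payload = msg.get_payload()
    if isinstance(payload, str):  # this sketch skips multipart bodies
        tokens.extend(TOKEN_RE.findall(payload.lower()))
    return tokens


class BayesFilter:
    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.totals = {"spam": 0, "ham": 0}

    def train(self, raw_message, label):
        # Count each token once per message.
        self.counts[label].update(set(tokenize(raw_message)))
        self.totals[label] += 1

    def token_prob(self, token):
        # Add-one smoothing keeps the result strictly between 0 and 1.
        p_spam = (self.counts["spam"][token] + 1) / (self.totals["spam"] + 2)
        p_ham = (self.counts["ham"][token] + 1) / (self.totals["ham"] + 2)
        return p_spam / (p_spam + p_ham)

    def spam_probability(self, raw_message):
        # Combine per-token probabilities in log-odds space for stability.
        log_odds = 0.0
        for token in set(tokenize(raw_message)):
            p = self.token_prob(token)
            log_odds += math.log(p) - math.log(1.0 - p)
        log_odds = max(-700.0, min(700.0, log_odds))  # avoid exp overflow
        return 1.0 / (1.0 + math.exp(-log_odds))
```

A token seen equally often in spam and ham scores 0.5 and adds nothing to the log-odds, which is why boilerplate that appears in every forwarded message tends to matter little once the training set is large.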