From: David Relson (email_suppressed_at_lugwash.org)
Date: Mon 05-Sep-2005 07:40:32 AM EDT
On Sun, 4 Sep 2005 22:26:58 -0400
J. Bruce Fields wrote:
> On Sun, Sep 04, 2005 at 07:56:05PM -0400, David Relson wrote:
> > Looking at August, the figures are actually:
> >
> > SS: 10345 99.56%
> > SU: 21 00.20%
> > SH: 19 00.18%
> > HS: 6 00.06%
> >
> > SH stands for Spam classified as Ham
> > SU for Spam classified as Unsure
> > SS for Spam classified as Spam
> > HS for Ham classified as Spam
> >
> > All the 19 SH were Spam delivered via mailing lists. Bogofilter learns
> > that mailing lists are "good", i.e. sources of ham. When spam is
> > delivered through a mailing list, bogofilter has trouble classifying it.
> >
> > Of the 6 HS, 5 were mailings from CompUSA and 1 was from Checks In The
> > Mail. At some point I noticed the CompUSA messages, searched my
> > archives, found a bunch classified as Spam and told bogofilter they're
> > really Ham. Bogofilter is now classifying CompUSA messages as Ham.
> >
> > HTH,
>
> Yes, thanks, that's interesting!
>
> It's a bit frustrating sometimes to just see the numbers reported as a
> single percentage without knowing how the errors are broken down. The
> "spam classified as ham" number is particularly important....
>
> --b.
"False negative" is the term for "spam classified as ham", while "false
positive" is used for "ham classified as spam". False negatives are
an annoyance, while false positives are more serious, since a misfiled
message could be something important.
In my three years of personal experience with bogofilter, there has been
only one false positive (that I've noticed) that I cared about.
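
For anyone who wants to redo the breakdown, here is a rough Python sketch that recomputes the percentages from the August counts quoted above. The variable and function names are mine, not anything bogofilter provides, and the percentages are taken over the four listed categories only. Since no count of correctly classified ham is given, a false positive rate relative to all ham cannot be derived from these numbers; the sketch computes only the false negative rate relative to all spam.

```python
# August counts quoted above: first letter is the true class,
# second letter is bogofilter's verdict (S = spam, H = ham, U = unsure).
counts = {"SS": 10345, "SU": 21, "SH": 19, "HS": 6}

# Denominator used for the quoted percentages: all four listed categories.
total = sum(counts.values())


def share(label):
    """Percentage of all listed messages that fall in the given category."""
    return 100.0 * counts[label] / total


for label, n in counts.items():
    print(f"{label}: {n:5d} {share(label):6.2f}%")

# False negatives: spam classified as ham, as a fraction of all spam.
spam_total = counts["SS"] + counts["SU"] + counts["SH"]
false_negative_rate = 100.0 * counts["SH"] / spam_total
print(f"false negatives among spam: {false_negative_rate:.2f}%")
```

Counting "unsure" verdicts separately, as done here, keeps them out of the false negative figure; whether that is the right choice depends on how unsure messages get delivered.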