Wednesday, November 30, 2005

PopFile Review

I wrote about my e-mail classification program, PopFile, in my E-mail/Spam solution entry. Since there are few reviews of PopFile, I am publishing the results of six months of classification (June 1 to November 30, 2005). The results are impressive: across a sample of just over 46,000 messages, accuracy is above 99%.
The messages were hand-classified and come from more than one e-mail account (including one Hotmail account), with the majority of non-spam being work-related messages of a technical software nature. There is a large sample of "bounce" messages, which stem from spammers forging the domain name as the sender, and they include various types of bounces (spam filters, address not found). The bounce types could be the subject of another article, but I did not keep statistics on them.

The "buckets" are pretty straightforward, with "win" being Windows-related and "net" being network/domain-related. The "ad" bucket is a bit of a misnomer, as it refers to blatant adult-related messages. Most of those types of messages are now classified as spam instead. The "spam-virus" bucket holds either messages already processed by Norton Antivirus with the attachment removed, or obvious virus-type messages with attachments that were not caught by the antivirus.

To calculate the spam hit rate, I subtracted the false positives and false negatives from the total, giving a score of .992; the ham strike rate is .003 (false negatives only). PopFile doesn't include the unclassified value in its calculation, so it claimed 99.65% accuracy.

| Bucket Name | Distinct Words | Word Count | Classification Count | False Pos. | False Neg. |
|---|---|---|---|---|---|
| ad | 8,307 | 22,709 (11.82%) | 99 (0.21%) | 0 | 4 |
| bounce | 3,920 | 25,035 (13.03%) | 36,592 (79.39%) | 40 | 91 |
| fax | 88 | 460 (0.23%) | 397 (0.86%) | 1 | 0 |
| in | 14,078 | 54,053 (28.13%) | 2,967 (6.43%) | 13 | 18 |
| net | 381 | 1,182 (0.61%) | 15 (0.03%) | 0 | 0 |
| spam | 13,358 | 48,693 (25.34%) | 4,280 (9.28%) | 36 | 32 |
| spam-virus | 14,261 | 19,624 (10.21%) | 1,062 (2.30%) | 0 | 9 |
| win | 4,605 | 20,341 (10.58%) | 464 (1.00%) | 1 | 5 |
| unclassified | | | 210 (0.45%) | 117 | |
| Total | 58,998 | 192,256 (100.00%) | 46,086 (100.00%) | 208 | 159 |
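For anyone who wants to check the arithmetic, here is a minimal sketch that recomputes the quoted rates from the table's Total row. The variable names are mine. The last formula is only an assumption: it is one reading that happens to reproduce the 99.65% figure, not PopFile's documented calculation.

```python
# Totals taken from the "Total" row of the table.
total_messages = 46_086
false_pos = 208   # includes the 117 unclassified messages
false_neg = 159

# Per-bucket columns, used only to sanity-check the totals.
bucket_false_pos = [0, 40, 1, 13, 0, 36, 0, 1]
bucket_false_neg = [4, 91, 0, 18, 0, 32, 9, 5]
unclassified = 117

# Hit rate: everything that was neither a false positive nor a false negative.
hit_rate = (total_messages - false_pos - false_neg) / total_messages

# Strike rate: false negatives only, as a share of all messages.
strike_rate = false_neg / total_messages

# ASSUMPTION: treating PopFile's own figure as 1 - false negatives / total.
# This matches 99.65% but is a guess at how the number was produced.
popfile_style = 1 - false_neg / total_messages

print(f"hit rate:      {hit_rate:.3f}")
print(f"strike rate:   {strike_rate:.3f}")
print(f"PopFile-style: {popfile_style * 100:.2f}%")
```

The per-bucket lists are there only so the column sums can be compared with the Total row.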


Wednesday, November 02, 2005

Remote control

For my TV, DVD and audio equipment, I use one remote control for everything. It doesn’t have a screen or touch buttons, it is over 5 years old, and it cost $30. How is this possible? I give you the Radio Shack 15-1994 6-in-1 Smart A/V Remote.

Of course, Radio Shack in Canada is now “The Source by Circuit City”, and this remote is no longer sold, but what it has is a feature called JP1. This allows me to program the remote from my computer. There is a whole group of people on Yahoo! Groups who have set up software and spreadsheets to assist with this. I built a cable using their instructions and started tinkering.

My current routine when I get a new device (not that often, mind you) is to learn the features I need from the new remote, program them in, and never use the manufacturer’s remote again. I can program macros (a sequence of button presses), timers (e.g. muting for 120 seconds during commercials) and “punch-throughs”, which allow me to be in DVD mode and still control the audio system’s volume.
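To make the macro and punch-through ideas concrete, here is a toy model in Python. It is purely illustrative: the `Remote` class, the key names and the mode names are all made up for this sketch, and none of it reflects how JP1 actually stores macros or punch-throughs in the remote.

```python
from dataclasses import dataclass, field

# Hypothetical device-mode keys for this sketch (not JP1 names).
MODES = {"TV", "DVD", "AUDIO", "VCR"}

@dataclass
class Remote:
    mode: str = "TV"
    macros: dict = field(default_factory=dict)         # key -> list of keys
    punch_through: dict = field(default_factory=dict)  # (mode, key) -> device
    sent: list = field(default_factory=list)           # (device, key) log

    def press(self, key):
        # A macro key replays its stored sequence of button presses.
        # (No guard against a macro that contains itself.)
        if key in self.macros:
            for step in self.macros[key]:
                self.press(step)
            return
        # A mode key only changes which device later keys are aimed at.
        if key in MODES:
            self.mode = key
            return
        # A punch-through redirects this key to another device;
        # otherwise the key goes to the device for the current mode.
        device = self.punch_through.get((self.mode, key), self.mode)
        self.sent.append((device, key))

remote = Remote(
    macros={"M1": ["TV", "POWER", "AUDIO", "POWER", "DVD", "POWER"]},
    punch_through={("DVD", "VOL+"): "AUDIO"},
)
remote.press("DVD")
remote.press("VOL+")   # goes to the audio system, not the DVD player
remote.press("PLAY")   # goes to the DVD player
print(remote.sent)
```

Pressing `M1` on this toy remote would send POWER to the TV, the audio system and the DVD player in turn, and leave the remote in DVD mode.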
Current devices I control are:
Sony TV
Sony Stereo
JVC Stereo
RCA DVD
Sharp VCR

I try to "give back" by posting new devices I have programmed, so others can copy them and use them in their own remotes.

There is a newer model, the 15-2117 I think, which is an 8-in-1 that I purchased for my dad when it was on sale for $40. It does have an LCD, which helps while programming. That remote was set up purely using codes already known by the remote, plus the learning feature for codes it didn't know (like the screen projector). His needs are simpler, and the only macro is one that turns on the projector, audio, and satellite. (Pressing it again turns all of those devices off.)