26 Dec 2011
A new paper from the Naval Postgraduate School (paper here, conference slides here) describes what appears to be an interesting twist on spam filtering: looking at the characteristics of the TCP session through which the mail is delivered.
They observe that bots typically live on cable or DSL connections with slow congested upstreams. TCP sessions from bots turn out to be fairly easy to recognize by RTT, window, and retransmits, something that people have known at least since a paper at the 2008 CEAS conference on the topic.
This paper tries to see whether it would be practical to use that info to manage spam in real time. They have a network analyzer called SpamFlow that figures out per-connection characteristics. Then as a proof of concept they wrote a SpamAssassin plugin to train on the data from SpamFlow and try to do filtering. They do some sort of hand-wavy load testing to see whether SpamFlow can keep up with a realistic mail load, and whether it trains fast enough to provide useful data in real time. They claim that their results show that it does both.
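To make the idea of training on per-connection characteristics concrete, here is a minimal sketch in Python. It is not SpamFlow or the authors' SpamAssassin plugin. The classifier (a small Gaussian naive Bayes), the names FlowFeatures and TinyGaussianNB, and all of the training numbers are assumptions made up for illustration. Only the three features, RTT, window, and retransmits, come from the discussion above.

```python
import math
from dataclasses import dataclass
from statistics import mean, pvariance


@dataclass(frozen=True)
class FlowFeatures:
    """Hypothetical per-connection measurements (names are illustrative)."""
    rtt_ms: float        # estimated round-trip time
    window_bytes: float  # advertised TCP window
    retransmits: float   # retransmitted segments seen in the session

    def as_tuple(self):
        return (self.rtt_ms, self.window_bytes, self.retransmits)


class TinyGaussianNB:
    """Small Gaussian naive Bayes over numeric flow features."""

    def fit(self, flows, labels):
        self.stats = {}
        self.priors = {}
        for label in set(labels):
            rows = [f.as_tuple() for f, l in zip(flows, labels) if l == label]
            cols = list(zip(*rows))
            # Variance floor avoids division by zero on constant columns.
            self.stats[label] = [(mean(c), max(pvariance(c), 1e-6)) for c in cols]
            self.priors[label] = len(rows) / len(labels)
        return self

    def log_score(self, flow, label):
        # Log prior plus the sum of per-feature Gaussian log densities.
        total = math.log(self.priors[label])
        for x, (mu, var) in zip(flow.as_tuple(), self.stats[label]):
            total += -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
        return total

    def predict(self, flow):
        return max(self.stats, key=lambda label: self.log_score(flow, label))


# Invented training data: slow, lossy flows labelled "spam", clean ones "ham".
training = [
    (FlowFeatures(250, 16384, 4), "spam"),
    (FlowFeatures(310, 8192, 6), "spam"),
    (FlowFeatures(180, 17520, 3), "spam"),
    (FlowFeatures(400, 65535, 8), "spam"),
    (FlowFeatures(20, 65535, 0), "ham"),
    (FlowFeatures(35, 131072, 0), "ham"),
    (FlowFeatures(12, 65535, 1), "ham"),
    (FlowFeatures(50, 262144, 0), "ham"),
]
model = TinyGaussianNB().fit([f for f, _ in training], [l for _, l in training])

print(model.predict(FlowFeatures(290, 16384, 5)))
print(model.predict(FlowFeatures(25, 65535, 0)))
```

A real deployment would need far more training data and would have to measure these features from live traffic, which is the part SpamFlow handles.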
It's not obvious how best to use this in combination with all of the other anti-spam tools we have, most notably blacklists like the CBL that very accurately identify IPs of botted hosts by looking at the characteristics of mail received at large spamtraps. One thing that occurs to me is that this sort of thing might be useful if mail moves to IPv6, since building v6 blacklists will be hard due to the size of the address space, while this lets you estimate the bottiness of each connection directly. Also, rather than accepting or rejecting mail, you might slow down mail reception from hosts that seem to be bots, both to give preference to non-bot senders and because bots tend to be impatient, so if you slow down a dubious connection and it gives up, it was probably a bot. The Turntide appliance did something similar five years ago, although it used different heuristics for deciding what to slow down.
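The slow-down idea can be sketched as a function that maps a per-connection bot score to a delay before each SMTP reply. This is only an illustration under assumptions: the function name, the 0-to-1 score, the 0.5 threshold, and the 20 second ceiling are all hypothetical, and neither the paper nor Turntide is known to work this way.

```python
def reply_delay_seconds(bot_score, max_delay=20.0, threshold=0.5):
    """Map a bot-likelihood score in [0, 1] to a delay before each SMTP reply.

    Connections scoring below the threshold are not delayed at all; above
    it, the delay grows linearly up to max_delay at a score of 1.0.
    All parameter values here are illustrative assumptions.
    """
    if not 0.0 <= bot_score <= 1.0:
        raise ValueError("bot_score must be between 0 and 1")
    if not 0.0 <= threshold < 1.0:
        raise ValueError("threshold must be at least 0 and below 1")
    if bot_score < threshold:
        return 0.0
    return max_delay * (bot_score - threshold) / (1.0 - threshold)


for score in (0.2, 0.75, 1.0):
    print(score, reply_delay_seconds(score))
```

A mail server would sleep for that long before sending each response, and could treat a client that disconnects during the delay as further evidence that it was a bot. The ceiling needs to stay well under the timeouts that legitimate SMTP clients use, or real mail will give up as well.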
This technique looks only at the characteristics of the TCP session, and not at the contents of the session, which means it also doesn't look at the contents of the messages. It might be useful in contexts where for legal or political reasons the spam filter isn't allowed to look at the messages, but users want spam filtering anyway. The authors point out that it is in principle applicable to any TCP transaction, so it might be useful against web queries from bots, too.
It's hardly a FUSSP, but it's an interesting paper.
© 2005-2018 John R. Levine.