The Movable Type and Professional Network Wiki has been moved to wiki.movabletype.org.
SpamLookup Recipes
SpamLookup is a free plugin that ships with Movable Type and blocks spam for you. You can add new rules to its lists to tune it over time. Here are some recipes that users have found effective for helping SpamLookup detect spam.
(For more details, see Neil Turner's guide to SpamLookup.)
"Keywords to Junk"
Courtesy Everitz Consulting
/\d{4,5}@\d{4,5}\.br/i
/(cool|excellent|good|nice)(\s)+site\.(\s)+(thank(\s)+you|thanks)/i
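If you want to see what these two patterns catch before adding them, here is a rough Python sketch. It uses Python's re module as a stand-in for Perl's regex engine (these two patterns only use syntax both share), writes Perl's /i flag as re.IGNORECASE, and runs them against invented comments.

```python
import re

# The two "Keywords to Junk" patterns above; Perl's /i becomes re.IGNORECASE.
NUMERIC_BR_EMAIL = re.compile(r"\d{4,5}@\d{4,5}\.br", re.IGNORECASE)
NICE_SITE_THANKS = re.compile(
    r"(cool|excellent|good|nice)(\s)+site\.(\s)+(thank(\s)+you|thanks)",
    re.IGNORECASE,
)


def looks_like_junk(text):
    """True if either pattern occurs anywhere in the text."""
    return bool(NUMERIC_BR_EMAIL.search(text) or NICE_SITE_THANKS.search(text))


for sample in [
    "mail me at 12345@6789.br",       # numeric .br address: matched
    "Nice site.  Thank you",          # canned praise, extra spaces allowed: matched
    "Nice site, thanks!",             # comma instead of period: not matched
    "I liked your post on regexes.",  # ordinary comment: not matched
]:
    print(looks_like_junk(sample), sample)
```

The "(\s)+" groups could just as well be written "\s+"; nothing uses the captured whitespace.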
"Celebrity News" Spammers
Some spammers upload files to wikis, then spam blogs with links to those (presumably malware-infested) files. No good! Try this:
/pukiwiki\S+(pcmd|file)=/i 10
(Thanks to Jay Allen for the suggestion.)
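A quick way to see what this rule catches is to run its regex against a sample link. This is a Python sketch (Python's re standing in for Perl), and the URL is a made-up example of a PukiWiki attachment link; real spam URLs will differ. The trailing 10 in the rule is its weight, not part of the regex, and because "\S+" can't cross a space, a plain mention of PukiWiki in a sentence doesn't fire.

```python
import re

# Only the part between the slashes is the regex; the trailing 10 is the weight.
PUKIWIKI_FILE_LINK = re.compile(r"pukiwiki\S+(pcmd|file)=", re.IGNORECASE)

spam = ("Celebrity news! "
        "http://example.org/pukiwiki/index.php?plugin=attach&pcmd=open&file=pic.exe")
ham = "I run PukiWiki on my own server and like it."

print(bool(PUKIWIKI_FILE_LINK.search(spam)))  # True: link into a wiki attachment
print(bool(PUKIWIKI_FILE_LINK.search(ham)))   # False: just a mention
```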
Cut and paste these into "Keywords to Junk":
/ringtones?/
nokia6630
/excellent site/
wow gold
/\d{4,5}@\d{4,5}\.br/i
/online-?(casino|poker|gambling)/i
/texas\S*hold\S*em/i
/_11@\S*\./i
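Note that "\S*" only bridges non-space characters, so the hold 'em pattern targets URL-style spellings like "texas-hold-em" rather than the phrase written out with spaces. A quick Python check of the two gambling patterns (Python's re standing in for Perl; the sample strings are made up):

```python
import re

ONLINE_GAMBLING = re.compile(r"online-?(casino|poker|gambling)", re.IGNORECASE)
TEXAS_HOLDEM = re.compile(r"texas\S*hold\S*em", re.IGNORECASE)


def gambling_spam(text):
    return bool(ONLINE_GAMBLING.search(text) or TEXAS_HOLDEM.search(text))


for sample in [
    "http://example.com/online-casino.html",  # matched
    "http://example.com/TexasHoldem/",        # matched, case-insensitively
    "Texas hold 'em night at my place",       # spaces in the phrase: not matched
]:
    print(gambling_spam(sample), sample)
```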
Junkers frequently use certain words in their URLs that real people don't. By prepending or appending a dash, you can check for these words as parts of hyphenated URLs without forbidding the word entirely. Some examples I use:
-buy
buy-
-cheap
cheap-
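This sketch assumes that a plain keyword line (one without slashes) is tested as a case-insensitive substring; that's an assumption about SpamLookup rather than something this page states. Under that assumption, the dashed forms fire on hyphenated URL words but leave ordinary sentences containing "buy" or "cheap" alone.

```python
# Assumption: plain keyword lines are case-insensitive substring tests.
DASH_KEYWORDS = ["-buy", "buy-", "-cheap", "cheap-"]


def hits_dash_keyword(text):
    lowered = text.lower()
    return any(keyword in lowered for keyword in DASH_KEYWORDS)


print(hits_dash_keyword("http://example.com/cheap-pills"))  # True
print(hits_dash_keyword("http://buy-now.example.com/"))     # True
print(hits_dash_keyword("Where did you buy that camera?"))  # False
```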
Forbid links followed by a '|' character.
/<a[^>]+>[^<]+<\/a>\s*[|]/
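Here is the same check as a Python sketch with invented markup. Python patterns aren't wrapped in slashes, so the "/" in "</a>" needs no backslash there.

```python
import re

LINK_THEN_PIPE = re.compile(r"<a[^>]+>[^<]+</a>\s*[|]")

spam = '<a href="http://example.com/a">pills</a> | <a href="http://example.com/b">more</a>'
ham = '<a href="http://example.com/post">your post</a> was great'

print(bool(LINK_THEN_PIPE.search(spam)))  # True: a link followed by " |"
print(bool(LINK_THEN_PIPE.search(ham)))   # False
```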
Forbid a raw URL followed by a '|'.
/http://[[:alnum:]./_-]+\s*[|]/
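A Python version of this check follows, run against a made-up string. Python's re has no POSIX classes such as "[[:alnum:]]", so the class is spelled out as "A-Za-z0-9". The class also needs a "+": without one it matches a single character, so a full URL followed by '|' slips through, as the second pattern shows.

```python
import re

# [[:alnum:]] spelled out for Python; "+" lets the class span the whole URL.
RAW_URL_THEN_PIPE = re.compile(r"http://[A-Za-z0-9./_-]+\s*[|]")
# The same pattern without "+": only one character may follow "http://".
ONE_CHAR_ONLY = re.compile(r"http://[A-Za-z0-9./_-]\s*[|]")

spam = "http://example.com/pills | http://example.com/more |"

print(bool(RAW_URL_THEN_PIPE.search(spam)))  # True
print(bool(ONE_CHAR_ONLY.search(spam)))      # False
print(bool(RAW_URL_THEN_PIPE.search("See http://example.com/post for details.")))  # False
```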
Forbid three or more links in a row with only white space or <br> tags between them.
/(?:<a[^>]+>[^<]+</a>\s*(?:<br ?/?>)?\s*){3}/i
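A Python sketch of the three-links check, using a plain "{3}" quantifier; the markup is invented. The first sample has three links separated only by line breaks, while the second has words between them.

```python
import re

THREE_LINKS = re.compile(
    r"(?:<a[^>]+>[^<]+</a>\s*(?:<br ?/?>)?\s*){3}", re.IGNORECASE
)

links = (
    '<a href="http://a.example">one</a><br />\n'
    '<a href="http://b.example">two</a><br />\n'
    '<a href="http://c.example">three</a>'
)
prose = ('I liked <a href="http://a.example">this</a> and '
         '<a href="http://b.example">that</a>, but not '
         '<a href="http://c.example">the other</a>.')

print(bool(THREE_LINKS.search(links)))  # True: three links, only breaks between
print(bool(THREE_LINKS.search(prose)))  # False: words separate the links
```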
This forbids a link whose URL is repeated immediately after it, e.g., <a href="http://junk.source.com">buy my stuff!</a>http://junk.source.com
/<a\s+href=['"]([^'"]+)['"]\s*>[^<]+<\/a>\s*\1/i
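This pattern has only one capturing group, the parenthesized group around the URL, so the back-reference that repeats the URL has to be "\1". At least in Python's re, a reference to a group that doesn't exist is rejected when the pattern is compiled, rather than just failing to match. A Python sketch (the first sample is the example above, the second is invented):

```python
import re

REPEATED_URL = re.compile(
    r"""<a\s+href=['"]([^'"]+)['"]\s*>[^<]+</a>\s*\1""", re.IGNORECASE
)

spam = '<a href="http://junk.source.com">buy my stuff!</a>http://junk.source.com'
ham = '<a href="http://example.com/post">my post</a> http://example.com/other'

print(bool(REPEATED_URL.search(spam)))  # True: URL repeated after the link
print(bool(REPEATED_URL.search(ham)))   # False: a different URL follows

try:
    # Same pattern with "\2": there is no second group to refer to.
    re.compile(r"""<a\s+href=['"]([^'"]+)['"]\s*>[^<]+</a>\s*\2""")
except re.error as err:
    print("rejected:", err)
```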
This forbids phrases of the form "buy *** online".
/buy \w+ online/i
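Since "\w+" matches a single word, this rule catches "buy pills online" but not "buy cheap pills online", where two words sit between "buy" and "online". The literal spaces also mean exactly one space on each side. A quick Python check with invented phrases:

```python
import re

BUY_X_ONLINE = re.compile(r"buy \w+ online", re.IGNORECASE)

print(bool(BUY_X_ONLINE.search("Buy pills online today")))  # True
print(bool(BUY_X_ONLINE.search("buy cheap pills online")))  # False: two words
print(bool(BUY_X_ONLINE.search("buy  tickets online")))     # False: two spaces
```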
I allow blogspot URLs, because these checks filter out the junkers who use them:
/-\w+-[a-z0-9]{4}\.blogspot/ 2
/-.\.blogspot\.com/ 2
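These two rules carry no /i flag, so they're case-sensitive, and the trailing 2 on each is its weight, not part of the pattern. The Python sketch below runs them against made-up blogspot addresses; it escapes the dot before "com" so that it matches only a literal dot.

```python
import re

BLOGSPOT_RULES = [
    re.compile(r"-\w+-[a-z0-9]{4}\.blogspot"),  # e.g. "-pills-a1b2.blogspot"
    re.compile(r"-.\.blogspot\.com"),           # e.g. "-x.blogspot.com"
]


def suspicious_blogspot(url):
    return any(rule.search(url) for rule in BLOGSPOT_RULES)


print(suspicious_blogspot("http://cheap-pills-a1b2.blogspot.com/"))  # True
print(suspicious_blogspot("http://buy-x.blogspot.com/"))             # True
print(suspicious_blogspot("http://myblog.blogspot.com/"))            # False
```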
Extended examples
These examples only work with the extended version of SpamLookup.
Bulletin-board-style links are usually a good sign of junk. It's better to check for the closing tag, since it never contains the URL.
[/url]
Forbid header tags.
<h1>
The words listed above as URL indicators are also good indicators when they're the first word of the link text.
>buy
>cheap
>online
I get frequent junk trackbacks with either no text or just a few lowercase words and no punctuation. This catches those.
/^[[:space:][:lower:]]*$/ (text)
A more recent variant is just a single word of mixed letters and numbers, sometimes with a leading space. Unlike the previous rule, the repeating pattern contains no spaces, so this one matches only when a single word is the entire content.
/^\s*[[:lower:][:upper:][:digit:]]+$/ (text)
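Both trackback checks are translated to Python below. Python's re lacks POSIX classes, so "[:space:]", "[:lower:]", "[:upper:]" and "[:digit:]" become "\s", "a-z", "A-Z" and "0-9"; that ASCII spelling is an approximation, since Perl's classes can also match non-ASCII letters depending on how the text is encoded. The "(text)" qualifier is modeled simply by passing in only the trackback text, and the samples are invented.

```python
import re

# /^[[:space:][:lower:]]*$/ : empty, or only lowercase letters and whitespace
LOWERCASE_ONLY = re.compile(r"^[\sa-z]*$")
# /^\s*[[:lower:][:upper:][:digit:]]+$/ : one alphanumeric word, maybe indented
SINGLE_TOKEN = re.compile(r"^\s*[A-Za-z0-9]+$")


def junk_trackback_text(text):
    return bool(LOWERCASE_ONLY.search(text) or SINGLE_TOKEN.search(text))


for sample in ["", "great post about cats", " xK92abZ", "Great post, thanks!"]:
    print(repr(sample), junk_trackback_text(sample))
```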
Purely numeric email addresses (either the name or the domain) are bad.
/^[[:digit:]]+@/ (email) 3
/@[[:digit:]]+\./ (email) 3
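In Python (POSIX "[:digit:]" spelled as "0-9"), the two email checks look like this. Note that the domain check only looks at the label right after the "@", so an address like "jane@mail.12345.example" isn't caught. The addresses are invented.

```python
import re

NUMERIC_NAME = re.compile(r"^[0-9]+@")     # /^[[:digit:]]+@/
NUMERIC_DOMAIN = re.compile(r"@[0-9]+\.")  # /@[[:digit:]]+\./


def numeric_email(address):
    return bool(NUMERIC_NAME.search(address) or NUMERIC_DOMAIN.search(address))


print(numeric_email("8675309@example.com"))   # True: numeric name
print(numeric_email("jane@12345.example"))    # True: numeric domain label
print(numeric_email("jane2010@example.com"))  # False: digits mixed with letters
```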
Check if the text is just a sequence of links with only white space and commas.
/^(?:<a[^>]+>[^<]+</a>\s*(?:<br ?/?>)?[\s,]*)+$/i (text) 2
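The outer "+" together with the "^" and "$" anchors means the whole text must consist of links, so even a single bare link qualifies, while any other text makes the rule fail. A Python check with invented markup:

```python
import re

ONLY_LINKS = re.compile(
    r"^(?:<a[^>]+>[^<]+</a>\s*(?:<br ?/?>)?[\s,]*)+$", re.IGNORECASE
)

samples = [
    '<a href="http://a.example">a</a>, <a href="http://b.example">b</a>',  # True
    '<a href="http://a.example">just one</a>',                             # True
    'Thanks! <a href="http://a.example">my site</a>',                      # False
]
for sample in samples:
    print(bool(ONLY_LINKS.search(sample)), sample)
```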
URLs with a numeric domain are a good indication of junk. This rule does not match IP-style URLs.
/http://[[:digit:]]+\.[[:alpha:]]/ 2
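After the digits and a dot, the pattern requires a letter, which is why a dotted-quad IP address (a digit after every dot) doesn't match. A Python check with made-up URLs, with the POSIX classes spelled out as ranges:

```python
import re

# [[:digit:]] -> 0-9, [[:alpha:]] -> A-Za-z
NUMERIC_DOMAIN_URL = re.compile(r"http://[0-9]+\.[A-Za-z]")

for url in ["http://123456.com/pills", "http://192.168.0.1/", "http://example.com/"]:
    print(bool(NUMERIC_DOMAIN_URL.search(url)), url)
```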
One junker puts "cs" at the front of his junk. That makes him easy to stop: just look for "cs" followed by an uppercase letter at the beginning of the text.
/^cs[[:upper:]]/ (text)
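This rule has no /i flag, and it shouldn't: case is what separates the junker's prefix from ordinary text that starts with "cs" or "CS". A Python check with invented text:

```python
import re

CS_PREFIX = re.compile(r"^cs[A-Z]")  # /^cs[[:upper:]]/, case-sensitive

for text in ["csBestPricesHere", "cs students will like this", "CSS tips"]:
    print(bool(CS_PREFIX.search(text)), text)
```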
Extended Blacklisting
To blacklist a commenter name, email address, or URL, add a line to the "Keywords to Filter" setting containing that name, address, or URL followed by "(name)", "(email)", or "(url)", respectively. For instance, to blacklist the email address "neo@hotmail.com", use
neo@hotmail.com (email)
You can also ban specific words in the home page or commenter name. To ban comments whose claimed home page contains "google", without affecting mentions of Google in the comment itself, use
google (url)
If you don't want the word "vacation" in a commenter's name, use
vacation (name)
If you don't like email addresses like "reply@some.domain.com", use
reply@ (email)
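The following Python sketch is a hypothetical model of these field-qualified rules, not SpamLookup's own code. It assumes that a plain keyword is a case-insensitive substring test applied only to the field named in parentheses ("(text)" is included because other rules on this page use it), and the two comments are invented. It shows why "google (url)" catches a Google home page but ignores Google mentioned in the comment body.

```python
import re

# Hypothetical model, not SpamLookup's code. Assumed rule form: "keyword (field)".
RULE_LINE = re.compile(r"^(?P<keyword>.+?)\s+\((?P<field>name|email|url|text)\)$")


def parse_rule(line):
    match = RULE_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not a field-qualified rule: {line!r}")
    return match.group("keyword"), match.group("field")


def rule_matches(line, comment):
    # Assumption: case-insensitive substring test on the named field only.
    keyword, field = parse_rule(line)
    return keyword.lower() in comment.get(field, "").lower()


spammer = {"name": "Vacation Deals", "email": "reply@some.domain.com",
           "url": "http://www.google.com/", "text": "Great post!"}
reader = {"name": "Jane", "email": "jane@example.com",
          "url": "http://example.com/", "text": "I found this on Google."}

for rule in ["neo@hotmail.com (email)", "google (url)", "vacation (name)", "reply@ (email)"]:
    print(f"{rule:26} spammer={rule_matches(rule, spammer)} reader={rule_matches(rule, reader)}")
```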
Extended Whitelisting
Whitelisting is simple: just give the rule a negative weight. To whitelist "Byrne Reese" as a commenter name, use
Byrne Reese (name) -10
Whitelist comments from people with email addresses at Six Apart (6A)
@sixapart.com (email) -10
Whitelist a URL for a comment or trackback
www.di2.nu (url) -10
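Negative weights make sense once you think of the matching rules as adding up to a score. The Python sketch below is a hypothetical scoring model, not SpamLookup's implementation: it assumes a rule line of the form "keyword (field) weight", a default weight of 1 when none is given, case-insensitive substring matching, and a total that is simply the sum of the weights of every matching rule. The commenter details are invented.

```python
import re

# Hypothetical model, not SpamLookup's code.
# Assumed rule form: "keyword (field)" with an optional trailing integer weight.
RULE_LINE = re.compile(
    r"^(?P<keyword>.+?)\s+\((?P<field>name|email|url|text)\)(?:\s+(?P<weight>-?\d+))?$"
)


def parse_rule(line):
    match = RULE_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised rule: {line!r}")
    weight = match.group("weight")
    # Assumption: a rule without an explicit weight counts as 1.
    weight = int(weight) if weight is not None else 1
    return match.group("keyword").lower(), match.group("field"), weight


def score(comment, rules):
    """Sum the weights of every rule whose keyword occurs in its field."""
    total = 0
    for keyword, field, weight in map(parse_rule, rules):
        if keyword in comment.get(field, "").lower():
            total += weight
    return total


RULES = [
    "vacation (name)",
    "Byrne Reese (name) -10",
    "@sixapart.com (email) -10",
    "www.di2.nu (url) -10",
]

print(score({"name": "Byrne Reese", "email": "byrne@sixapart.com"}, RULES))  # -20
print(score({"name": "Vacation Rentals", "email": "x@example.com"}, RULES))  # 1
```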

