I created the following code when working on an anti-spam program code named
"Musubi" (which comes from the semi-food product known as Spam Musubi).
Through experience, like receiving over 300 spam messages per day, I've learned
that some spammers like to include web addresses like http://10.1.1.1/
in their spam messages to me. Actually, each numeric value can be any number up
to 255, so I need to use regular expressions to find text like this within my
email messages.
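As a rough sketch of the "any number up to 255" rule, a regular expression can limit each of the four numbers to the range 0 to 255. The class name DottedQuadSketch, the OCTET constant, and the firstNumericAddress method are hypothetical names made up for this illustration; they are not part of Musubi.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DottedQuadSketch
{
    // One number from 0 to 255: 250-255, 200-249, 100-199, or 0-99.
    private static final String OCTET = "(?:25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)";

    // Four such numbers separated by literal periods. The lookarounds keep
    // the match from starting or ending inside a longer run of digits.
    private static final Pattern DOTTED_QUAD =
        Pattern.compile("(?<!\\d)" + OCTET + "(?:\\." + OCTET + "){3}(?!\\d)");

    // Returns the first numeric address found in the text, or null if none.
    public static String firstNumericAddress(String text)
    {
        Matcher m = DOTTED_QUAD.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args)
    {
        // prints "10.1.1.1"
        System.out.println(firstNumericAddress("visit http://10.1.1.1/ today"));
        // prints "null" because 300 is larger than 255
        System.out.println(firstNumericAddress("bogus address 300.1.1.1 here"));
    }
}
```

Whether the extra strictness is worth it depends on the spam you receive; a looser pattern is easier to read and will also catch malformed addresses.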
For this to work, my program extracts the email message and any attachments
and combines them into one big String. Then, I parse that big String using
regular expressions, as shown below. One important thing to know is that my
big String has a bunch of newline (\n) characters in it, so I create the
Pattern with the MULTILINE flag. That flag makes the ^ and $ anchors match at
the start and end of each line, instead of only at the start and end of the
whole String.
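(Note that I intentionally embed newline characters in the spamString
that I create in the program. By default the . character does not match a
newline, so a pattern cannot run across a line break, and a regular expression
that uses ^ or $ will only match an individual line if you create the Pattern
with the MULTILINE attribute.)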
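To see what the MULTILINE flag changes in practice, here is a small sketch that runs two regular expressions against the same text, with and without the flag. The class name MultilineFlagSketch, the found method, and the anchored pattern are assumptions made up for this illustration. An unanchored pattern is found either way when you use find(); only the pattern that uses ^ and $ depends on the flag.

```java
import java.util.regex.Pattern;

public class MultilineFlagSketch
{
    // True if the regex is found somewhere in the text when compiled with the given flags.
    public static boolean found(String regex, int flags, String text)
    {
        return Pattern.compile(regex, flags).matcher(text).find();
    }

    public static void main(String[] args)
    {
        String spamString = "visit our site\n at http://10.1.1.1/. \nmore poop here.";

        // No anchors: find() locates the address on the second line either way.
        String unanchored = ".*\\d+\\.\\d+\\.\\d+\\.\\d+.*";
        // Anchored to one whole line (hypothetical pattern for this demo).
        String anchored = "^ at http://\\d+\\.\\d+\\.\\d+\\.\\d+/.*$";

        System.out.println(found(unanchored, 0, spamString));                 // true
        System.out.println(found(unanchored, Pattern.MULTILINE, spamString)); // true
        System.out.println(found(anchored, 0, spamString));                   // false
        System.out.println(found(anchored, Pattern.MULTILINE, spamString));   // true
    }
}
```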
One other thing to know before working with this program: I require that a
parameter be passed in on the command line. That parameter is the regular
expression that you want to use to try to match the spamString that
is in the program. So, for my tests, I used this simple regular expression:
.*\d+\.\d+\.\d+\.\d+.*
This pattern looks for
- any text (including nothing),
- followed by one or more decimal digits,
- followed by a period,
- followed by one or more decimal digits,
- followed by a period,
- followed by one or more decimal digits,
- followed by a period,
- followed by one or more decimal digits,
- followed by any text (including nothing).
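The steps above can be checked against a few sample strings with a short sketch like the one below. The class name SimplePatternSketch, the containsDottedNumbers method, and the sample strings are assumptions made up for this illustration. Note that the regular expression has to be escaped with double backslashes when it is written as a Java string literal, which is not necessary when it is passed in on the command line.

```java
import java.util.regex.Pattern;

public class SimplePatternSketch
{
    // The simple regular expression from the text, escaped for a Java string literal.
    private static final Pattern SIMPLE =
        Pattern.compile(".*\\d+\\.\\d+\\.\\d+\\.\\d+.*");

    // True if the text contains four groups of digits separated by periods.
    public static boolean containsDottedNumbers(String text)
    {
        return SIMPLE.matcher(text).find();
    }

    public static void main(String[] args)
    {
        System.out.println(containsDottedNumbers("http://10.1.1.1/"));       // true
        System.out.println(containsDottedNumbers("version 1.2.3 released")); // false, only three groups
        System.out.println(containsDottedNumbers("no digits here"));         // false
        // true: this simple pattern does not limit each number to 255
        System.out.println(containsDottedNumbers("999.999.999.999"));
    }
}
```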
In the program, if I find this pattern I print "got a match",
otherwise I print "no match".
Here is the StringMatcherTest program. Feel free to copy it, use it, etc.:
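import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StringMatcherTest
{
  public static void main(String[] args)
  {
    // The regular expression to test is the first command-line argument.
    if (args.length < 1)
    {
      System.err.println("usage: java StringMatcherTest 'testRegex'");
      System.exit(1);
    }
    String regexString = args[0];

    // Sample spam text with embedded newline characters.
    String spamString = "visit our site\n at http://10.1.1.1/. \nmore poop here.";

    // MULTILINE makes ^ and $ match at the start and end of each line.
    Pattern aPattern = Pattern.compile(regexString, Pattern.MULTILINE);
    Matcher aMatcher = aPattern.matcher(spamString);

    // find() looks for the pattern anywhere in the string.
    if (aMatcher.find())
    {
      System.out.println("got a match");
    }
    else
    {
      System.out.println("no match");
    }
  }
}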