The first run - reading the contents earlier than needed - an example of how to use avoidance therapy

The next figure shows the Optimizeit output from my first run with my sample dataset. Here I'm reading 1,500 messages, and as you can see from the highlighted line, it's taking over 52 seconds to parse these messages.
Optimizeit is telling me that this problem has to do with handling the contents of mail messages, including both (a) simple contents and (b) attachments. I could jump right in and try to speed up the handling of attachments, but there's something bigger going on here. Running Optimizeit made me realize that a major problem is that I'm dealing with contents and attachments far too early in the overall process.

The way the application works right now, it loads everything it needs about an email message right away, including the sender, the subject, the contents, and the attachments. In retrospect this is obviously wrong, but during the initial design and development it didn't seem that important. Now Optimizeit is telling me that the next big problem is very much attachment related, and having looked at the hot spot in the code, I know there's no easy way to fix it.

This leads to something I call "avoidance therapy". Frankly, if you can't easily fix a problem, see if you can avoid it. And when it comes to email contents and attachments, I can do exactly that. The main problem is shown in the code below. Here's how I was originally parsing an email message:
```java
// next test the "reject" conditions
incrementMessageSpamRatingBasedOnContents(emailMessage);
if (emailMessage.getSpamRating() != EmailMessage.MESSAGE_IS_GUARANTEED_SPAM)
    incrementMessageSpamRatingBasedOnFromField(emailMessage);
if (emailMessage.getSpamRating() != EmailMessage.MESSAGE_IS_GUARANTEED_SPAM)
    incrementMessageSpamRatingBasedOnReplyToField(emailMessage);
if (emailMessage.getSpamRating() != EmailMessage.MESSAGE_IS_GUARANTEED_SPAM)
    incrementMessageSpamRatingBasedOnSubjectField(emailMessage);
if (emailMessage.getSpamRating() != EmailMessage.MESSAGE_IS_GUARANTEED_SPAM)
    deleteMessageWithBadAttachments(emailMessage);
```

As you can see, the first thing I'm doing here is checking the contents, including any attachments, to see whether the message is spam. I based this on the theory that the contents would be the easiest way to tell whether a message is spam. But Optimizeit is now showing me a problem here, and I have experience filtering several thousand messages. That tells me I don't have to go down this road first. Every check after the first one runs only if the message hasn't already been rated as guaranteed spam. So whatever runs first is paid for on every message, and whatever runs last is skipped whenever an earlier check settles the question. I'm therefore going to hypothesize that if I change the order to check the smallest fields first and the contents and attachments last, my program will run much faster. Given that hypothesis, I change the order of the code as follows:
```java
// next test the "reject" conditions
incrementMessageSpamRatingBasedOnFromField(emailMessage);
if (emailMessage.getSpamRating() != EmailMessage.MESSAGE_IS_GUARANTEED_SPAM)
    incrementMessageSpamRatingBasedOnReplyToField(emailMessage);
if (emailMessage.getSpamRating() != EmailMessage.MESSAGE_IS_GUARANTEED_SPAM)
    incrementMessageSpamRatingBasedOnSubjectField(emailMessage);
if (emailMessage.getSpamRating() != EmailMessage.MESSAGE_IS_GUARANTEED_SPAM)
    deleteMessageWithBadAttachments(emailMessage);
if (emailMessage.getSpamRating() != EmailMessage.MESSAGE_IS_GUARANTEED_SPAM)
    incrementMessageSpamRatingBasedOnContents(emailMessage);
```

In theory this sounds a lot better, but does it hold up in practice? When I run Optimizeit again, I now get this output:
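Separately from the profiler numbers, the idea behind the reordering can be illustrated with a small sketch. The cheap header checks run first, and the expensive contents check runs only if the message isn't already guaranteed spam. If the contents are also loaded lazily, which is the "avoidance therapy" idea applied to loading, then a message caught by a header rule never has its body parsed at all.

This is an illustration under assumptions, not the application's real code. `LazyEmailMessage`, `CheapestFirstFilter`, the rating values, and the three rules are all hypothetical. The `Supplier` passed to the constructor stands in for whatever expensive parsing the real `EmailMessage` performs.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for the article's EmailMessage: cheap header fields are
// stored eagerly, while the (assumed expensive) contents are parsed on first use.
class LazyEmailMessage {
    static final int MESSAGE_IS_GUARANTEED_SPAM = 100; // assumed threshold value

    private final String from;
    private final String subject;
    private final Supplier<String> contentsLoader; // hypothetical expensive parse step
    private String contents;
    private boolean contentsLoaded = false;
    private int contentsLoadCount = 0;
    private int spamRating = 0;

    LazyEmailMessage(String from, String subject, Supplier<String> contentsLoader) {
        this.from = from;
        this.subject = subject;
        this.contentsLoader = contentsLoader;
    }

    String getFrom() { return from; }
    String getSubject() { return subject; }

    // Parse the contents only the first time they are requested, then cache them.
    String getContents() {
        if (!contentsLoaded) {
            contents = contentsLoader.get();
            contentsLoaded = true;
            contentsLoadCount++;
        }
        return contents;
    }

    int getContentsLoadCount() { return contentsLoadCount; }
    int getSpamRating() { return spamRating; }

    // Ratings are capped at the guaranteed-spam threshold.
    void incrementSpamRating(int amount) {
        spamRating = Math.min(spamRating + amount, MESSAGE_IS_GUARANTEED_SPAM);
    }

    boolean isGuaranteedSpam() { return spamRating == MESSAGE_IS_GUARANTEED_SPAM; }
}

public class CheapestFirstFilter {
    // Hypothetical rules, ordered from cheapest to most expensive.
    static void rateFromField(LazyEmailMessage m) {
        if (m.getFrom().endsWith("@spam.example")) {
            m.incrementSpamRating(LazyEmailMessage.MESSAGE_IS_GUARANTEED_SPAM);
        }
    }

    static void rateSubjectField(LazyEmailMessage m) {
        if (m.getSubject().contains("FREE")) m.incrementSpamRating(50);
    }

    static void rateContents(LazyEmailMessage m) {
        if (m.getContents().contains("click here")) m.incrementSpamRating(50);
    }

    // Stop as soon as a message is guaranteed spam, so the contents come last, if at all.
    static void classify(LazyEmailMessage m) {
        rateFromField(m);
        if (!m.isGuaranteedSpam()) rateSubjectField(m);
        if (!m.isGuaranteedSpam()) rateContents(m);
    }

    public static void main(String[] args) {
        LazyEmailMessage obvious = new LazyEmailMessage("bulk@spam.example", "Hello", () -> "click here");
        LazyEmailMessage normal = new LazyEmailMessage("alice@example.com", "Lunch?", () -> "See you at noon.");
        classify(obvious);
        classify(normal);
        System.out.println("obvious: spam=" + obvious.isGuaranteedSpam()
                + ", contents parsed " + obvious.getContentsLoadCount() + " time(s)");
        System.out.println("normal:  spam=" + normal.isGuaranteedSpam()
                + ", contents parsed " + normal.getContentsLoadCount() + " time(s)");
    }
}
```

In this sketch, the first message is flagged by the sender rule and its contents are never parsed. The ordinary message passes the header checks, so its contents are parsed exactly once. The caching in `getContents()` matters when several rules read the body; without it, each of those rules would pay the parsing cost again.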