
Next: The second run Up: Optimizing your first project Previous: Finding the performance problems

The first run - reading the contents earlier than needed - an example of how to use avoidance therapy

The next figure shows the Optimizeit output from my first run with my sample dataset. Here I'm reading 1,500 messages, and as you can see from the highlighted line it's taking over 52 seconds to parse these messages.

A great thing that Optimizeit lets you do from this point is drill down into the execution time of your code to see what's really happening at the method level. So, the next thing I do is expand the highlighted line above, and keep expanding it until I get down to a level where I can see what's happening with my code. The next figure shows where I've stopped on this run.

As you can see from this screen, I'm spending a lot of time in the EmailMessage constructor (EmailMessage.EmailMessage()) converting message contents to a String. This is a part of the code I created so I can parse through contents to determine whether something is bad or not.

Optimizeit is telling me that this problem has to do with handling the contents of mail messages, including (a) simple contents as well as (b) attachments. Although I could jump right in and try to speed up the handling of attachments, there's something bigger going on here. Running Optimizeit made me realize that a major problem is that I'm dealing with contents and attachments way too early in the overall process.

The way the application works right now, it loads up everything it needs about an email message right away, including the sender, the subject, the contents, and attachments. In retrospect this is obviously wrong, but during the initial design and development process it didn't seem that important. But now Optimizeit is telling us that the next big problem is very much attachment related, and having looked at the hot spot in the code, I know there's no easy way to fix it. 

This leads up to something I call "avoidance therapy". Frankly, if you can't easily fix a problem, see if you can avoid it. And when it comes to email contents and attachments, I can do exactly that.
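One way to picture avoidance therapy at the class level is to defer the expensive conversion until something actually asks for the contents. The sketch below is not the real EmailMessage class from this project; `LazyEmailMessage`, its `contentsLoader` parameter, and `getLoadCount()` are hypothetical names used only to illustrate the idea.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for the article's EmailMessage class.
// Cheap fields are stored up front; the expensive contents are built on demand.
class LazyEmailMessage {
    private final String from;
    private final String subject;
    private final Supplier<String> contentsLoader; // the expensive step, deferred
    private String contents;                       // cached after the first load
    private boolean contentsLoaded = false;
    private int loadCount = 0;                     // how many times the expensive step ran

    LazyEmailMessage(String from, String subject, Supplier<String> contentsLoader) {
        this.from = from;
        this.subject = subject;
        this.contentsLoader = contentsLoader;      // note: nothing is loaded here
    }

    String getFrom() { return from; }

    String getSubject() { return subject; }

    String getContents() {
        if (!contentsLoaded) {
            contents = contentsLoader.get();       // pay the cost only when asked
            contentsLoaded = true;
            loadCount++;
        }
        return contents;
    }

    int getLoadCount() { return loadCount; }
}

class LazyEmailMessageDemo {
    public static void main(String[] args) {
        LazyEmailMessage message = new LazyEmailMessage(
                "someone@example.com", "Hello",
                () -> "pretend this string was expensive to build");

        System.out.println(message.getSubject());   // cheap field, no load triggered
        System.out.println(message.getLoadCount()); // prints 0
        message.getContents();
        message.getContents();                      // second call uses the cached value
        System.out.println(message.getLoadCount()); // prints 1
    }
}
```

With a constructor like this, any message that gets rejected on its sender or subject never pays for the contents at all.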

The main problem is shown in the code below. Here's the way I was originally parsing through an email message:

 // next test the "reject" conditions
 incrementMessageSpamRatingBasedOnContents(emailMessage);
 if (! (emailMessage.getSpamRating() == EmailMessage.MESSAGE_IS_GUARANTEED_SPAM) )
   incrementMessageSpamRatingBasedOnFromField(emailMessage);
 if (! (emailMessage.getSpamRating() == EmailMessage.MESSAGE_IS_GUARANTEED_SPAM) )
   incrementMessageSpamRatingBasedOnReplyToField(emailMessage);
 if (! (emailMessage.getSpamRating() == EmailMessage.MESSAGE_IS_GUARANTEED_SPAM) )
   incrementMessageSpamRatingBasedOnSubjectField(emailMessage);
 if (! (emailMessage.getSpamRating() == EmailMessage.MESSAGE_IS_GUARANTEED_SPAM) )
   deleteMessageWithBadAttachments(emailMessage);

As you can see, the first thing I'm doing here is checking the contents, including any attachments, to see if the mail message is indeed spam. This was based on the theory that the contents field was going to be the easiest way to figure out if a message is spam or not. But now that Optimizeit is telling me that we have a problem here, and I have experience filtering out several thousand messages, I know that I don't have to go down this road first.

Therefore, I'm going to hypothesize that if I change the order so that I check the smallest fields first, and the contents and attachments last, my program will run much faster. Given that hypothesis, I change the order of the code as follows:

 // next test the "reject" conditions
 incrementMessageSpamRatingBasedOnFromField(emailMessage);
 if (! (emailMessage.getSpamRating() == EmailMessage.MESSAGE_IS_GUARANTEED_SPAM) )
   incrementMessageSpamRatingBasedOnReplyToField(emailMessage);
 if (! (emailMessage.getSpamRating() == EmailMessage.MESSAGE_IS_GUARANTEED_SPAM) )
   incrementMessageSpamRatingBasedOnSubjectField(emailMessage);
 if (! (emailMessage.getSpamRating() == EmailMessage.MESSAGE_IS_GUARANTEED_SPAM) )
   deleteMessageWithBadAttachments(emailMessage);
 if (! (emailMessage.getSpamRating() == EmailMessage.MESSAGE_IS_GUARANTEED_SPAM) )
   incrementMessageSpamRatingBasedOnContents(emailMessage);
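The same cheapest-first idea can be sketched in a self-contained form: keep the checks in a list ordered from cheap to expensive, and stop as soon as a message is known to be spam. Everything here (`SketchMessage`, `CheapestFirstFilter`, the example rules and addresses) is a hypothetical illustration, not code from the actual filter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical, simplified message type used only for this sketch.
class SketchMessage {
    static final int GUARANTEED_SPAM = 100;
    final String from;
    final String subject;
    final String contents;
    int spamRating = 0;

    SketchMessage(String from, String subject, String contents) {
        this.from = from;
        this.subject = subject;
        this.contents = contents;
    }
}

class CheapestFirstFilter {
    // Runs the checks in list order and stops once the message is guaranteed spam.
    // Returns how many checks actually ran.
    static int runChecks(SketchMessage message, List<Consumer<SketchMessage>> checks) {
        int checksRun = 0;
        for (Consumer<SketchMessage> check : checks) {
            if (message.spamRating == SketchMessage.GUARANTEED_SPAM) {
                break; // later (more expensive) checks are skipped
            }
            check.accept(message);
            checksRun++;
        }
        return checksRun;
    }

    public static void main(String[] args) {
        // Ordered from cheapest field to most expensive field (made-up rules).
        List<Consumer<SketchMessage>> checks = new ArrayList<>();
        checks.add(m -> { if (m.from.endsWith("@spam.example")) m.spamRating = SketchMessage.GUARANTEED_SPAM; });
        checks.add(m -> { if (m.subject.contains("FREE")) m.spamRating = SketchMessage.GUARANTEED_SPAM; });
        checks.add(m -> { if (m.contents.contains("click here")) m.spamRating = SketchMessage.GUARANTEED_SPAM; });

        SketchMessage obvious = new SketchMessage("bot@spam.example", "FREE stuff", "click here");
        SketchMessage clean = new SketchMessage("friend@example.com", "Lunch?", "See you at noon.");

        System.out.println(runChecks(obvious, checks)); // prints 1: rejected on the sender alone
        System.out.println(runChecks(clean, checks));   // prints 3: every check had to run
    }
}
```

The ordering only pays off when the cheap checks reject a meaningful share of messages; a clean message still runs every check.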

In theory this sounds a lot better, but does it hold up in practice? When I run Optimizeit again I now get this output:

Holy cow, that's a big winner there. One quick change and I've reduced the runtime from 52 seconds down to 14.17 seconds. I could probably stop now if I wanted to, but that wouldn't be much fun. Let's press on and see what other problems we can solve.
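To put those two timings in perspective, here is a small back-of-the-envelope calculation. It assumes the first run took exactly 52 seconds (the profiler showed slightly more, so the real improvement is a bit larger) and uses the 1,500-message dataset mentioned earlier; the class and method names are made up for this sketch.

```java
// Rough comparison of the two profiler runs; 52.0 is a lower bound for the first run.
class RunComparison {
    static double speedup(double beforeSeconds, double afterSeconds) {
        return beforeSeconds / afterSeconds;
    }

    static double percentReduction(double beforeSeconds, double afterSeconds) {
        return 100.0 * (beforeSeconds - afterSeconds) / beforeSeconds;
    }

    static double millisPerMessage(double seconds, int messages) {
        return 1000.0 * seconds / messages;
    }

    public static void main(String[] args) {
        double before = 52.0;
        double after = 14.17;
        int messages = 1500;

        System.out.printf("speedup: %.2fx%n", speedup(before, after));
        System.out.printf("runtime reduced by: %.1f%%%n", percentReduction(before, after));
        System.out.printf("per message: %.2f ms -> %.2f ms%n",
                millisPerMessage(before, messages), millisPerMessage(after, messages));
    }
}
```

That works out to roughly a 3.7x speedup, or a little over 70 percent of the runtime removed by reordering five lines.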

