| Developer's Daily | Java Education |
Introduction
When working with any general-purpose programming language, it's often necessary to break a large string into smaller components. Whether you're working with Unix system files, older Windows ".ini" files, or flat files in a text database, you'll often read in a record of information and then break that record into smaller chunks. More recently, I've used this method to programmatically interpret the contents of HTML pages for web "robots".
In this article we'll demonstrate how to break a Java String into smaller components, called tokens. We'll begin by breaking a simple, well-known sentence into words, and then we'll demonstrate how to use the same technique to work with a flat-file database.
Breaking a sentence into words
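One of the most famous sentences in American history begins with the words "Four score and seven years ago". For our purposes, suppose this sentence is stored in a String variable named speech. A StringTokenizer created from speech with no delimiter argument splits on whitespace, so each call to its nextToken() method returns the next word of the sentence, and hasMoreTokens() reports whether any words remain.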
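The word-by-word split described above can be sketched as follows. This is a minimal illustration rather than the article's original listing: the class name SpeechTokens and the helper method words are hypothetical names chosen for this example, and only the six words quoted above are used as input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class SpeechTokens {
    // Hypothetical helper: collects every whitespace-separated token of the text.
    static List<String> words(String text) {
        List<String> result = new ArrayList<>();
        // With no delimiter argument, StringTokenizer splits on space, tab,
        // newline, carriage return, and form feed.
        StringTokenizer st = new StringTokenizer(text);
        while (st.hasMoreTokens()) {
            result.add(st.nextToken());
        }
        return result;
    }

    public static void main(String[] args) {
        String speech = "Four score and seven years ago";
        // Prints one word per line, in the order they appear in the sentence.
        for (String word : words(speech)) {
            System.out.println(word);
        }
    }
}
```

Collecting the tokens into a List is a design choice made here so the result can be inspected; printing each token directly inside the while loop works just as well.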
One quick note before continuing: if you try this example, you'll need to import the java.util package, which contains the StringTokenizer class, by placing the statement import java.util.*; at the top of your source file.
An example using a text database file
In the example just shown, a text string was broken down into separate tokens. Of course in the English language we call these tokens "words", and these words are usually separated by whitespace. In our next example, we'll show how to break a database record (separated by colon characters) into tokens typically called "fields".
Consider a hypothetical customer database file named customer.db. Each record in the file contains information about one customer: their first name, last name, and the city and state of their address. Within a record, the fields are separated by colon characters. Assuming one such record has been read into a String variable named dbRecord, it can be broken into its fields like this:
// dbRecord holds one colon-delimited record read from customer.db
StringTokenizer st = new StringTokenizer(dbRecord, ":");
String fname = st.nextToken();
String lname = st.nextToken();
String city = st.nextToken();
String state = st.nextToken();
System.out.println("First Name: " + fname);
System.out.println("Last Name: " + lname);
System.out.println("City: " + city);
System.out.println("State: " + state);
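The field-by-field parsing above assumes every record really has four fields; if one is missing, nextToken() throws a NoSuchElementException. The sketch below shows one way to guard against that by checking countTokens() first. The class name RecordParser, the method parseRecord, and the sample record are all hypothetical, since the actual contents of customer.db are not reproduced here.

```java
import java.util.StringTokenizer;

class RecordParser {
    // Hypothetical helper: returns {first name, last name, city, state}.
    static String[] parseRecord(String dbRecord) {
        StringTokenizer st = new StringTokenizer(dbRecord, ":");
        // countTokens() reports how many tokens remain, so a malformed record
        // can be rejected before nextToken() would throw NoSuchElementException.
        if (st.countTokens() != 4) {
            throw new IllegalArgumentException("Expected 4 fields in record: " + dbRecord);
        }
        // Array initializers are evaluated left to right, so the fields keep their order.
        return new String[] { st.nextToken(), st.nextToken(), st.nextToken(), st.nextToken() };
    }

    public static void main(String[] args) {
        // Hypothetical record, made up for this illustration.
        String[] fields = parseRecord("Fred:Flintstone:Bedrock:AZ");
        System.out.println("First Name: " + fields[0]);
        System.out.println("Last Name: " + fields[1]);
        System.out.println("City: " + fields[2]);
        System.out.println("State: " + fields[3]);
    }
}
```

Note that StringTokenizer treats consecutive delimiters as a single separator, so a record with an empty field, such as "Fred::Bedrock:AZ", yields only three tokens and is rejected by this check. If empty fields must be preserved, String.split(":", -1) is a possible alternative.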
This technique is demonstrated completely in Listing 1, where the entire customer.db text database is read record by record and printed.
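// source code developed by DevDaily Interactive, Inc.
import java.io.*;
import java.util.*;
class TokenTest {
    public static void main (String[] args) { new TokenTest().dbTest(); }
    void dbTest() {
        DataInputStream dis = null;
        try {
            File f = new File("customer.db");
            dis = new DataInputStream(new FileInputStream(f));
            String dbRecord;
            // read each record of the database (DataInputStream.readLine() is deprecated but still works)
            while ((dbRecord = dis.readLine()) != null) {
                StringTokenizer st = new StringTokenizer(dbRecord, ":");
                String fname = st.nextToken();
                String lname = st.nextToken();
                String city = st.nextToken();
                String state = st.nextToken();
                System.out.println("First Name: " + fname);
                System.out.println("Last Name: " + lname);
                System.out.println("City: " + city);
                System.out.println("State: " + state);
            }
        } catch (IOException e) {
            System.out.println("IOException while reading customer.db: " + e.getMessage());
        } finally {
            if (dis != null) {
                try { dis.close(); } catch (IOException ioe) { /* nothing more to do */ }
            } // end if
        } // end finally
    } // end dbTest
} // end class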
Listing 1: The TokenTest.java program demonstrates a method to read every record in a text database file (customer.db) and break each record into tokens (i.e., fields) using the StringTokenizer class.
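Listing 1 reads the file through a DataInputStream, whose readLine() method has been deprecated since JDK 1.1 in favor of BufferedReader.readLine(). As a hedged alternative, the same record-by-record loop might look like the sketch below on a newer JDK. The class name RecordReaderSketch, the method readFirstNames, and the sample records are hypothetical; the method accepts any Reader so it can be tried without a customer.db file on disk.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class RecordReaderSketch {
    // Hypothetical helper: returns the first field of every non-empty record.
    static List<String> readFirstNames(Reader source) {
        List<String> firstNames = new ArrayList<>();
        // try-with-resources closes the reader even if an exception is thrown,
        // replacing the explicit finally block of Listing 1.
        try (BufferedReader in = new BufferedReader(source)) {
            String dbRecord;
            while ((dbRecord = in.readLine()) != null) {
                StringTokenizer st = new StringTokenizer(dbRecord, ":");
                if (st.hasMoreTokens()) {
                    firstNames.add(st.nextToken());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return firstNames;
    }

    public static void main(String[] args) {
        // Hypothetical records standing in for the contents of customer.db.
        String data = "Fred:Flintstone:Bedrock:AZ\nBarney:Rubble:Bedrock:AZ\n";
        for (String name : readFirstNames(new StringReader(data))) {
            System.out.println("First Name: " + name);
        }
    }
}
```

To read the real file instead of the in-memory sample, a java.io.FileReader for "customer.db" could be passed in place of the StringReader.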
Final thoughts and source code
The method we've shown here is a simple but effective way of breaking a String into tokens. If you need a more powerful tokenizer, you might look at the StreamTokenizer class instead. The StreamTokenizer class can distinguish words from numbers, can recognize the comment styles of various programming languages, and offers a number of control flags, such as eolIsSignificant and lowerCaseMode, that change how it scans its input.
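To give a feel for the comment handling mentioned above, here is a minimal sketch that runs a StreamTokenizer over a short piece of source-like text. The class name StreamTokenizerSketch, the method tokenize, and the sample input are hypothetical; the calls themselves (slashSlashComments, slashStarComments, nextToken, and the ttype, sval, and nval fields) are part of java.io.StreamTokenizer.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class StreamTokenizerSketch {
    // Hypothetical helper: labels each token as a word, a number, or a single character.
    static List<String> tokenize(String source) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(source));
        st.slashSlashComments(true); // skip C++-style "//" comments
        st.slashStarComments(true);  // skip C-style "/* ... */" comments
        List<String> tokens = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    tokens.add("WORD " + st.sval);
                } else if (st.ttype == StreamTokenizer.TT_NUMBER) {
                    tokens.add("NUMBER " + st.nval); // numbers are parsed as doubles
                } else {
                    tokens.add("CHAR " + (char) st.ttype); // ordinary character such as '='
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return tokens;
    }

    public static void main(String[] args) {
        String source = "count = 42 // trailing comment\n/* block comment */ name = widget";
        for (String token : tokenize(source)) {
            System.out.println(token);
        }
    }
}
```

Both comments are skipped, so only the words, the number, and the two "=" characters are reported. Unlike StringTokenizer, which returns every token as a String, StreamTokenizer reports the token type in ttype and leaves the value in sval or nval.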
If you'd like to download the source code shown in Listing 1, just click here, and the Java code will be displayed in your browser. Then just use the "File | Save As" option of your browser to save the source code to your system.
To download the customer.db database file, just click here, and follow the same procedure to save the file to your local filesystem.
Copyright 1998-2009 Alvin Alexander, devdaily.com
All Rights Reserved.