| Developer's Daily | Perl Education |
Introduction
Recently I created a series of Perl programs to help me weed through the output of the Solaris "ps -ef" command. In particular, I was interested in understanding the parent/child relationships between processes running on the system, and in finding out which processes were "hogging" the CPU.
In this article we're going to analyze a subroutine from these programs
that processes the output of the ps -ef command into a series
of hashes that can be used in your programs.
Analyzing the output of the ps -ef command
As part of a recent project, I decided to create several Perl programs to help analyze the output of the Solaris ps -ef command. As part of the debugging process of new software I was creating, I was constantly running ps -ef to track the parent/child relationships between processes. I was concerned about processes being spawned and terminated properly, and running ps -ef was a good way to track what was really happening. I created a program named "ancestors.pl" to track the parent/child relationship of processes.
As I proceeded further into my project, I also needed to determine which
processes were consuming the most CPU time on the system. I decided
to again use Perl to run the ps -ef command every 10 seconds and
analyze its output. The Perl program was designed to report the
most active processes during those 10-second intervals. For instance,
if a particular process consumed six seconds of CPU time over a 10-second
interval, this Perl program (named "CPU hog") reported a CPU usage factor
of 60% during that time period.
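The arithmetic behind that usage factor can be sketched in a few lines. This is only an illustration of the calculation, not the actual "CPU hog" program; the function name cpu_usage_percent and the sample numbers are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: percentage of an interval that a process spent on the CPU.
# $before and $after are cumulative CPU seconds taken from two ps -ef snapshots,
# and $interval is the number of wall-clock seconds between the snapshots.
sub cpu_usage_percent {
    my ($before, $after, $interval) = @_;
    return 0 if $interval <= 0;    # avoid dividing by zero
    return 100 * ($after - $before) / $interval;
}

# A process whose cumulative CPU time grew from 615 to 621 seconds
# during a 10-second interval used 6 of those 10 seconds.
printf "%.0f%%\n", cpu_usage_percent(615, 621, 10);    # prints 60%
```

The two cumulative values would come from the TIME field of the same process in consecutive runs of ps -ef.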
Output from ps -ef
Output from the Solaris ps -ef command consists of eight fields of information per record. The first few lines of output from a ps -ef command might look like this on a Solaris system:
For those who are not familiar with Solaris or the ps command,
Table 1 provides a brief description of these output fields.
| Field Name | Field description |
| UID | User ID of the current process (i.e., the user this process belongs to) |
| PID | Process ID of the current process |
| PPID | Process ID of the parent process |
| C | Obsolete (processor utilization for scheduling) |
| STIME | Starting time of the process |
| TTY | Controlling terminal for the process |
| TIME | Cumulative execution time for the process |
| CMD | Unix command name |
| Table 1: | This table defines the eight fields of output generated by the Solaris "ps -ef" command. |
The purpose of the GetPsefData subroutine
The purpose of the GetPsefData subroutine is to run the ps -ef command, and store the results of this command into a series of hashes. The primary fields of interest are the process-id (pid), user-id (uid), parent process-id (ppid), time, and Unix command-name (ucmd) fields. The remainder of the ps -ef output fields are ignored for the time being.
Each time a program wants new ps -ef data, it calls this subroutine to run the ps -ef command and put the data into the proper hashes.
The GetPsefData subroutine is shown in Listing 1.
    sub GetPsefData {

        open(PSEF_PIPE,"ps -ef|");
        $i=0;

        while (<PSEF_PIPE>) {
            chomp;
            @psefField = split(' ', $_, 8);
            $pid[$i] = $psefField[1];
            $uid{$pid[$i]} = $psefField[0];
            $ppid{$pid[$i]} = $psefField[2];
            ($min,$sec) = split(/:/,$psefField[6]);
            $time{$pid[$i]} = $min * 60 + $sec;
            $ucmd{$pid[$i]} = $psefField[7];
            $i++;
        }
        close(PSEF_PIPE);
    }
| Listing 1: | The GetPsefData subroutine opens a pipe to the Unix "ps -ef" command, and processes the output of the command into various hashes. |
Analyzing the GetPsefData subroutine
The GetPsefData subroutine uses the open statement to execute the ps -ef command and create a pipe that we can read from. It attaches this to a file handle named "PSEF_PIPE" that we read from inside the while loop.
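The same pipe technique can also be written with a lexical file handle and an explicit check that the command started. The following is a minimal sketch rather than part of the original programs; the helper name read_command_lines is hypothetical, and it assumes a Perl version that supports the list form of a pipe open.

```perl
use strict;
use warnings;

# Hypothetical helper: run a command and return its output lines, chomped.
# The list form of open runs the command directly, without a shell.
sub read_command_lines {
    my @command = @_;
    open(my $pipe, '-|', @command)
        or die "Cannot run @command: $!";
    my @lines = <$pipe>;
    close($pipe)
        or warn "Command @command exited with status $?";
    chomp @lines;
    return @lines;
}

# Typical use on a Unix system:
#   my @records = read_command_lines('ps', '-ef');
```

Checking the return value of open means a missing or unrunnable command is reported immediately, instead of the loop quietly reading nothing.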
The first thing we do inside the while loop is run the chomp
function. The chomp command is used to remove the trailing
newline character from the string, because I don't want it to end up in
a variable during later processing.
Breaking a ps -ef record into fields with the split function
The next line in the while loop creates an array named "@psefField" that contains eight array elements. In this statement, the split function uses the variable $_ as input to generate the eight array elements.
The first parameter in the split function call, ' ' (a single blank space), tells split that the fields contained in the variable $_ are separated by whitespace. It also means that whitespace at the beginning of a line is ignored, which is very important in this case, because each record begins with whitespace. (If you're not familiar with it, the term whitespace is defined as any combination of blanks, tabs, or newline characters.) The last two parameters specify that we want to split the variable $_ into a maximum of eight fields.
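To see the effect of both the ' ' pattern and the limit of eight, here is a small sketch. The record is invented for illustration and is not real ps -ef output, and the wrapper name split_psef_record is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the split call used in Listing 1.
# ' ' skips leading whitespace and splits on runs of whitespace;
# the limit of 8 keeps the whole command line together in the last field.
sub split_psef_record {
    my ($record) = @_;
    return split(' ', $record, 8);
}

# Invented sample record, for illustration only.
my $record = '    root   100     1  0 08:15:02 ?        10:15 /usr/sbin/exampled -f /etc/example.conf';
my @fields = split_psef_record($record);

print scalar(@fields), " fields\n";
print "PID: $fields[1]\n";
print "CMD: $fields[7]\n";
```

Without the limit, the command and each of its arguments would end up in separate array elements.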
The next statement in the while loop assigns the value of $psefField[1]
to element $i of an array named @pid. The PID is an important
field in the output, because it is unique for each record. Each process
running on the system will have a different PID value. We'll use
this to our advantage in this program.
Putting the data into hashes
The next two lines in the code are:

    $uid{$pid[$i]} = $psefField[0];
    $ppid{$pid[$i]} = $psefField[2];
Because each uid, ppid, time, and ucmd value belongs to a particular process-id (pid), it makes life easier to store them in hashes keyed by pid. This works out great, because later on, when I want to refer to the uid, ppid, time, and ucmd of process id 100, I simply type:

    $uid{100}
    $ppid{100}
    $time{100}
    $ucmd{100}
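Keying everything by pid also makes it easy to follow a process back through its parents. The sketch below is not the ancestors.pl program mentioned earlier; it is a hypothetical helper, ancestors_of, run against invented sample data standing in for the hashes that GetPsefData fills.

```perl
use strict;
use warnings;

# Hypothetical helper: list a process's ancestors, nearest parent first.
# $ppid_of is a reference to a hash that maps each pid to its parent pid.
sub ancestors_of {
    my ($pid, $ppid_of) = @_;
    my @chain;
    my %seen = ($pid => 1);
    while (defined(my $parent = $ppid_of->{$pid})) {
        last if $seen{$parent}++;                  # guard against loops
        last unless exists $ppid_of->{$parent};    # parent not in the snapshot
        push @chain, $parent;
        $pid = $parent;
    }
    return @chain;
}

# Invented sample data, for illustration only.
my %ppid = (1 => 0, 50 => 1, 100 => 50);
my %ucmd = (1 => '/etc/init', 50 => 'example-shell', 100 => 'example-worker');

print "Command for PID 100: $ucmd{100}\n";
print "Ancestors of PID 100: ", join(' ', ancestors_of(100, \%ppid)), "\n";
```

Each step of the walk is a single hash lookup, which is the main payoff of storing the data this way.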
Splitting again
Returning to the code in Listing 1, the next line in the loop uses the split command again. This time, instead of splitting the entire input line, I'm just splitting the contents of the seventh @psefField array element. Because it's a cool language, Perl lets us split this array element into two variables, $min and $sec, with one statement.
Notice that in this step we use the ":" character as the field delimiter. Because the time field looks something like "10:15" (10 minutes, 15 seconds of CPU time consumed), we know that the $min information is to the left of the ":" character, and the $sec information is to the right of the ":" character. In this example the variable $min is assigned the value 10, and the variable $sec is assigned the value 15.
The next line of the program calculates the time consumed by the current process in total seconds, and assigns this value to the hash element $time{$pid[$i]}. This hash lets us track time differences in programs like a "CPU hog" program.
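The two steps just described, splitting on ":" and converting to seconds, can be tried in isolation. This is a minimal sketch that assumes the TIME value has the minutes:seconds form discussed above; the helper name time_to_seconds is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: convert a TIME value such as "10:15" into total seconds.
# Assumes exactly the min:sec form described in the article.
sub time_to_seconds {
    my ($time) = @_;
    my ($min, $sec) = split(/:/, $time);
    return $min * 60 + $sec;
}

print time_to_seconds('10:15'), "\n";    # prints 615
```

A TIME value with more than two colon-separated parts would need extra handling, which this sketch does not attempt.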
The final line puts the name of the current Unix command in the hash
%ucmd. Notice that as a result of our split command
used earlier, $ucmd{$pid[$i]} may contain one word or several
words with embedded blank spaces. This is important (and desirable), because
the command name is a variable length field, and may contain one or more
words.
Summary of GetPsefData
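In summary, the GetPsefData subroutine does several things for us: it runs the ps -ef command, splits each record of output into its eight fields, stores the uid, ppid, time, and command name in hashes keyed by pid, and converts the cumulative CPU time into total seconds.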
Important notes
This subroutine was written to process the output of the "ps -ef"
command on Sun's Solaris operating system. This is one of those cases
where there are differences between Unix operating systems, so the same
function may not work properly on other versions of Unix, such as AIX,
HP-UX, UnixWare, or FreeBSD. The fields of output on those systems
may be different from the output fields generated by Solaris, and we haven't
tested those systems. More than likely, small changes will be required
for other Unix systems.
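One cautious way to guard against such differences is to compare the header line of the output with the column names in Table 1 before parsing anything. The following is a sketch of that idea, not part of the original programs, and the function name header_is_solaris_psef is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical check: does the header line list the eight columns
# that Table 1 describes, in the same order?
sub header_is_solaris_psef {
    my ($header) = @_;
    my @expected = qw(UID PID PPID C STIME TTY TIME CMD);
    my @found    = split(' ', $header);
    return "@found" eq "@expected";
}

my $header  = '     UID   PID  PPID  C    STIME TTY      TIME CMD';
my $verdict = header_is_solaris_psef($header) ? 'layout matches' : 'layout differs';
print "$verdict\n";
```

This is only a first check: a matching header does not guarantee that every field, such as TIME, uses the same format on another system.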
The future
In our next article, we'll examine how this subroutine is used within
real Perl programs to generate useful information from the output of the
ps -ef command. Specifically, we'll show how to track the
ancestry of processes using the ancestors.pl program, which
relies heavily on this subroutine.
Copyright © 1998 DevDaily Interactive, Inc.
All Rights Reserved.