|
Sysstat tutorial
You will find here a tutorial describing a few use cases for some sysstat
commands. The first section below covers the sar and sadf commands. The
second covers the pidstat command. Of course, you should also read the
manual pages to learn about all the features and how these commands can
help you monitor your system (follow the Documentation link above for
that).
- Section 1: Using sar and sadf
- Section 2: Using pidstat
Section 1: Using sar and sadf
sar is the system activity reporter. By interpreting the reports that sar
produces, you can locate system bottlenecks and suggest some possible
solutions to those annoying performance problems.
The Linux kernel maintains internal counters that keep track of requests,
completion times, I/O block counts, etc. From this and other information,
sar calculates rates and ratios that give insight into where the
bottlenecks are.
The key to understanding sar is that it reports on system activity over a
period of time. You must take care to collect sar data at an appropriate
time (not at lunch time or on weekends, for example). Here is one way to
invoke sar:
# sar -u -o datafile 2 3
The -u option specifies our interest in the CPU subsystem. The -o option
will create an output file that contains binary data. Finally, we will
take 3 samples at two-second intervals. Upon completion of the sampling,
sar will report the results to the screen. This provides us with a
snapshot of current system activity.
The above example uses sar in interactive mode. You can also invoke sar
from cron. In this case, cron runs the /usr/lib/sa/sa1 shell script to
create a daily log file, and the /usr/lib/sa/sa2 shell script to format
the log into human-readable form. These scripts may be invoked by a
crontab run by root (although I prefer to use adm). Here is the crontab,
located in the /etc/cron.d directory and using Vixie cron syntax, that
makes this happen:
|
# Run system activity accounting tool every 10 minutes
*/10 * * * * root /usr/lib/sa/sa1 -d 1 1
# 0 * * * * root /usr/lib/sa/sa1 -d 600 6 &
# Generate a daily summary of process accounting at 23:53
53 23 * * * root /usr/lib/sa/sa2 -A
|
In reality, the sa1 script initiates a related utility called sadc. sa1
gives sadc several arguments to specify the amount of time to wait between
samples, the number of samples, and the name of a file into which the
binary results should be written.
A new file is created each day so that we can easily interpret daily
results. The sa2 script calls sar, which formats the binary data into
human-readable form.
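To make the "one file per day" idea concrete, here is a minimal Python sketch that builds the path of the daily data file for a given date. It assumes the naming used later in this tutorial (/var/log/sa/sa29 for the 29th of the month); the function name daily_data_file and its log_dir parameter are made up for illustration, and your distribution may store these files elsewhere.

```python
from datetime import date


def daily_data_file(day, log_dir="/var/log/sa"):
    """Return the path of the daily sadc data file for the given day.

    Assumption: files are named sa<DD>, where DD is the two-digit day of
    the month, as in the /var/log/sa/sa29 example of this tutorial.
    """
    return f"{log_dir}/sa{day.day:02d}"


print(daily_data_file(date(2006, 3, 29)))  # /var/log/sa/sa29
```

The resulting path is what you would pass to sar -f or sadf to read back that day's data.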
Let's think of our system as being composed of three interdependent
subsystems: CPU, disk and memory. Our goal is to find out which subsystem
is responsible for any performance bottleneck. By analyzing sar's output,
we can achieve that goal.
The listing below shows the report produced by the sar -u command.
Invoked this way, sar produces a report from the daily log file produced
by sadc.
|
Linux 2.6.8.1-27mdkcustom (localhost)    03/29/2006

09:00:00 PM       CPU     %user     %nice   %system   %iowait    %steal     %idle
09:10:00 PM       all     96.18      0.00      0.42      0.00      0.00      3.40
09:20:00 PM       all     97.99      0.00      0.36      0.00      0.00      1.65
09:30:00 PM       all     97.59      0.00      0.38      0.00      0.00      2.03
...
|
The %user and %system columns simply specify the amount of time the CPU
spends in user and system mode. The %iowait and %idle columns are of
interest to us when doing performance analysis. The %iowait column
specifies the amount of time the CPU spends waiting for I/O requests to
complete. The %idle column specifies the amount of time the CPU has
nothing to do, which tells us, by difference, how much work it is doing.
A %idle time near zero indicates a CPU bottleneck, while a high %iowait
value indicates unsatisfactory disk performance.
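The rule of thumb above can be sketched in a few lines of Python. This is only an illustration: the two thresholds (IDLE_MIN and IOWAIT_MAX) are arbitrary values picked for the example, not values recommended by sysstat, and the parser assumes the nine-column 12-hour layout of the sar -u report shown above.

```python
# Data rows copied from the sar -u report above.
SAMPLE = """\
09:10:00 PM  all  96.18  0.00  0.42  0.00  0.00  3.40
09:20:00 PM  all  97.99  0.00  0.36  0.00  0.00  1.65
09:30:00 PM  all  97.59  0.00  0.38  0.00  0.00  2.03
"""

# Hypothetical thresholds, chosen for illustration only.
IDLE_MIN = 10.0     # below this %idle, suspect a CPU bottleneck
IOWAIT_MAX = 20.0   # above this %iowait, suspect the disk subsystem


def diagnose(report):
    """Return a list of (timestamp, suspect) pairs for the flagged samples."""
    findings = []
    for line in report.splitlines():
        fields = line.split()
        # Expected fields: time, AM/PM, CPU, %user, %nice, %system,
        # %iowait, %steal, %idle. Header and other lines are skipped.
        if len(fields) != 9 or fields[2] != "all":
            continue
        timestamp = f"{fields[0]} {fields[1]}"
        iowait, idle = float(fields[6]), float(fields[8])
        if iowait > IOWAIT_MAX:
            findings.append((timestamp, "disk"))
        elif idle < IDLE_MIN:
            findings.append((timestamp, "cpu"))
    return findings


for timestamp, suspect in diagnose(SAMPLE):
    print(timestamp, suspect)
```

With the figures of our example, every sample is flagged as "cpu": %idle stays below 4% while %iowait is zero.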
Additional information can be obtained by the sar -q command, which
displays the run queue length, total number of processes, and the load
averages for the past one, five and fifteen minutes:
|
Linux 2.6.8.1-27mdkcustom (localhost)    03/29/2006

09:00:00 PM   runq-sz  plist-sz   ldavg-1   ldavg-5  ldavg-15
09:10:00 PM         2       121      2.22      2.17      1.45
09:20:00 PM         6       137      2.79      2.48      1.73
09:30:00 PM         5       129      3.31      2.83      1.95
...
|
This example shows that the system is busy (since more than one process is
runnable at any given time) and rather overloaded.
sar also lets you monitor memory utilization. Have a look at the following
example produced by sar -r:
|
Linux 2.6.8.1-27mdkcustom (localhost)    03/29/2006

09:00:00 PM kbmemfree kbmemused  %memused kbbuffers  kbcached kbswpfree kbswpused  %swpused  kbswpcad
09:10:00 PM    591468    444388     42.90     19292    227412   1632920         0      0.00         0
09:20:00 PM    546860    488996     47.21     21844    243900   1632920         0      0.00         0
09:30:00 PM    538268    497588     48.04     25308    267228   1632920         0      0.00         0
...
|
This listing shows that the system has plenty of free memory. Swap space
is not used. So memory is not a problem here. You can double-check this by
using sar -W to get swapping statistics:
|
Linux 2.6.8.1-27mdkcustom (localhost)    03/29/2006

09:00:00 PM  pswpin/s pswpout/s
09:10:00 PM      0.00      0.00
09:20:00 PM      0.00      0.00
09:30:00 PM      0.00      0.00
...
|
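The %memused figures in the sar -r report above can be cross-checked by hand. The sketch below assumes that total memory is simply kbmemfree + kbmemused, which the three samples are consistent with (they all add up to 1035856 kB). It also computes the memory used once buffers and cache are set aside, on the assumption that kbmemused includes them in this version of sar. The function names are made up for the example.

```python
# (kbmemfree, kbmemused, kbbuffers, kbcached) copied from the sar -r report.
SAMPLES = {
    "09:10:00 PM": (591468, 444388, 19292, 227412),
    "09:20:00 PM": (546860, 488996, 21844, 243900),
    "09:30:00 PM": (538268, 497588, 25308, 267228),
}


def pct_memused(free_kb, used_kb):
    """Percentage of used memory, assuming total = free + used."""
    return 100.0 * used_kb / (free_kb + used_kb)


def used_without_cache_kb(used_kb, buffers_kb, cached_kb):
    """Used memory once buffers and cache are set aside.

    Assumption: kbmemused includes buffers and cache.
    """
    return used_kb - buffers_kb - cached_kb


for timestamp, (free, used, buffers, cached) in SAMPLES.items():
    print(timestamp,
          f"{pct_memused(free, used):.2f}",
          used_without_cache_kb(used, buffers, cached))
```

For the first sample this gives 42.90, the value reported by sar, and 197684 kB of memory used outside buffers and cache.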
sar can also help you to monitor disk activity. sar -b displays I/O and
transfer rate statistics grouped for all block devices:
|
Linux 2.6.8.1-27mdkcustom (localhost)    03/29/2006

09:00:00 PM       tps      rtps      wtps   bread/s   bwrtn/s
09:10:00 PM      6.37      2.32      4.05    126.84     61.41
09:20:00 PM      4.03      0.74      3.29     54.49     46.04
09:30:00 PM      6.71      3.11      3.59     80.13     49.18
...
|
sar -d enables you to get more detailed information on a per-device basis.
It displays statistics similar to those displayed by iostat:
|
Linux 2.6.8.1-27mdkcustom (localhost)    03/29/2006

09:00:00 AM       DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
09:10:00 AM       sda      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
09:10:00 AM       sdb     18.09      0.00    160.80      8.89      0.01      0.67      0.19      0.35
09:20:00 AM       sda      2.51      0.00     52.26     20.80      0.00      0.60      0.40      0.10
09:20:00 AM       sdb     18.91      0.00    141.29      7.47      0.02      0.92      0.21      0.40
09:30:00 AM       sda     26.87     11.94    291.54     11.30      0.12      4.33      1.07      2.89
09:30:00 AM       sdb      7.00      0.00     54.00      7.71      0.00      0.50      0.14      0.10
...
|
sar has numerous other options that enable you to gather statistics for
every part of your system. You will find useful information about them in
the manual page.
OK. As a last example, let's show how the sadf command can help us to
produce some graphs.
We use the command sar -B to display paging statistics from daily data
file sa29 (see example below).
|
# sar -B -f /var/log/sa/sa29
Linux 2.6.8.1-27mdkcustom (localhost)    03/29/2006

09:00:00 PM  pgpgin/s pgpgout/s   fault/s  majflt/s
09:10:00 PM     63.42     30.71    267.35      0.45
09:20:00 PM     27.25     23.02    281.88      0.26
09:30:00 PM     40.06     24.59    246.51      0.32
09:40:00 PM     43.58     26.11    265.25      0.34
09:50:00 PM     34.12     28.38    271.54      0.37
Average:        41.69     26.56    266.51      0.35
|
sadf -d extracts data in a format that can be easily ingested by a
relational database:
|
# sadf -d /var/log/sa/sa29 -- -B
localhost;601;2006-03-29 19:10:00 UTC;63.42;30.71;267.35;0.45
localhost;600;2006-03-29 19:20:00 UTC;27.25;23.02;281.88;0.26
localhost;600;2006-03-29 19:30:00 UTC;40.06;24.59;246.51;0.32
localhost;600;2006-03-29 19:40:00 UTC;43.58;26.11;265.25;0.34
localhost;600;2006-03-29 19:50:00 UTC;34.12;28.38;271.54;0.37
|
If we save this output as a text file, both Excel and OpenOffice will
allow us to specify a semicolon as the field delimiter. Then we can
generate our performance report and graph.
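If you would rather script the processing than use a spreadsheet, the same semicolon-separated output can be read with Python's standard csv module. The sketch below is only one possible approach. The lines are those printed by sadf -d above; the first three column names (hostname, interval, timestamp) are labels chosen here, and the last four are taken from the sar -B header.

```python
import csv
import io

# Output of "sadf -d /var/log/sa/sa29 -- -B", copied from the listing above.
SADF_OUTPUT = """\
localhost;601;2006-03-29 19:10:00 UTC;63.42;30.71;267.35;0.45
localhost;600;2006-03-29 19:20:00 UTC;27.25;23.02;281.88;0.26
localhost;600;2006-03-29 19:30:00 UTC;40.06;24.59;246.51;0.32
localhost;600;2006-03-29 19:40:00 UTC;43.58;26.11;265.25;0.34
localhost;600;2006-03-29 19:50:00 UTC;34.12;28.38;271.54;0.37
"""

# Column labels supplied by hand, since the listing has no header line.
FIELDS = ["hostname", "interval", "timestamp",
          "pgpgin/s", "pgpgout/s", "fault/s", "majflt/s"]


def load(text):
    """Parse sadf -d output into a list of dicts with numeric statistics."""
    reader = csv.DictReader(io.StringIO(text), fieldnames=FIELDS, delimiter=";")
    rows = []
    for row in reader:
        row["interval"] = int(row["interval"])
        for name in FIELDS[3:]:
            row[name] = float(row[name])
        rows.append(row)
    return rows


rows = load(SADF_OUTPUT)
mean_pgpgin = sum(row["pgpgin/s"] for row in rows) / len(rows)
print(f"{mean_pgpgin:.2f}")  # 41.69, the "Average:" value printed by sar -B
```

From there the rows can be fed to whatever plotting or reporting tool you prefer.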

Section 2: Using pidstat
The pidstat command is
used to monitor processes and threads currently being managed by the
Linux kernel. It can also monitor the children of those processes and
threads.
With its -d option, pidstat can report I/O statistics, provided that you
have a recent Linux kernel (2.6.20+) with the option
CONFIG_TASK_IO_ACCOUNTING compiled in. So imagine that your system is
undergoing heavy I/O and you want to know which tasks are generating it.
You could then enter the following command:
|
$ pidstat -d 2
Linux 2.6.20 (localhost)    09/26/2007

10:13:31 AM       PID   kB_rd/s   kB_wr/s kB_ccwr/s  Command
10:13:31 AM     15625      1.98  16164.36      0.00  dd

10:13:33 AM       PID   kB_rd/s   kB_wr/s kB_ccwr/s  Command
10:13:33 AM     15625      4.00  20556.00      0.00  dd

10:13:35 AM       PID   kB_rd/s   kB_wr/s kB_ccwr/s  Command
10:13:35 AM     15625      0.00  10642.00      0.00  dd
...
|
This report tells us that there is only one task (a "dd" command with PID
15625) which is responsible for this I/O.
When no PIDs are explicitly selected on the command line (as in the case
above), the pidstat command examines all the tasks managed by the system
but displays only those whose statistics vary during the interval of
time. But you can also indicate which tasks you want to monitor. The
following example reports CPU statistics for PID 8197 and all its
threads:
|
$ pidstat -t -p 8197 1 3
Linux 2.6.8.1-27mdkcustom (localhost)    09/26/2007

10:40:05 AM       PID       TID   %user %system    %CPU   CPU  Command
10:40:06 AM      8197         -   71.29    1.98   73.27     0  procthread
10:40:06 AM         -      8197   71.29    1.98   73.27     0  |__procthread
10:40:06 AM         -      8198    0.00    0.99    0.99     0  |__procthread

10:40:06 AM       PID       TID   %user %system    %CPU   CPU  Command
10:40:07 AM      8197         -   67.00    2.00   69.00     0  procthread
10:40:07 AM         -      8197   67.00    2.00   69.00     0  |__procthread
10:40:07 AM         -      8198    1.00    1.00    2.00     0  |__procthread

10:40:07 AM       PID       TID   %user %system    %CPU   CPU  Command
10:40:08 AM      8197         -   56.00    6.00   62.00     0  procthread
10:40:08 AM         -      8197   56.00    6.00   62.00     0  |__procthread
10:40:08 AM         -      8198    2.00    1.00    3.00     0  |__procthread

Average:          PID       TID   %user %system    %CPU   CPU  Command
Average:         8197         -   64.78    3.32   68.11     -  procthread
Average:            -      8197   64.78    3.32   68.11     -  |__procthread
Average:            -      8198    1.00    1.00    1.99     -  |__procthread
|
As a last example, let me show you how pidstat helped
me to detect a memory leak in the pidstat command itself. At that time
I was testing the very first version of pidstat I wrote for sysstat
7.1.4 and fixing the last remaining bugs. Here is the command I entered
on the command line and the output I got:
|
$ pidstat -r 2
Linux 2.6.8.1-27mdkcustom (localhost)    09/26/2007

10:59:03 AM       PID  minflt/s  majflt/s     VSZ     RSS   %MEM  Command
10:59:05 AM     14364    113.66      0.00    2480    1540   0.15  pidstat

10:59:05 AM       PID  minflt/s  majflt/s     VSZ     RSS   %MEM  Command
10:59:07 AM      7954    150.00      0.00   27416   19448   1.88  net_applet
10:59:07 AM     14364    120.00      0.00    3048    2052   0.20  pidstat

10:59:07 AM       PID  minflt/s  majflt/s     VSZ     RSS   %MEM  Command
10:59:09 AM     14364    116.00      0.00    3488    2532   0.24  pidstat

10:59:09 AM       PID  minflt/s  majflt/s     VSZ     RSS   %MEM  Command
10:59:11 AM      7947      0.50      0.00   27044   18356   1.77  mdkapplet
10:59:11 AM     14364    116.00      0.00    3928    3012   0.29  pidstat

10:59:11 AM       PID  minflt/s  majflt/s     VSZ     RSS   %MEM  Command
10:59:13 AM      7954    155.50      0.00   27416   19448   1.88  net_applet
10:59:13 AM     14364    115.50      0.00    4496    3488   0.34  pidstat
...
|
I noticed that pidstat had a memory footprint (VSZ and RSS fields) that
was constantly increasing as time went by. I quickly found that I had
forgotten to close a file descriptor in a function of my code and that
this was responsible for the memory leak...!
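The kind of check I did by eye on that report can be sketched in Python. This is a deliberately naive heuristic, not something pidstat does itself: a series is flagged when every sample is strictly larger than the previous one. The function name grows_steadily and the min_samples parameter are made up for the example; the figures come from the pidstat -r report above.

```python
# (VSZ, RSS) in kB for PID 14364 (pidstat), in the order of the report above.
PIDSTAT_SAMPLES = [(2480, 1540), (3048, 2052), (3488, 2532),
                   (3928, 3012), (4496, 3488)]

# VSZ of PID 7954 (net_applet) in the same report, for comparison.
NET_APPLET_VSZ = [27416, 27416]


def grows_steadily(values, min_samples=4):
    """Return True if there are enough samples and each one exceeds the last.

    Naive heuristic for illustration: a real leak detector would need more
    samples and some tolerance for normal fluctuations.
    """
    if len(values) < min_samples:
        return False
    return all(later > earlier for earlier, later in zip(values, values[1:]))


vsz = [v for v, _ in PIDSTAT_SAMPLES]
rss = [r for _, r in PIDSTAT_SAMPLES]
print(grows_steadily(vsz), grows_steadily(rss))           # True True
print(grows_steadily(NET_APPLET_VSZ, min_samples=2))      # False
```

Both VSZ and RSS of the pidstat process grow at every sample, while the footprint of net_applet stays flat.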
|