To start diagnosing, go to Section 9.3.1.
9.3.1. Is Memory Usage a Problem?
Use top or ps to determine how much memory the application is using. If the application is consuming more memory than it should, go to Section 9.6.6; otherwise, continue to Section 9.3.2.
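For example, assuming you already know the application's process ID (shown here as the placeholder <pid>), a quick snapshot looks like this:

ps -o pid,vsz,rss,comm -p <pid>    # one-shot view of virtual (VSZ) and resident (RSS) size, in kilobytes
top -p <pid>                       # interactive view; watch the RES column over several refreshes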
9.3.2. Is Startup Time a Problem?
If the amount of time that the application takes to start up is a problem, go to Section 9.3.3; otherwise, go to Section 9.3.4.
9.3.3. Is the Loader Introducing a Delay?
To test whether the loader is a problem, set the ld environment variables described in the previous chapters. If the ld statistics show a significant delay when mapping all the symbols, try to reduce the number and size of libraries that the application uses, or try to prelink the binaries.
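As a minimal sketch, the glibc loader honors the LD_DEBUG environment variable (the application name below is a placeholder):

LD_DEBUG=help ./application          # lists the loader's available debug categories
LD_DEBUG=statistics ./application    # prints relocation and symbol-lookup timing statistics at startup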
If the loader does appear to be the problem, go to Section 9.9. If it does not, continue on to Section 9.3.4.
9.3.4. Is CPU Usage (or Length of Time to Complete) a Problem?
Use top or ps to determine the amount of CPU that the application uses. If the application is a heavy CPU user, or takes a particularly long time to complete, the application has a CPU usage problem.
Quite often, different parts of an application have different performance characteristics. It may be necessary to isolate the poorly performing parts so that the performance tools measure only their statistics, rather than those of parts that have no negative performance impact. To make this easier, it may be necessary to change the application's behavior so that it is easier to profile. If a particular part of the application is performance-critical, you can either measure performance statistics only while the critical part is executing, or make the critical part run for so long that the statistics from the uninteresting parts become an insignificant fraction of the total.

Try to minimize the work the application is doing so that it executes only the performance-critical functions. For example, when collecting performance statistics from an entire run of an application, you would not want the startup and exit procedures to account for a significant amount of the total runtime. In this case, it is useful to start the application, run the time-consuming part many times, and then exit immediately. This allows the profilers (such as oprofile or gprof) to capture more information about the slowly running code rather than about parts that are executed but unrelated to the problem (such as launching and exiting). An even better solution is to change the application's source so that, when the application is launched, the time-consuming portion runs automatically and then the program exits. This helps minimize the profile data that does not pertain to the particular performance problem.
If the application's CPU usage is a problem, skip to Section 9.5. If it is not a problem, go to Section 9.3.5.
9.3.5. Is the Application's Disk Usage a Problem?
If the application is known to cause an unacceptable amount of disk I/O, go to Section 9.7.3 to determine what files it is accessing. If not, go to Section 9.3.6.
9.3.6. Is the Application's Network Usage a Problem?
If the application is known to cause an unacceptable amount of network I/O, go to Section 9.8.6.
Otherwise, you have encountered an application performance issue that is not covered in this book. Go to Section 9.9.
9.4. Optimizing a System
Sometimes, it is important to approach a misbehaving system and figure out exactly what is slowing everything down.
Because we are investigating a system-wide problem, the cause can be anywhere from user applications to system libraries to the Linux kernel. Fortunately, with Linux, unlike many other operating systems, you can get the source for most if not all applications on the system. If necessary, you can fix the problem and submit the fix to the maintainers of that particular piece. In the worst case, you can run a fixed version locally. This is the power of open-source software.
Figure 9-2 shows a flowchart of how we will diagnose a system-wide performance problem.
Figure 9-2.
Go to Section 9.4.1 to begin the investigation.
9.4.1. Is the System CPU-Bound?
Use top, procinfo, or mpstat to determine where the system is spending its time. If the entire system is spending less than 5 percent of the total time in idle and wait modes, your system is CPU-bound. Proceed to Section 9.4.3. Otherwise, proceed to Section 9.4.2.
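For instance, either of the following gives a quick read on system-wide idle and I/O-wait time (the interval and count are arbitrary, and exact column names vary slightly between tool versions):

mpstat 1 5    # watch the %idle and %iowait columns
vmstat 1 5    # the id and wa columns report the same information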
9.4.2. Is a Single Processor CPU-Bound?
Although the system as a whole may not be CPU-bound, in a symmetric multiprocessing (SMP) or hyperthreaded system, an individual processor may be CPU-bound.
Use top or mpstat to determine whether an individual CPU has less than 5 percent idle and wait time. If it does, one or more CPUs are CPU-bound; in this case, go to Section 9.4.4.
Otherwise, nothing is CPU-bound. Go to Section 9.4.7.
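To perform the per-processor check, for example:

mpstat -P ALL 1 5    # per-CPU statistics; a CPU showing roughly 0 percent idle and I/O wait is CPU-bound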
9.4.3. Are One or More Processes Using Most of the System CPU?
The next step is to figure out whether any particular application or group of applications is using the CPU. The easiest way to do this is to run top. By default, top sorts the processes that use the CPU in descending order. top reports CPU usage for a process as the sum of the user and system time spent on behalf of that process. For example, if an application spends 20 percent of the CPU in user space code, and 30 percent of the CPU in system code, top will report that the process has consumed 50 percent of the CPU. Sum up the CPU time of all the processes. If that time is significantly less than the system-wide system plus user time, the kernel is doing significant work that is not on behalf of applications. Go to Section 9.4.5.
Otherwise, go to Section 9.5.1 once for each process to determine where it is spending its time.
9.4.4. Are One or More Processes Using Most of an Individual CPU?
The next step is to figure out whether any particular application or group of applications is using the individual CPUs. The easiest way to do this is to run top. By default, top sorts the processes that use the CPU in descending order. When reporting CPU usage for a process, top shows the total user and system time that the application uses. For example, if an application spends 20 percent of the CPU in user space code, and 30 percent of the CPU in system code, top will report that the application has consumed 50 percent of the CPU.
First, run top, and then add the last-used CPU field to the fields that top displays. Turn on Irix mode so that top shows the amount of CPU time used per processor rather than for the total system. For each processor that has a high utilization, sum up the CPU time of the application or applications running on it. If the sum of the application time is less than 75 percent of the sum of the kernel plus user time for that CPU, it appears that the kernel is spending a significant amount of time on something other than the applications; in this case, go to Section 9.4.5. Otherwise, the applications are likely to be the cause of the CPU usage; for each application, go to Section 9.5.1.
9.4.5. Is the Kernel Servicing Many Interrupts?
It appears as if the kernel is spending a lot of time doing work not on behalf of an application. One explanation for this is an I/O card that is raising many interrupts, such as a busy network card. Run procinfo or cat /proc/interrupts to determine how many interrupts are being fired, how often they are being fired, and which devices are causing them. This may provide a hint as to what the system is doing. Record this information and proceed to Section 9.4.6.
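For example, to see which interrupts are firing and how quickly their counts grow:

cat /proc/interrupts              # per-CPU interrupt counts, listed by IRQ number and device name
watch -n 1 cat /proc/interrupts   # refresh the counts every second to see which ones are climbing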
9.4.6. Where Is Time Spent in the Kernel?
Finally, we will find out exactly what the kernel is doing. Run oprofile on the system and record which kernel functions consume a significant amount of time (greater than 10 percent of the total time). Try reading the kernel source for those functions or searching the Web for references to those functions. It might not be immediately clear what exactly those functions do, but try to figure out what kernel subsystem the functions are in. Just determining which subsystem is being used (such as memory, network, scheduling, or disk) might be enough to determine what is going wrong.
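A rough sketch of the legacy opcontrol workflow follows; the vmlinux path is a placeholder and depends on how your kernel was installed:

opcontrol --setup --vmlinux=/path/to/vmlinux   # point oprofile at an uncompressed kernel image
opcontrol --start                              # start collecting samples
# ... reproduce the slowdown ...
opcontrol --dump                               # flush the collected samples
opreport --symbols                             # per-function breakdown, kernel functions included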
It also might be possible to figure out why these functions are called based on what they are doing. If the functions are device specific, try to figure out why the particular device is being used (especially if it also has a high number of interrupts). E-mail others who may have seen similar problems, and possibly contact kernel developers.
Go to Section 9.9.
9.4.7. Is the Amount of Swap Space Being Used Increasing?
The next step is to check whether the amount of swap space being used is increasing. Many of the system-wide performance tools, such as top, vmstat, procinfo, and gnome-system-info, provide this information. If the amount of swap is increasing, you need to figure out what part of the system is using more memory. To do this, go to Section 9.6.1.
If the amount of used swap is not increasing, go to Section 9.4.8.
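Either of the following, for example, shows whether swap usage is growing:

vmstat 5    # a steadily climbing swpd column means swap usage is growing
free -m     # snapshot of total, used, and free swap, in megabytes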
9.4.8. Is the System I/O-Bound?
While running top, check to see whether the system is spending a high percentage of time in the wait state. If this is greater than 50 percent, the system is spending a large amount of time waiting for I/O, and we have to determine what type of I/O this is. Go to Section 9.4.9.
If the system is not spending a large amount of time waiting for I/O, you have reached a problem not covered in this book. Go to Section 9.9.
9.4.9. Is the System Using Disk I/O?
Next, run vmstat (or iostat) and see how many blocks are being read from and written to the disk. If a large number of blocks are being read from and written to the disk, this may be a disk bottleneck. Go to Section 9.7.1. Otherwise, continue to Section 9.4.10.
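For example:

vmstat 1 5    # the bi and bo columns show blocks read in and written out per second
iostat 2 5    # per-device transfer rates (Blk_read/s and Blk_wrtn/s)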
9.4.10. Is the System Using Network I/O?
Next, we see whether the system is using a significant amount of network I/O. It is easiest to run iptraf, ifconfig, or sar and see how much data is being transferred on each network device. If the network traffic is near the capacity of the network device, this may be a network bottleneck. Go to Section 9.8.1. If none of the network devices seem to be passing network traffic, the kernel is waiting on some other I/O device that is not covered in this book. It may be useful to see what functions the kernel is calling and what devices are interrupting the kernel. Go to Section 9.4.5.
9.5. Optimizing Process CPU Usage
When a particular process or application has been determined to be a CPU bottleneck, it is necessary to determine where (and why) it is spending its time.
Figure 9-3 shows the method for investigating a process's CPU usage.
Figure 9-3.
Go to Section 9.5.1 to begin the investigation.
9.5.1. Is the Process Spending Time in User or Kernel Space?
You can use the time command to determine whether an application is spending its time in kernel or user mode. oprofile can also be used to determine where time is spent. By profiling per process, it is possible to see whether a process is spending its time in the kernel or user space.
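A quick check with the time command, for example (the application name is a placeholder):

time ./application    # compare the user and sys figures; a large sys value means heavy kernel-space time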
If the application is spending a significant amount of time in kernel space (greater than 25 percent), go to Section 9.5.2. Otherwise, go to Section 9.5.3.
9.5.2. Which System Calls Is the Process Making, and How Long Do They Take to Complete?
Next, run strace to see which system calls are made and how long they take to complete. You can also run oprofile to see which kernel functions are being called.
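For example (the PID is a placeholder):

strace -c -p <pid>    # attach to the process and print a summary of call counts and total time per system call
strace -T -p <pid>    # full trace, with the time spent in each individual call appended to every line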
It may be possible to increase performance by minimizing the number of system calls made or by changing which system calls are made on behalf of the program. Some of the system calls may be unexpected and a result of the application's calls to various libraries. You can run ltrace and strace to help determine why they are being made.
Now that the problem has been identified, it is up to you to fix it. Go to Section 9.9.
9.5.3. In Which Functions Does the Process Spend Time?
Next, run oprofile on the application using the cycle event to determine which functions are using all the CPU cycles (that is, in which functions the application is spending its time).
Keep in mind that although oprofile shows you how much time was spent in a process, when profiling at the function level, it is not clear whether a particular function is hot because it is called very often or whether it just takes a long time to complete.
One way to determine which case is true is to acquire a source-level annotation from oprofile and look for instructions/source lines that should have little overhead (such as assignments). The number of samples that they have will approximate the number of times that the function was called relative to other high-cost source lines. Again, this is only approximate because oprofile samples only the CPU, and out-of-order processors can misattribute some cycles.
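For example, after an oprofile run, this annotation can be generated with opannotate (assuming the binary was compiled with debugging information; the name is a placeholder):

opannotate --source ./application    # interleaves sample counts with the application's source lines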
It is also helpful to get a call graph of the functions to determine how the hot functions are being called. To do this, go to Section 9.5.4.
9.5.4. What Is the Call Tree to the Hot Functions?
Next, you can figure out how and why the time-consuming functions are being called. Running the application with gprof can show the call tree for each function. If the time-consuming functions are in a library, you can use ltrace to see which library functions are being called and how often. Finally, you can use newer versions of oprofile that support call-tree tracing. Alternatively, you can run the application in gdb and set a breakpoint at the hot function. You can then run the application, and it will break during every call to the hot function. At this point, you can generate a backtrace and see exactly which functions and source lines made the call.
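As a sketch, the gprof path requires rebuilding with profiling enabled, and the gdb path works on an unmodified binary; the source file name and the function name hot_function are placeholders:

gcc -pg -o application application.c   # build with profiling instrumentation
./application                          # run; this writes gmon.out in the current directory
gprof ./application gmon.out           # flat profile plus a call graph for each function

gdb ./application
(gdb) break hot_function               # hypothetical name of the time-consuming function
(gdb) run
(gdb) backtrace                        # shows the call chain each time the breakpoint is hit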
Knowing which functions call the hot functions may enable you to eliminate or reduce the calls to these functions, and correspondingly speed up the application.
If reducing the calls to the time-consuming functions did not speed up the application, or it is not possible to eliminate these functions, go to Section 9.5.5.
Otherwise, go to Section 9.9.
9.5.5. Do Cache Misses Correspond to the Hot Functions or Source Lines?
Next, run oprofile, cachegrind, and kcachegrind against your application to see whether the time-consuming functions or source lines are those with a high number of cache misses. If they are, try to rearrange or compress your data structures and accesses to make them more cache friendly. If the hot lines do not correspond to high cache misses, try to rearrange your algorithm to reduce the number of times that the particular line or function is executed.
In any event, the tools have told you as much as they can, so go to Section 9.9.
9.6. Optimizing Memory Usage
A program that uses a large amount of memory often causes other performance problems, such as cache misses, translation lookaside buffer (TLB) misses, and swapping.
Figure 9-4 shows the flowchart of decisions that we will make to figure out how the system memory is being used.
Figure 9-4.
Go to Section 9.6.1 to start the investigation.
9.6.1. Is the Kernel Memory Usage Increasing?
To track down what is using the system's memory, you first have to determine whether the kernel itself is allocating memory. Run slabtop and see whether the total size of the kernel's memory is increasing. If it is increasing, jump to Section 9.6.2.
If the kernel's memory usage is not increasing, it may be a particular process causing the increase. To track down which process is responsible for the increase in memory usage, go to Section 9.6.3.
9.6.2. What Type of Memory Is the Kernel Using?
If the kernel's memory usage is increasing, once again run slabtop to determine what type of memory the kernel is allocating. The name of the slab can give some indication about why that memory is being allocated. You can find more details on each slab name in the kernel source and through Web searches. By just searching the kernel source for the name of that slab and determining which files it is used in, it may become clear why it is allocated. After you determine which subsystem is allocating all that memory, try to tune the amount of maximum memory that the particular subsystem can consume, or reduce the usage of that subsystem.
Go to Section 9.9.
9.6.3. Is a Particular Process's Resident Set Size Increasing?
Next, you can use top or ps to see whether a particular process's resident set size is increasing. It is easiest to add the rss field to the output of top and sort by memory usage. If a particular process is using more and more memory, we need to figure out what type of memory it is using. To figure out what type of memory the application is using, go to Section 9.6.6. If no particular process is using more memory, go to Section 9.6.4.
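For example, to list the processes with the largest resident sets:

ps -eo pid,rss,vsz,comm --sort=-rss | head    # biggest resident set sizes first, in kilobytes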
9.6.4. Is Shared Memory Usage Increasing?
Use ipcs to determine whether the amount of shared memory being used is increasing. If it is, go to Section 9.6.5 to determine which processes are using the memory. Otherwise, you have a system memory leak not covered in this book. Go to Section 9.9.
9.6.5. Which Processes Are Using the Shared Memory?
Use ipcs to determine which processes are using and allocating the shared memory. After the processes that use the shared memory have been identified, investigate the individual processes to determine why the memory is being used by each. For example, look in the application's source code for calls to shmget (to allocate shared memory) or shmat (to attach to it). Read the application's documentation and look for options that explain and can reduce the application's use of shared memory.
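For example:

ipcs -m       # lists shared memory segments with their size, owner, and number of attached processes
ipcs -m -p    # adds the PIDs of the creator and of the last process to attach to each segment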
Try to reduce shared memory usage and go to Section 9.9.
9.6.6. What Type of Memory Is the Process Using?
The easiest way to see what types of memory the process is using is to look at its status in the /proc file system. This file, /proc/<pid>/status (viewable with cat), gives a breakdown of the process's memory usage.
If the process has a large and increasing VmStk, this means that the process's stack size is increasing. To analyze why, go to Section 9.6.7.
If the process has a large VmExe, that means that the executable size is big. To figure out which functions in the executable contribute to this size, go to Section 9.6.8. If the process has a large VmLib, that means that the process is using either a large number of shared libraries or a few large-sized shared libraries. To figure out which libraries contribute to this size, go to Section 9.6.9. If the process has a large and increasing VmData, this means that the process's data area, or heap, is increasing. To analyze why, go to Section 9.6.10.
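For example (the PID is a placeholder):

grep Vm /proc/<pid>/status    # shows VmSize, VmRSS, VmData, VmStk, VmExe, and VmLib in one listing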
9.6.7. What Functions Are Using All of the Stack?
To figure out which functions are allocating large amounts of stack, we have to use gdb and a little bit of trickery. First, attach to the running process using gdb. Then, ask gdb for a backtrace using bt. Next, print the stack pointer using info registers esp (on i386). This prints the current value of the stack pointer. Now type up to move up one stack frame and print the stack pointer again. The difference (in hex) between the previous stack pointer and the current stack pointer is the amount of stack that the previous function is using. Continue this up the backtrace, and you will be able to see which function is using most of the stack.
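A sketch of that gdb session (i386-specific; the PID is a placeholder):

gdb -p <pid>
(gdb) bt                     # backtrace of the current call chain
(gdb) info registers esp     # stack pointer in the innermost frame
(gdb) up                     # move one frame up the backtrace
(gdb) info registers esp     # subtract from the previous value to get that frame's stack usage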
When you figure out which function is consuming most of the stack, or whether it is a combination of functions, you can modify the application to reduce the size and number of calls to this function (or these functions). Go to Section 9.9.
9.6.8. What Functions Have the Biggest Text Size?
If the executable has a sizable amount of memory being used, it may be useful to determine which functions are taking up the greatest amount of space and prune unnecessary functionality. For an executable or library compiled with symbols, it is possible to ask nm to show the size of all the symbols and sort them with the following command:
nm -S --size-sort
With the knowledge of the size of each function, it may be possible to reduce their size or remove unnecessary code from the application.
Go to Section 9.9.
9.6.9. How Big Are the Libraries That the Process Uses?
The easiest way to see which libraries a process is using, and their individual sizes, is to look at the process's maps file in the /proc file system. This file, /proc/<pid>/maps (viewable with cat), shows each of the libraries and the size of their code and data. When you know which libraries a process is using, it may be possible to eliminate the usage of large libraries or to use alternative, smaller libraries. However, you must be careful, because removing large libraries may not reduce overall system memory usage.
If any other applications are using the library, which you can determine by running lsof on the library, it will already be loaded into memory, and any new applications that use it do not require an additional copy to be loaded. Switching your application to use a different library (even if it is smaller) may actually increase total memory usage: the new library will not be used by any other processes and will require new memory to be allocated. The best solution may be to shrink the size of the libraries themselves or to modify them so that they use less memory to store library-specific data. If this is possible, all applications benefit.
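For example (the PID and library path are placeholders):

grep '\.so' /proc/<pid>/maps    # lists the shared libraries mapped into the process and their address ranges
lsof /usr/lib/libexample.so     # hypothetical library; lists every process that currently has it open or mapped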
To find the size of the functions in a particular library, go to Section 9.6.8; otherwise, go to Section 9.9.
9.6.10. What Functions Are Allocating Heap Memory?
If your application is written in C or C++, you can figure out which functions are allocating heap memory by using the memory profiler memprof. memprof can dynamically show how memory usage grows as the application is used.
If your application is written in Java, add the -Xrunhprof command-line parameter to the java command line; it gives details about how the application is allocating memory. If your application is written in C# (Mono), add the --profile command-line parameter to the mono command line; it likewise gives details about how the application is allocating memory.
After you know which functions allocate the largest amounts of memory, it may be possible to reduce the size of memory that is allocated. Programmers often overallocate memory just to be on the safe side because memory is cheap and out-of-bounds errors are hard to detect. However, if a particular allocation is causing memory problems, careful analysis of the minimum allocation makes it possible to significantly reduce memory usage and still be safe. Go to Section 9.9.
9.7. Optimizing Disk I/O Usage
When you determine that disk I/O is a problem, it can be helpful to determine which application is causing the I/O.
Figure 9-5 shows the steps we take to determine the cause of disk I/O usage.
Figure 9-5.
To begin the investigation, jump to Section 9.7.1.
9.7.1. Is the System Stressing a Particular Disk?
Run iostat in extended statistics mode and look for partitions that have an average wait (await) greater than zero. await is the average number of milliseconds that requests wait to be filled. The higher this number, the more the disk is overloaded. You can confirm this overload by looking at the amount of read and write traffic on a disk and determining whether it is close to the maximum amount that the drive can handle.
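For example:

iostat -x 5    # extended per-device statistics every 5 seconds; watch the await and %util columns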
If many files are accessed on a single drive, it may be possible to increase performance by spreading out these files to multiple disks. However, it is first necessary to determine what files are being accessed.
Proceed to Section 9.7.2.
9.7.2. Which Application Is Accessing the Disk?
As mentioned in the chapter on disk I/O, it can be difficult to determine which process is causing a large amount of I/O, so we must work around the lack of tools that do this directly. By running top, first look for processes that are nonidle. For each of these processes, proceed to Section 9.7.3.
9.7.3. Which Files Are Accessed by the Application?
First, use strace to trace all the file-related system calls that an application is making, using strace -e trace=file. Then run strace in summary mode to see how long each call takes. If certain read and write calls are taking a long time to complete, this process may be the cause of the I/O slowdown.

By running strace in normal mode, it is possible to see which file descriptors it is reading from and writing to. To map these file descriptors back to files on a file system, we can look in the proc file system. The files in /proc/<pid>/fd/ are symbolic links from the file descriptor number to the actual files. An ls -la of this directory shows which files this process is using. By knowing which files the process is accessing, it might be possible to reduce the amount of I/O the process is doing, spread it more evenly between multiple disks, or even move it to a faster disk.
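For example (the PID is a placeholder):

strace -e trace=file -p <pid>    # shows only the system calls that take a file name, such as open and stat
strace -c -p <pid>               # summary of time spent per system call
ls -la /proc/<pid>/fd/           # maps the process's open file descriptors back to file names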
After you determine which files the process is accessing, go to Section 9.9.
9.8. Optimizing Network I/O Usage
When you know that a network problem is happening, Linux provides a set of tools to determine which applications are involved. However, when you are connected to external machines, the fix to a network problem is not always within your control.
Figure 9-6 shows the steps that we take to investigate a network performance problem.
Figure 9-6.
To start the investigation, continue to Section 9.8.1.
9.8.1. Is Any Network Device Sending/Receiving Near the Theoretical Limit?
The first thing to do is to use ethtool to determine what hardware speed each Ethernet device is set to. Record this information, and then investigate whether any of the network devices are saturated. Ethernet devices and/or switches can easily be misconfigured, and ethtool shows what speed each device believes it is operating at. After you determine the theoretical limit of each of the Ethernet devices, use iptraf (or even ifconfig) to determine the amount of traffic that is flowing over each interface. If any of the network devices appear to be saturated, go to Section 9.8.3; otherwise, go to Section 9.8.2.
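For example (eth0 is an assumed interface name):

ethtool eth0     # reports the negotiated speed and duplex that the device believes it is running at
ifconfig eth0    # RX and TX byte counters; sample twice and subtract to estimate the current throughput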
9.8.2. Is Any Network Device Generating a Large Number of Errors?
Network traffic can also appear to be slow because of a high number of network errors. Use ifconfig to determine whether any of the interfaces are generating a large number of errors. A large number of errors can be the result of a mismatched Ethernet card / Ethernet switch setting. Contact your network administrator, search the Web for people with similar problems, or e-mail questions to one of the Linux networking newsgroups.
Go to Section 9.9.
9.8.3. What Type of Traffic Is Running on That Device?
If a particular device is servicing a large amount of data, use iptraf to track down what types of traffic that device is sending and receiving. When you know the type of traffic that the device is handling, advance to Section 9.8.4.
9.8.4. Is a Particular Process Responsible for That Traffic?
Next, we want to determine whether a particular process is responsible for that traffic. Use netstat with the -p switch to see whether any process is handling the type of traffic that is flowing over the network port.
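For example:

netstat -tup    # TCP and UDP sockets with the owning PID and program name (the -p column may require root)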
If an application is responsible, go to Section 9.8.6. If none are responsible, go to Section 9.8.5.
9.8.5. What Remote System Is Sending the Traffic?
If no application is responsible for this traffic, some system on the network may be bombarding your system with unwanted traffic. To determine which system is sending all this traffic, use iptraf or etherape.
If it is possible, contact the owner of this system and try to figure out why this is happening. If the owner is unreachable, it might be possible to set up ipfilters within the Linux kernel to always drop this particular traffic, or to set up a firewall between the remote machine and the local machine to intercept the traffic.
Go to Section 9.9.
9.8.6. Which Application Socket Is Responsible for the Traffic?
Determining which socket is being used is a two-step process. First, use strace to trace the I/O system calls that the application is making (for example, with strace -e trace=network,desc). This shows which file descriptors the process is reading from and writing to. Second, map these file descriptors back to a socket by looking in the proc file system. The files in /proc/<pid>/fd/ are symbolic links from the file descriptor number to the actual files or sockets. An ls -la of this directory shows all the file descriptors of this particular process; those with socket in the name are network sockets. You can then use this information to determine, inside the program, which socket is causing all the communication.
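For example (the PID and inode number are placeholders):

ls -la /proc/<pid>/fd/ | grep socket    # entries such as socket:[12345] are network sockets; the number is the socket's inode
grep 12345 /proc/net/tcp                # hypothetical inode; matches the socket to its local and remote addresses (in hex)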
Go to Section 9.9.
|