HOME  |    TRAINING  |   FREE TUTORIALS   |   JOBS
Find out more about our new RSS feed.
FREE Tutorial
JAVA: PERFORMANCE TUNING AND MEMORY MANAGEMENT PART 1 - INTRODUCTION (Page 3)

CATEGORY
SEARCH OUR OTHER TUTORIALS

DESCRIPTION

Since it came into existence, Java code has had a reputation for executing slowly, and that reputation was partially deserved. This is particularly true of the early Java Virtual Machine implementations, which offered purely interpreted execution. However, while Java is still not (and may never be) the ideal language for performing extremely CPU-intensive operations, it is entirely suitable for most software applications.
Click here to be kept informed of our new Tutorials.


This free tutorial is a sample from the book Professional Java Programming.


The report created by HPROF allows you to determine what has been allocated, and identifies the code that caused the memory to be allocated. For example, we can use these entries to determine which section of code caused this Vector to be created and all the appropriate class definitions loaded. To do so, let's review the entry that identifies the creation of the Vector object:

OBJ 7ba560 (sz=20, trace=11, class=java/util/Vector@7b6ac0)
elementData 7ba6a0

The trace portion of this entry identifies the trace entry that is associated with the creation of this object, and by searching the file created by HPROF for the trace entry with an identifier of 11, we find the following:

TRACE 11:
java/lang/ClassLoader.<clinit>(ClassLoader.java:1156)

This indicates that line 1156 in ClassLoader caused a Vector to be created and its associated class definitions loaded, and the <clinit> "method" indicates that the Vector was created during the initialization of the static fields in ClassLoader. By examining line 1156 of ClassLoader, we find the following code:

private static Vector loadedLibraryNames = new Vector();

As you can see from this example, the heap dump allows you to easily associate each memory allocation with a specific location within the Java code. However, as we'll see later, there's an easier way to identify which portions of your code allocate large amounts of memory.

The heap Option

You can use this option to control the type of heap information provided by HPROF, and valid values are dump, sites, and all. The dump value causes HPROF to generate only the heap dump information in its output, while sites causes it to generate only the allocation site listing, and all will result in both sections being created. The default value is all.

Allocation Sites Section

As mentioned earlier, the heap dump lists each area of storage that's allocated by your code, but it does not provide an easy way to identify which areas of your code are allocating most of the memory that's used. Fortunately, the HPROF output includes a list of allocation sites that appear in descending order based on the amount of storage each site is allocated.

To provide an example of this, we'll use the following MemoryTest application that allocates two byte arrays: one very large, and one relatively small. In this simple example, it's obvious which part of the code is responsible for allocating most of the memory. However, we'll see how HPROF's allocation site information can be used to identify the relevant section of code, and the approach illustrated here can just as easily be used with larger, more complex applications:

public class MemoryTest {

private static byte[] buffer1;
private static byte[] buffer2;

public static void main(String[] args) {
 allocateALittle();
 allocateALot();
} 

private static void allocateALittle() {
 buffer1 = allocateBuffer(10000);


} 

private static void allocateALot() {
 buffer2 = allocateBuffer(1000000);
} 

private static byte[] allocateBuffer(int length) {
 byte[] buffer = new byte[length];
 return buffer;
} 

}

When this application is executed, output similar to the following will appear in the allocation site section of the report generated by HPROF:

SITES BEGIN (ordered by live bytes) Sat Apr 15 09:43:35 2000
    percent     live    alloc'ed stack class
rank  self accum  bytes objs  bytes objs trace name
 1 84.84% 84.84% 1000004  1 1000004  1 1882 [B
 2 1.39% 86.24%  16388  1  16388  1 1228 [C
 3 1.39% 87.63%  16388  1  16388  1 1235 [C
 4 0.99% 88.61%  11628  3  11628  3  984 [C
 5 0.85% 89.46%  10004  1  10004  1 1881 [B
 6 0.70% 90.16%   8196  1  8196  1 1226 [B
 7 0.70% 90.85%   8196  1  8196  1 1233 [B
 8 0.68% 91.54%   8068  1  8068  1  987 [S
 9 0.62% 92.15%   7252  3  7252  3  981 [C
10 0.31% 92.46%   3612  1  3612  1  763 [L<Unknown>;
11 0.22% 92.68%   2600  70  3188  85  57 [C
12 0.21% 92.89%   2532  1  2532  1  988 [I
13 0.20% 93.10%   2380  17  2380  17   1 java/lang/Class
14 0.20% 93.30%   2356  1  2356  1 1220 [C
15 0.17% 93.47%   2052  1  2052  1  71 [B
16 0.10% 93.57%   1236  3  1236  3  56 [L<Unknown>;
17 0.09% 93.66%   1028  1  1028  1  986 [B
18 0.08% 93.75%   996  83  1020  85  57 java/lang/String
19 0.08% 93.83%   980  7   980  7 1407 java/lang/Class
20 0.07% 93.90%   840  42   840  42  58 java/util/
                         Hashtable$Entry

Each entry in this list represents storage allocations associated with a stack trace, and the entries are ordered based on most to least memory allocated. In this case, trace 1882 allocated roughly one megabyte of storage, which accounts for approximately 85% of the total storage in use. Note also that the fifth entry in the list indicates that trace 1881 allocated a much smaller 10K, which accounts for less than one percent of the storage in use.

The information provided in this report is largely self-explanatory, with the first column providing a ranking of each trace based on how much memory it allocated. The second and third columns identify the amount of storage (as a percentage of the total) allocated by that trace, and a cumulative value for the entries so far listed, respectively. The two pairs of columns containing bytes and objs represent the number of bytes and objects currently in use ("live") and the numbers originally allocated ("alloc'ed").

Finally, the last two columns identify the stack trace associated with the allocation(s) and the type of object created. In this case, most of the entries in the last column start with an open bracket, which indicates that an array of primitives was allocated, and in the case of MemoryTest, our allocation of an array of byte values is denoted [B. In other cases, however, the name of a class appears, which indicates that the allocation was due to object instantiation instead of the creation of an array. The following table identifies the "class name" that appears for each type that can be allocated:

Now that we understand the information in the allocation site report, it can be used to identify the code that's responsible for allocating most of the storage in use by the application. The site report indicates that the code path associated with trace entry 1882 allocated approximately 85% of the storage in use, and by simply searching the output created by HPROF to locate trace 1882, we find the following listing:

TRACE 1882: (thread=1)
MemoryTest.allocateBuffer(MemoryTest.java:20)
MemoryTest.allocateALot(MemoryTest.java:16)
MemoryTest.main(MemoryTest.java:8)

This indicates that most of the storage used by this application was allocated in the allocateBuffer() method when it was called from allocateALot(). As expected, the HPROF output identified the path that creates the large array of byte values. Similarly, trace entry 1881 identifies the execution path through the allocateALittle() method that was responsible for allocating a much smaller portion of storage:

TRACE 1881: (thread=1)
MemoryTest.allocateBuffer(MemoryTest.java:20)
MemoryTest.allocateALittle(MemoryTest.java:12)
MemoryTest.main(MemoryTest.java:7)

CPU Samples and Times Section

In addition to reporting on memory allocation, HPROF is able to provide information that identifies which portions of your code the processor spends the most time executing, and it offers two different ways of doing so. You need to specify that HPROF generate one of these sections using the cpu option.

The first technique involves taking periodic samples during execution of the code and counting the number of times that the processor was found to be executing a certain code path. With a reasonably large number of samples, that approach will provide an accurate representation of time spent executing a given section of code.

The second approach that HPROF is able to use is to simply record each method call and the time it takes to complete. The times are then added together for each execution path and reported as a percentage of the total execution time of the code.

In this example, we'll use the first (sampling) approach with another simple application. As with the MemoryTest application, it's easy to identify the portion of the code that consumes the most resources, but again, the technique used here is applicable to larger and more complex applications as well:

public class CPUTest {

public static void main(String[] args) {
 executeQuickly();
 executeSlowly();
} 

private static void executeQuickly() {
 loopAWhile(100000);
} 

private static void executeSlowly() {
 loopAWhile(1000000);
} 

private static void loopAWhile(int count) {
 for (int i = 0; i < count; i++) {

  // Do something to slow down this loop
  Object o = new Object();
 } 
} 

}

If this code is executed with HPROF enabled and cpu=sample specified, it will generate a section with CPU sample information like the one shown below (though the data will vary between runs):

CPU SAMPLES BEGIN (total = 655) Sat Apr 15 10:57:51 2000
rank  self accum  count trace method
1 78.32% 78.32%   513  77 CPUTest.loopAWhile
2 7.94% 86.26%   52  75 CPUTest.loopAWhile
3 0.76% 87.02%    5  64 sun/net/www/protocol/file/
               Handler.openConnection
4 0.46% 87.48%    3   9 sun/misc/
               URLClassPath$JarLoader.getJarFile
5 0.46% 87.94%    3  50 java/io/BufferedReader.readLine
6 0.46% 88.40%    3  62 java/lang/Class.forName0
7 0.31% 88.70%    2  25 java/util/jar/JarFile.getManifest
8 0.31% 89.01%    2   7 sun/misc/URLClassPath$2.run
9 0.31% 89.31%    2  15 java/util/zip/ZipFile.getEntry
10 0.31% 89.62%    2  48 sun/io/ByteToCharISO8859_1.convert
11 0.31% 89.92%    2  41 java/security/Policy$1.run
12 0.31% 90.23%    2  74 CPUTest.executeQuickly
13 0.31% 90.53%    2  11 java/io/Win32FileSystem.canonicalize
14 0.31% 90.84%    2  44 java/util/Properties.load
15 0.15% 90.99%    1  28 java/util/jar/Attributes.read
16 0.15% 91.15%    1  38 java/io/FileInputStream.open

Upon viewing this information, it's obvious that there is a hot spot in this application that's responsible for approximately 80% of its execution time, specifically the loopAWhile() method in CPUTest. Note, however, that there are actually two entries associated with loopAWhile(), one of which represents an execution path that was active when 513 of the 655 samples were taken, while the other was active only 52 times. By locating the trace information associated with these, we find that the first entry corresponds to loopAWhile() being called from executeSlowly(), while the second corresponds to a call from executeQuickly():

TRACE 77:
CPUTest.loopAWhile(CPUTest.java:Compiled method)
CPUTest.executeSlowly(CPUTest.java:13)
CPUTest.main(CPUTest.java:5)

TRACE 75:
CPUTest.loopAWhile(CPUTest.java:Compiled method)
CPUTest.executeQuickly(CPUTest.java:9)
CPUTest.main(CPUTest.java:4)

The difference in these two execution paths illustrates an important point related to identifying hot spots in your code. It's sometimes not sufficient to simply identify the methods that the processor spends the most time executing, but also where those methods were called from (the execution path) and/or the parameter values passed.

The cpu Option

This option allows you to control how HPROF gathers information about where the processor spends it time in your code, and valid values are times, samples, and old:

  • times
    HPROF will record each method call and the amount of time spent executing the method each time it is called, and a "CPU Time" section will appear in the generated report.
  • samples
    HPROF periodically determines what code is being executed and the execution path that resulted in the execution of that code, and counts the number of times it finds the processor in that execution path. Once the report is generated, it will contain a "CPU Samples" section that lists execution paths in descending order based on how frequently they were found to be executing. By using this option, you can easily identify hot spots within your code as was done previously with the CPUTest application.
  • old
    HPROF generates profile information that's similar to that supported in earlier versions of Java. However, that information is not as detailed or useful as the new format, so you should not normally specify cpu=old.

The cutoff Option

As illustrated in this listing below, many of the samples represent an extremely small percentage of the code's total execution time:

CPU SAMPLES BEGIN (total = 655) Sat Apr 15 10:57:51 2000
rank  self accum  count trace method
1 78.32% 78.32%   513  77 CPUTest.loopAWhile
2 7.94% 86.26%   52  75 CPUTest.loopAWhile
3 0.76% 87.02%    5  64 sun/net/www/protocol/file/
               Handler.openConnection
4 0.46% 87.48%    3   9 sun/misc/
               URLClassPath$JarLoader.getJarFile
5 0.46% 87.94%    3  50 java/io/BufferedReader.readLine
6 0.46% 88.40%    3  62 java/lang/Class.forName0
7 0.31% 88.70%    2  25 java/util/jar/JarFile.getManifest
8 0.31% 89.01%    2   7 sun/misc/URLClassPath$2.run
9 0.31% 89.31%    2  15 java/util/zip/ZipFile.getEntry
10 0.31% 89.62%    2  48 sun/io/ByteToCharISO8859_1.convert
11 0.31% 89.92%    2  41 java/security/Policy$1.run
12 0.31% 90.23%    2  74 CPUTest.executeQuickly
13 0.31% 90.53%    2  11 java/io/Win32FileSystem.canonicalize
14 0.31% 90.84%    2  44 java/util/Properties.load
15 0.15% 90.99%    1  28 java/util/jar/Attributes.read
16 0.15% 91.15%    1  38 java/io/FileInputStream.open

By default, HPROF will include all execution paths that represent at least 0.01 percent of the processor's time, but you can increase that cutoff point using this option. For example, to eliminate all entries that represent less than one-fourth of one percent of the total execution time, you could specify cutoff=0.0025, which would have resulted in only the first 14 of the above entries being created.



PREVIOUS PAGE



7 RELATED COURSES AVAILABLE
INTRODUCTION TO JAVA PROGRAMMING
The aims of this Java training courses is to understand the role that Java plays on the Internet; describe the be....
JAVA (V1.2): ADVANCED PROGRAMMING
This course teaches the reader to learn, understand and become familiar with the advanced features of Java progra....
INTRODUCTION TO JAVABEANS
To go from the fundamentals of JavaBeans programming to the threshold of Advanced level. Gaining in depth progr....
C++ PROGRAMMING
Object oriented programming is fast becoming the leading software design methodology, with C++ becoming ever more....
C PROGRAMMING
This course is design to provide non-C programmers with the essential skills and knowledge necessary to allow the....
 
1 RELATED JOBS AVAILABLE
JAVA DEVELOPER MANCHESTER
Computer Futures Solutions are seeking a Senior Java Developer for their Manchester based client. You will be joi....
CONTACT US
Tuesday 21st May 2013  © COPYRIGHT 2013 - website design by Website Design by Visualsoft