Despite the pandemic, you are enjoying working from home—typically pairing a nice shirt with old sweatpants. The kids are in their virtual class and the dog refrained from barking during the last Zoom meeting. You just had lunch. You’ve got this. 

And then, of course, you suddenly don’t. A flurry of emails and messages falls on you, complaining that the system has slowed to a crawl. You feel queasy…you don’t know if it’s because you ate that questionably old BBQ chicken from the fridge, or the prospect of facing a long Zoom meeting to deal with the users’ revolt.

Time to be a hero. Again. Or not.

You start asking questions: Is the system stuck? Or just too slow? The CPU doesn’t even seem to be that busy. Could it be that the changes that we deployed yesterday killed the server?

Producing and analyzing Java thread dumps remains one of the best ways to find performance bottlenecks in a Java application. In a previous article, I discussed the tool jstack to produce thread dumps. However, jstack is part of the Java Development Kit (JDK), which administrators prefer not to install on production servers. In this article, I will discuss a handy alternative for Linux systems: using the command kill to send a signal to the Java process.

Quick recap

Performance problems are usually caused by one or more resources that are tying down the server. Those bottlenecks come in two flavors: related to the CPU, or not. When the problem isn’t caused by CPUs running close to their capacity, the challenge is to find what resource is the bottleneck.

Using jstack or kill to get thread dumps is useful in any of the following situations:

  • The system seems to be slow, but the CPU doesn’t seem to be very busy. 
  • The system seems to get stuck sometimes, for example when you restart the server, and it seems to freeze for no reason.
  • We want to know what’s happening (or not) in the application.
  • We need to monitor the application for long time periods.

If the CPU is pegged up above 80% or even 100%, thread dumps may help. However, Hot Threads or Java profilers are vastly more useful in those cases.

Although this is a relatively advanced technique, I will try to make this article beginner-friendly.

You need thread dumps!

A thread dump is a list, or “dump,” of all the threads running in a Java application at a single point in time.

The application may be Alfresco, Ephesoft, Solr, or any other running within a Java Virtual Machine.

Why are multiple thread dumps useful?

It’s the difference between looking at a movie and a single picture. Looking at just one snapshot will not tell you what is moving and what is stuck. For example, let’s consider the following analogy: the picture below shows four snapshots of three threads. Each row is a thread, and each column is a snapshot (or thread dump):

Did you notice that the first three snapshots of Thread 2 look the same? If the snapshots are 0.1 seconds apart, that means that Thread 2 did not move for at least 0.2 seconds, the time between the first and the third snapshot. It looks like we found a bottleneck!

How many? How often?

The number of thread dumps and their frequency will vary depending on the situation. For example, if you suspect that there is a long query that is taking several seconds to complete, it would make sense to get a snapshot every second.

If the snapshots are too far apart, we will not be able to catch the stuck threads.

How to get thread dumps?

There are many ways to get thread dumps:

  • Using the command ‘kill -3’ to send a SIGQUIT signal to the process (Linux only)
  • Using the jstack utility (requires a JDK, may not work in Microsoft Windows)
  • Monitoring tools, such as jconsole, jvisualvm (requires installation of JDK or some other software)
  • Alfresco’s administrative console (Enterprise version only)
  • The Order of the Bee Support Tools (requires installation of AMP)

Each method has its pros and cons. In this article we will focus on the command ‘kill -3’, since it doesn’t require additional software, can be run from a script, and can be used even when Alfresco is almost dead and not responding to your browser.

Pros and Cons of SIGQUIT

When a Java program receives the SIGQUIT signal, it will produce a thread dump. The command ‘kill -3’  is a convenient method that has two advantages:

  • It’s part of the operating system, so it doesn’t require installation. You don’t have to set up a special agent or add any new parameters to your Java command line.
  • It can be used within a script.

However, the kill command has some issues:

  • It requires access to the command line of the server.
  • The dumps are sent to the standard output of the process. In the case of web applications, the output will go to one of the logs. Although it is nice to have the thread dumps along with the log, you will have to figure out how to extract the dumps from the log.

 

Be extra cautious when using the command kill! You could bring down the application if you miss the ‘-3’ parameter.
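One way to reduce that risk is to wrap the command in a small function that refuses to signal anything that does not look like a Java process. The following is only a sketch: the function name safe_thread_dump is made up for this article, and it assumes a Linux system where /proc/PID/comm reports "java" for the JVM (a JVM started through a wrapper may report a different name).

```shell
# Hypothetical helper: send SIGQUIT only to a process that looks like a JVM.
# Assumes Linux, where /proc/PID/comm holds the name of the executable.
safe_thread_dump() {
    pid=$1
    # Reject empty or non-numeric input before it gets anywhere near kill.
    case $pid in
        ''|*[!0-9]*)
            echo "PID must be a number: '$pid'" >&2
            return 2
            ;;
    esac
    comm=$(cat "/proc/$pid/comm" 2>/dev/null)
    if [ "$comm" != "java" ]; then
        echo "Process $pid is not a java process (found: '${comm:-none}'); no signal sent." >&2
        return 1
    fi
    # QUIT is the same signal as -3; the name leaves less room for typos.
    kill -s QUIT "$pid"
}
```

For example, safe_thread_dump 470 would request a thread dump from process 470 only if that process is running java.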

How do we generate multiple thread dumps?

Step 1: Find the process ID

You may use the command ps to list the Java processes:

ps aux | grep java

Your results may look like the following:

luis       470 42.7  0.3 4078464 112208 tty1   Sl   13:03   0:02 /usr/bin/java -Djava.util.logging.config.file=/home/luis/alf5.2.7/tomcat/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Xms512M -Xmx16325M -XX:+DisableExplicitGC -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -Djava.awt.headless=true -Dalfresco.home=/home/luis/alf5.2.7 -XX:ReservedCodeCacheSize=128m -Xms128m -Xmx1024m -XX:+DisableExplicitGC -Djava.awt.headless=true -Dalfresco.home=/home/luis/alf5.2.7 -Dcom.sun.management.jmxremote -Dsun.security.ssl.allowUnsafeRenegotiation=true -XX:ReservedCodeCacheSize=128m -Djdk.tls.ephemeralDHKeySize=2048 -Djava.endorsed.dirs=/home/luis/alf5.2.7/tomcat/endorsed -classpath /home/luis/alf5.2.7/tomcat/bin/bootstrap.jar:/home/luis/alf5.2.7/tomcat/bin/tomcat-juli.jar -Dcatalina.base=/home/luis/alf5.2.7/tomcat -Dcatalina.home=/home/luis/alf5.2.7/tomcat -Djava.io.tmpdir=/home/luis/alf5.2.7/tomcat/temp org.apache.catalina.startup.Bootstrap start
luis       499  0.0  0.0  12892  1116 tty1     S    13:03   0:00 grep --color=auto java

In this example, the Alfresco process ID is 470.

Potential Yikes! Make sure that you have enough authority to see the process. If you don’t see it listed, try using ‘sudo’.

Step 2a: Generate the first thread dump

Armed with the process ID, let’s get the thread dump using the following command (remember to replace the PROCESS_ID with the number you got in the previous step):

kill -3 PROCESS_ID

Or the more readable, but longer, version of the same command:

kill -s SIGQUIT PROCESS_ID

The output is sent to the standard output of the process. In the case of Alfresco, that would be <Alfresco Installation>/tomcat/logs/catalina.out. Let’s take a look (adjust the path to match your own installation):

tail -n 500 /opt/alf5.2.7/tomcat/logs/catalina.out

The dump lists the Java stack trace of every thread. This is a sample with the first two threads of the dump:

2020-10-15 13:07:47
Full thread dump OpenJDK 64-Bit Server VM (25.252-b09 mixed mode):

"MultiThreadedHttpConnectionManager cleanup" #193 daemon prio=5 os_prio=0 tid=0x00007f41e0163800 nid=0x2c8 in Object.wait() [0x00007f4141d2f000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:144)
        - locked <0x00000000ee786010> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:165)
        at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ReferenceQueueThread.run(MultiThreadedHttpConnectionManager.java:1122)

"http-bio-8443-AsyncTimeout" #191 daemon prio=5 os_prio=0 tid=0x00007f425c2fa800 nid=0x2c7 waiting on condition [0x00007f414243f000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.tomcat.util.net.JIoEndpoint$AsyncTimeout.run(JIoEndpoint.java:152)
        at java.lang.Thread.run(Thread.java:748)

Watch it! Analyzing thread dumps requires an understanding of Java, servlets, and the application that you are reviewing. If you don’t have those skills, you can still gather this information and hand it over to a friendly colleague or someone at Zia Consulting.

 

Step 2b: Generate multiple thread dumps

As I explained above, multiple snapshots—thread dumps—will help you to see if there are any stuck threads. Although you can manually get each snapshot, it is easier and more accurate to use a script like the following:

 

#!/bin/bash
# Generate N thread dumps of the process PID with an INTERVAL between each dump.
# Requires bash because of the C-style for loop below.

if [ $# -ne 3 ]; then
   echo "Generates Java thread dumps using the SIGQUIT signal."
   echo "Output is sent to the application's system output."
   echo
   echo "usage: $0 process_id repetitions interval"
   echo
   echo "EXAMPLE"
   echo "   $0 1234 5 8s"
   echo "       Generate a thread dump every 8 seconds, five times,"
   echo "       of the process 1234."
   exit 1
fi

PID=$1
N=$2
INTERVAL=$3

for ((i=1; i<=N; i++))
do
   echo "$i of $N"
   # Stop if the signal cannot be delivered (wrong PID or missing permissions).
   kill -s SIGQUIT "$PID" || exit 1
   # No need to wait after the last dump.
   if [ "$i" -lt "$N" ]; then
      sleep "$INTERVAL"
   fi
done

Here’s how to run the script to generate five thread dumps, one second apart, using the process ID from Step 1:

./dump-threads.sh 470 5 1s

Potential Yikes! Did you notice the “s” in “1s”? That of course means 1 second. The time unit is optional, and “seconds” is the default, but with GNU sleep you can use other units, such as “m” for minutes or “h” for hours.

 

Step 3: Extract the dumps

The biggest inconvenience of this technique is that you have to extract the thread dumps from the logs. Although you can use an editor to open the log file and copy-and-paste the dumps to separate files, you could use a script to extract them.
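As a rough sketch of such a script, the function below splits a log into one file per dump. It relies on two assumptions that you should check against your own log: every dump starts with a line beginning with "Full thread dump" (preceded by a timestamp line, as in the sample above), and ends with a line beginning with "JNI global ref", which is what HotSpot-based JVMs normally print. The function name and the output file naming are made up for this example.

```shell
# Hypothetical helper: copy each thread dump found in LOG to PREFIX-001.txt,
# PREFIX-002.txt, and so on. Prints the number of dumps found.
# Usage: extract_thread_dumps LOG PREFIX
extract_thread_dumps() {
    awk -v prefix="$2" '
        /^Full thread dump/ {
            if (out != "") close(out)
            n++
            out = sprintf("%s-%03d.txt", prefix, n)
            # The line just before the header is normally the dump timestamp.
            if (prev ~ /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] /) print prev > out
        }
        out != "" { print > out }
        # Assumed end-of-dump marker; anything after it is ordinary log output.
        /^JNI global ref/ { close(out); out = "" }
        { prev = $0 }
        END { print n + 0 }
    ' "$1"
}
```

For example, extract_thread_dumps catalina.out dump would create dump-001.txt, dump-002.txt, and so on in the current directory, leaving the log itself untouched.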

Step 4: Analyze!

Now that we have the dumps, what do we do with them? You can analyze each dump separately:
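A quick first look at a single dump is to count how many threads are in each state. The sketch below assumes that the dump has already been saved to its own file and that it contains the "java.lang.Thread.State:" lines shown in the sample above; the function name is made up for this example.

```shell
# Hypothetical helper: count the threads in each state in one thread dump file.
# Prints one "STATE COUNT" pair per line, sorted by state name.
thread_state_summary() {
    awk '/java\.lang\.Thread\.State:/ { count[$2]++ }
         END { for (s in count) print s, count[s] }' "$1" | sort
}
```

Running thread_state_summary on a dump file gives a rough picture of the dump. Keep in mind that an unusually high number of BLOCKED threads is only a hint about where to look, not a diagnosis.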

However, to analyze the changes across multiple dumps we need a TDA (Thread Dump Analyzer). Some useful examples are:

  • Samurai (source code)
  • The online tool fastThread is a visually attractive solution, but I have not found it very useful for Java web applications such as Alfresco, because many threads appear as RUNNABLE, although they are idle, just waiting for data.
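If no analyzer is at hand, a crude manual check is to compare the stack of one thread in two consecutive dumps. The sketch below assumes that each dump is in its own file and that thread entries are separated by blank lines, as in the sample above; both function names are made up for this example. An identical stack is only a hint, because idle threads that are waiting for work also look the same from one dump to the next.

```shell
# Hypothetical helper: print the stack frames ("at ..." lines) of the thread
# called NAME in the dump FILE. Usage: thread_frames FILE NAME
thread_frames() {
    awk -v name="$2" '
        # Paragraph mode: each blank-line-separated thread entry is one record,
        # and each line of the entry is one field.
        BEGIN { RS = ""; FS = "\n" }
        index($0, "\"" name "\"") == 1 {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^[ \t]+at /) print $i
        }
    ' "$1"
}

# Succeeds when the thread exists in both dumps with exactly the same frames.
# Usage: same_stack DUMP1 DUMP2 NAME
same_stack() {
    a=$(thread_frames "$1" "$3")
    b=$(thread_frames "$2" "$3")
    [ -n "$a" ] && [ "$a" = "$b" ]
}
```

For example, same_stack dump-001.txt dump-002.txt "http-bio-8443-AsyncTimeout" succeeds if that thread shows the same frames in both files.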

Conclusions

‘kill -3’ and jstack are free, easy-to-use, easy-to-implement options that can help diagnose some performance problems, and they become even more powerful when multiple dumps are generated.

If you are experiencing performance problems, consider using our expert services. We will help you to find the performance bottlenecks, and offer recommendations to address present and future problems.
