Recently I ran into issues where a web application running on Tomcat 6.0 was crashing under load spikes in requests. What I noticed in the diagrams below was that there was no issue with CPU or memory; rather, GC cycles and a high thread count/spike were causing the Tomcat instance to become unresponsive and stop taking any more requests.
I will note that Grails and the Spring Framework were using about 5% to 10% more memory than my legacy JSP-only application, but this did not affect the testing, as I maxed out at around 1.1GB of memory usage for 560 concurrent requests. The normal load will only be around 75 to 125 concurrent requests.
What I needed to do was get the JVM to perform better and have Tomcat manage threads better. First, I wanted to focus on JVM tuning to reduce full GCs and allow more shallow GCs to occur.
Observations below, based on these settings:
One initial issue is that we are required to run on Windows XP 32-bit Tomcat 6.0 instances; thus, we are capped at 1.5GB of memory on the HotSpot JDK 1.6.
I noticed that the large GCs are running almost constantly, and because there is a slight application pause during full GCs, this can add to degraded performance. Also, in the memory profile we did the other day, we saw ~900K objects created just from String concatenation and web service calls that are eligible for GC on each request chain. So we really need to get the GC to run more shallow cleans to avoid the expense of full GCs.
This shows 200 requests, where you can see many full GCs, maxing out at around 5.8K GC cycles per 20ms sample:
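As an aside on the String concatenation garbage mentioned above, here is a minimal sketch of the usual mitigation: building a string with one StringBuilder instead of concatenating inside a loop. The class and method names are hypothetical and are not taken from the application; the code avoids newer language features so it should also compile on JDK 1.6.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only, not code from the application.
class RequestLogLine {

    // Concatenating inside a loop creates a new String (and a temporary
    // builder) on every iteration; all of them become garbage immediately.
    static String joinWithConcat(List<String> parts) {
        String result = "";
        for (String part : parts) {
            result = result + part + ";";
        }
        return result;
    }

    // One StringBuilder is reused for the whole loop, so far fewer
    // temporary objects are created per request.
    static String joinWithBuilder(List<String> parts) {
        StringBuilder sb = new StringBuilder(64);
        for (String part : parts) {
            sb.append(part).append(';');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> parts = Arrays.asList("user=42", "op=lookup", "ms=17");
        System.out.println(joinWithConcat(parts));
        System.out.println(joinWithBuilder(parts));
    }
}
```

Both methods return the same text; only the amount of short-lived garbage differs. Whether this matters for a given request chain is something a memory profile would have to confirm.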
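If you do not have a profiler attached, GC activity can also be read from inside the JVM. The sketch below uses the standard java.lang.management API to print each collector's count and accumulated time. The class name GcCounters is my own, and the collector names printed depend on the JVM and GC flags in use.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper class; only the java.lang.management calls are standard.
class GcCounters {

    // Sums the collection counts of all collectors in this JVM.
    // getCollectionCount() returns -1 when a collector does not report it.
    static long totalCollections() {
        long total = 0;
        List<GarbageCollectorMXBean> beans = ManagementFactory.getGarbageCollectorMXBeans();
        for (GarbageCollectorMXBean gc : beans) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Typically one bean covers the young generation and one the old generation.
            System.out.println(gc.getName()
                    + ": count=" + gc.getCollectionCount()
                    + ", timeMs=" + gc.getCollectionTime());
        }
    }
}
```

Sampling these counters before and after a load test gives a rough ratio of young-generation to old-generation collections, which is the ratio the tuning here is trying to shift.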
Here are the changes I made to get the results you see below:
- I removed NewSize and MaxNewSize and used NewRatio=2, which lets the JVM size the young generation as a ratio of the heap. With a value of 2, the old generation is twice the size of the young generation, so the young generation occupies about 1/3 of the heap rather than a fixed 256mb.
- I bumped ParallelGCThreads from 4 to 20.
- I added -Xincgc, which enables the incremental low-pause collector; it collects part of the tenured generation at each minor GC and tries to minimize the pauses of large collections.
- I lowered SurvivorRatio from 8 to 6, which enlarges the survivor spaces, so that objects are not promoted to the tenured generation too soon.
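To make the ratio settings concrete, here is a small arithmetic sketch. It assumes the usual HotSpot definitions (NewRatio is old generation divided by young generation, SurvivorRatio is eden divided by one survivor space) and a 1536 MB heap, which is an illustrative figure and not a setting taken from this application. The real JVM aligns and may adapt these sizes, so treat the numbers as approximations.

```java
// Hypothetical calculator for illustration; names are not from any JVM API.
class GenerationSizes {

    // NewRatio = old / young, so young = heap / (NewRatio + 1).
    static long youngMb(long heapMb, int newRatio) {
        return heapMb / (newRatio + 1);
    }

    // SurvivorRatio = eden / survivor, and young = eden + 2 survivor spaces,
    // so one survivor space = young / (SurvivorRatio + 2).
    static long survivorMb(long youngMb, int survivorRatio) {
        return youngMb / (survivorRatio + 2);
    }

    // Eden is whatever remains of the young generation after both survivors.
    static long edenMb(long youngMb, int survivorRatio) {
        return youngMb - 2 * survivorMb(youngMb, survivorRatio);
    }

    public static void main(String[] args) {
        long heapMb = 1536; // assumed heap size, for illustration only
        long young = youngMb(heapMb, 2);
        System.out.println("young=" + young + " MB");
        System.out.println("survivor(ratio 6)=" + survivorMb(young, 6) + " MB");
        System.out.println("eden(ratio 6)=" + edenMb(young, 6) + " MB");
    }
}
```

Under these assumptions, NewRatio=2 gives a 512 MB young generation, and SurvivorRatio=6 gives two 64 MB survivor spaces and a 384 MB eden. Larger survivor spaces give short-lived objects more minor GCs in which to die before being promoted.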
Here are the results with 400 threads (8 min view):
Here are the results with 400 threads (20 min view):
Then I decided to test out 600 threads (8 min view):
This 600-thread run peaked at ~5.4K (20 min view):
Tomcat Thread Management.
The next issue I had was that the legacy web service code spawns a new Thread for each request to make each call. This was a problem because the default connector was not cleaning up threads, and when we got a thread spike, it would kill the instance.
I started digging into this and came across this thread talking about using an <Executor> for managing threads: http://firstname.lastname@example.org/msg76038.html
I was finally able to reproduce the issue we had in PROD 08 last week. I had to increase the delay of all DAS web service calls to a random 5 to 20 seconds (with AspectJ). Then I noticed that Tomcat was not releasing its worker threads. So, with 200 requests in a JMeter load test, I was getting 220 to 240 threads instantiated, but once I stopped the JMeter load process, I still had 190 worker threads lingering, and they were not killed until I restarted Tomcat.
I have maxSpareThreads set to 75, so I should not have had 190 threads still running.
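For the thread-per-call pattern itself, one common alternative is to submit the calls to a bounded pool, so a request spike queues work instead of creating threads without limit. This is a minimal sketch under assumptions: fakeCall is a hypothetical stand-in for a web service call, and the pool sizes are arbitrary. It is not the legacy code or a fix that was applied here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical example class; only java.util.concurrent is real API.
class BoundedServiceCalls {

    // Stand-in for one remote web service call (assumption, not real code).
    static Callable<Integer> fakeCall(final int id) {
        return new Callable<Integer>() {
            public Integer call() throws Exception {
                Thread.sleep(20); // simulate network latency
                return id * 2;
            }
        };
    }

    // Runs all calls on at most maxThreads workers and sums the results.
    // In a real application the pool would be created once and shared.
    static int runAll(int calls, int maxThreads) throws Exception {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                maxThreads, maxThreads, 30, TimeUnit.SECONDS,
                new ArrayBlockingQueue<Runnable>(calls));
        pool.allowCoreThreadTimeOut(true); // let idle workers exit
        try {
            List<Future<Integer>> futures = new ArrayList<Future<Integer>>();
            for (int i = 0; i < calls; i++) {
                futures.add(pool.submit(fakeCall(i)));
            }
            int sum = 0;
            for (Future<Integer> f : futures) {
                sum += f.get(); // blocks until that call has finished
            }
            return sum;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runAll(10, 4));
    }
}
```

The trade-off is that calls beyond the pool size wait in the queue, so latency rises under a spike, but the thread count stays bounded.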
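A quick way to confirm lingering threads without a profiler is to count live threads by name prefix, for example from a diagnostic JSP or servlet. The sketch below is hypothetical: the prefix is a parameter because worker thread names depend on the connector configuration, and the demo prefix "demo-worker-" is made up.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical diagnostic helper, not part of Tomcat.
class ThreadCounter {

    // Counts live threads in this JVM whose name starts with the given prefix.
    static int countByPrefix(String prefix) {
        int count = 0;
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.isAlive() && t.getName().startsWith(prefix)) {
                count++;
            }
        }
        return count;
    }

    // Demo only: starts n parked threads, counts them, then releases them.
    static int startAndCount(int n, String prefix) throws InterruptedException {
        final CountDownLatch started = new CountDownLatch(n);
        final CountDownLatch release = new CountDownLatch(1);
        Thread[] workers = new Thread[n];
        for (int i = 0; i < n; i++) {
            workers[i] = new Thread(new Runnable() {
                public void run() {
                    started.countDown();
                    try {
                        release.await(); // park until the demo is done
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            }, prefix + i);
            workers[i].start();
        }
        started.await(); // make sure every worker is running
        int seen = countByPrefix(prefix);
        release.countDown();
        for (Thread w : workers) {
            w.join();
        }
        return seen;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(startAndCount(3, "demo-worker-"));
    }
}
```

Calling countByPrefix during and after a load test would show whether the worker count drops back once the load stops.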
Existing server.xml setting:
<Connector port="8190" protocol="HTTP/1.1" maxHttpHeaderSize="8192" maxThreads="350" minSpareThreads="25" maxSpareThreads="75" enableLookups="false" redirectPort="8553" acceptCount="100" connectionTimeout="20000" disableUploadTimeout="true" />
New server.xml setting using the Tomcat Thread Executor:
<Executor name="tomcatThreadPool" namePrefix="catalina-exec-8190-"
          maxThreads="350" minSpareThreads="75" />
<!-- A "Connector" using the shared thread pool-->
<Connector executor="tomcatThreadPool" port="8190" protocol="HTTP/1.1"
           connectionTimeout="20000" redirectPort="8553" />
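The behavior I wanted from the Executor is that the pool grows during a spike and shrinks back once threads sit idle; Tomcat's Executor element has a maxIdleTime attribute for this, which I believe defaults to 60000 ms. The sketch below illustrates the same idea with plain java.util.concurrent. It is an analogy under assumptions, not Tomcat's implementation, and the parameter names only mirror the server.xml attributes.

```java
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; it only mimics the Executor attributes by name.
class IdleThreadRelease {

    // Simulates a spike of maxThreads slow tasks, then waits for idle workers
    // to time out. Returns {largest pool size, pool size after going idle}.
    static int[] spikeThenIdle(int minSpare, int maxThreads, long maxIdleMs)
            throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                minSpare, maxThreads, maxIdleMs, TimeUnit.MILLISECONDS,
                new SynchronousQueue<Runnable>());
        try {
            for (int i = 0; i < maxThreads; i++) {
                pool.execute(new Runnable() {
                    public void run() {
                        try {
                            Thread.sleep(200); // simulate a slow request
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    }
                });
            }
            int peak = pool.getLargestPoolSize();
            // Poll until the surplus idle threads have been released (max 5 s).
            long deadline = System.currentTimeMillis() + 5000;
            while (pool.getPoolSize() > minSpare
                    && System.currentTimeMillis() < deadline) {
                Thread.sleep(50);
            }
            return new int[] { peak, pool.getPoolSize() };
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        int[] sizes = spikeThenIdle(2, 10, 100);
        System.out.println("peak=" + sizes[0] + ", afterIdle=" + sizes[1]);
    }
}
```

In this sketch the pool falls back to the minimum size once the idle timeout passes, which is the release behavior the 190 lingering threads were missing.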
I found that coaxing the GC into performing more shallow GCs lessened the load on the CPU and slowed the erratic GC cycling in my application. I am sure I can tune more, but for now, my issues have been resolved to a manageable degree.