I’m going to tell you one more story about high CPU utilization.
First, I’ll give the background and explain how the problem was localized. Then we will look at how a HashMap can provoke high CPU utilization.
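As a teaser: the classic way a plain HashMap drives CPU through the roof is unsynchronized concurrent modification. In pre-Java-8 HashMap a resize race could link a bucket’s entries into a cycle, after which get() would spin forever. I won’t demonstrate the broken case here (it can genuinely hang the JVM), but the safe alternative is easy to show. A minimal sketch (class and variable names are mine, not from the incident):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SafeConcurrentPuts {
    public static void main(String[] args) throws InterruptedException {
        // ConcurrentHashMap tolerates concurrent writers; a plain HashMap
        // used like this is undefined behavior and may loop forever on resize.
        Map<Integer, Integer> map = new ConcurrentHashMap<>();
        int threads = 4, perThread = 10_000;
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int base = t * perThread; // each thread writes distinct keys
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    map.put(base + i, i);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        System.out.println(map.size()); // 40000: no lost updates, no spinning
    }
}
```

Replacing the map with `new HashMap<>()` in this snippet is exactly the kind of bug that shows up not as wrong answers but as a CPU core pinned at 100%.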
The theory of finding memory leaks is pretty simple. There are a lot of articles about it; I remember one really great article: http://olex.openlogic.com/wazi/2009/how-to-fix-memory-leaks-in-java. I recommend it to everyone and thank its author.
If you catch a memory leak during performance testing, you are lucky. Usually it is a fairly simple case, even with J2EE technologies, where, say, 80% of the code running on the JRE isn’t your team’s code (the container, various libraries, and so on). I say simple because you can reproduce it. Of course, there can be harder cases (native code in the JDK, lack of sources, etc.), but the ability to reproduce the problem is a serious bid for victory.
I’m going to tell you a story about one more emergency in our production environment. More precisely, CPU utilization rose to 90% on some days, while on normal days it was about 20%.
Of course, the first thing to do was to check whether there was some extra load: if the number of requests sent to this server had increased, that could explain the high CPU utilization. I verified this and saw that the server had no extra tasks.
So where did the extra 70% (90% – 20%) come from? The server had no extra tasks. What was it doing?
In general, this is a pretty trivial task in a test environment, where you know a way to reproduce the problem: just take a profiler and go ahead.
The difficulty in my case was that (a) attaching a profiler in a production environment is, perhaps, not the best idea; and (b) there was no known way to reproduce the problem. It happened randomly, and I had no idea how to provoke it.
Still, we could gather thread dumps. So I had thread dumps taken every minute: 24 * 60 = 1440 files, with roughly 600 thread stacks per file. That makes about 1440 * 600 = 864,000 thread stacks per day. Well, at least there would be time to think while the parsing was going on. 🙂
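With that many stacks, the practical move is to aggregate: count how often each frame shows up across all dumps, because the hottest frames hint at where the CPU goes. A minimal sketch of such an aggregator (the dump lines and class names below are synthetic; in real use the per-minute files would be read with Files.lines):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class StackFrameCounter {
    // Count occurrences of each "at ..." frame across thread-dump lines.
    static Map<String, Long> frameCounts(Stream<String> lines) {
        return lines.map(String::trim)
                    .filter(l -> l.startsWith("at "))
                    .collect(Collectors.groupingBy(l -> l, Collectors.counting()));
    }

    public static void main(String[] args) {
        // A tiny synthetic dump; real input would be 1440 files of these.
        List<String> dump = Arrays.asList(
            "\"http-thread-1\" prio=10 tid=0x1 RUNNABLE",
            "   at java.util.HashMap.get(HashMap.java:303)",
            "   at com.example.Service.lookup(Service.java:42)",
            "\"http-thread-2\" prio=10 tid=0x2 RUNNABLE",
            "   at java.util.HashMap.get(HashMap.java:303)"
        );
        Map<String, Long> counts = frameCounts(dump.stream());
        System.out.println(counts.get("at java.util.HashMap.get(HashMap.java:303)")); // 2
    }
}
```

Sorting the resulting map by value and looking at the top ten frames is usually enough to see what 840 000 stacks have in common.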
Along the way I needed to clarify several things about Hibernate (what is the query cache? does it have regions? what is the update timestamps cache? does it have regions or not?).
In general, there are lots of articles, papers, notes, manuals, and so on. One excellent example is http://tech.puredanger.com/2009/07/10/hibernate-query-cache/.
After reading, we had some understanding of the query cache and the update timestamps cache: the query cache stores query results, keyed by the query text and its bound parameters, while the update timestamps cache keeps a record for each table saying when it was last modified, so Hibernate can tell whether a table was changed after a query result was cached.
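The interplay of the two caches can be sketched as a toy model. This is only an illustration of the idea, not Hibernate’s actual implementation; all class and method names here are made up:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model: a query cache plus an update-timestamps cache.
public class QueryCacheModel {
    final Map<String, List<?>> queryCache = new HashMap<>();    // query -> cached result
    final Map<String, Long> queryCachedAt = new HashMap<>();    // query -> when it was cached
    final Map<String, Long> updateTimestamps = new HashMap<>(); // table -> last modification time

    void cache(String query, List<?> result, long now) {
        queryCache.put(query, result);
        queryCachedAt.put(query, now);
    }

    void tableModified(String table, long now) {
        updateTimestamps.put(table, now);
    }

    // A cached result is usable only if none of the tables it touches
    // were modified after the result was cached.
    List<?> lookup(String query, Set<String> tables) {
        Long cachedAt = queryCachedAt.get(query);
        if (cachedAt == null) return null;
        for (String t : tables) {
            Long modified = updateTimestamps.get(t);
            if (modified != null && modified > cachedAt) return null; // stale entry
        }
        return queryCache.get(query);
    }

    public static void main(String[] args) {
        QueryCacheModel c = new QueryCacheModel();
        c.cache("select * from orders", Arrays.asList(1, 2, 3), 100L);
        System.out.println(c.lookup("select * from orders",
                Collections.singleton("orders")) != null); // true: still fresh
        c.tableModified("orders", 200L);
        System.out.println(c.lookup("select * from orders",
                Collections.singleton("orders")) != null); // false: table changed later
    }
}
```

The real update timestamps cache is shared across all cached queries, which is why a single write to a hot table can invalidate a large slice of the query cache at once.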
I’m going to tell you about one more emergency in our production environment.
So, we had a server running Glassfish with two Oracle JDBC connection pools. One of these pools was in use; the other wasn’t and was just kept as a reminder of something ;). All external requests to this server were remote JDBC calls.
The symptoms of this emergency were:
There is a very nice tutorial: http://tutorials.jenkov.com/java-concurrency/index.html.
I like it because it gives pretty clear descriptions of the different lockouts, and each lockout has its own name (Nested Monitor Lockout, Deadlock, Reentrance Lockout). Before I read it, I simply called every situation with endlessly hanging threads a “Deadlock”. 🙂
The tutorial also covers methods of preventing deadlock. I remember failing to answer that question in one of my interviews (fortunately, it had no impact on the result).
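One of the standard prevention techniques from that tutorial is lock ordering: if every thread acquires locks in the same global order, a wait cycle can never form. A minimal sketch (class and method names are mine):

```java
import java.util.concurrent.locks.ReentrantLock;

public class LockOrdering {
    static final ReentrantLock A = new ReentrantLock();
    static final ReentrantLock B = new ReentrantLock();

    // Every caller takes A before B, so no thread can ever hold B
    // while waiting for A, and deadlock is structurally impossible.
    static void withBothLocks(Runnable work) {
        A.lock();
        try {
            B.lock();
            try {
                work.run();
            } finally {
                B.unlock();
            }
        } finally {
            A.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        int[] counter = {0};
        Thread t1 = new Thread(() -> {
            for (int i = 0; i < 1000; i++) withBothLocks(() -> counter[0]++);
        });
        Thread t2 = new Thread(() -> {
            for (int i = 0; i < 1000; i++) withBothLocks(() -> counter[0]++);
        });
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(counter[0]); // 2000, and no thread ever deadlocks
    }
}
```

Swap the lock order in one of the threads’ code paths (A-then-B in one place, B-then-A in another) and the same program becomes a deadlock waiting to happen.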
And it was very funny to run into a Nested Monitor Lockout (http://tutorials.jenkov.com/java-concurrency/nested-monitor-lockout.html) in practice a couple of days later.
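For the curious, the pattern can be reproduced in a few lines. This is a deliberately broken sketch (monitor names and timings are illustrative, not from our production code): the waiter calls wait() on an inner monitor while still holding an outer one, and the notifier, which must take the outer monitor first, blocks forever.

```java
public class NestedMonitorLockoutDemo {
    static final Object outer = new Object();
    static final Object inner = new Object();

    public static void main(String[] args) throws InterruptedException {
        Thread waiter = new Thread(() -> {
            synchronized (outer) {
                synchronized (inner) {
                    try {
                        inner.wait(); // releases ONLY inner; outer stays held!
                    } catch (InterruptedException ignored) {
                    }
                }
            }
        });
        Thread notifier = new Thread(() -> {
            synchronized (outer) { // blocks forever: the waiter still owns outer
                synchronized (inner) {
                    inner.notify();
                }
            }
        });
        waiter.setDaemon(true);   // daemons, so the JVM can still exit
        notifier.setDaemon(true);
        waiter.start();
        Thread.sleep(200);        // let the waiter park inside wait()
        notifier.start();
        Thread.sleep(200);
        System.out.println(notifier.getState()); // BLOCKED: the notify can never be delivered
    }
}
```

In a thread dump this shows up as one thread WAITING on the inner monitor and another BLOCKED trying to enter the outer one, which is exactly the signature we eventually recognized.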