performance stories

technical essays about performance, life and everything ;)

Category: story

Yet another story about high cpu utilization…

I’m going to tell you one more story about high cpu utilization.

First, I’ll give you some background and explain how the problem was localized. Then we will think about how a HashMap can cause high cpu utilization.

Shall we start?

A real life example of finding memory leaks in java

The theory of finding memory leaks is pretty simple. I would formulate it as follows:

  1. Make sure it is a leak indeed;
  2. Identify the leaking objects (see the heap-dump sketch just after this list);
  3. Identify the code lines that cause the leak;
  4. Fix the code.
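For step 2, the usual raw material is a heap dump. Just as a minimal sketch, assuming a HotSpot JVM (the HotSpotDiagnosticMXBean is HotSpot-specific) and a made-up file name; jmap or your favourite profiler does the same job:

```java
import com.sun.management.HotSpotDiagnosticMXBean;

import java.lang.management.ManagementFactory;

public class HeapDumper {

    public static void dump(String path) throws Exception {
        HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
                ManagementFactory.getPlatformMBeanServer(),
                "com.sun.management:type=HotSpotDiagnostic",
                HotSpotDiagnosticMXBean.class);
        // live = true keeps only reachable objects in the dump, which is what we
        // want when hunting a leak: garbage that would be collected anyway is noise.
        bean.dumpHeap(path, true);
    }

    public static void main(String[] args) throws Exception {
        dump("leak-suspect.hprof");
    }
}
```

The resulting .hprof file can then be opened in a heap analyzer to see which object types dominate and who holds them.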

There are a lot of articles about that. I remember a really great one: http://olex.openlogic.com/wazi/2009/how-to-fix-memory-leaks-in-java. I recommend it to everyone and thank its author.

If you catch a memory leak during performance testing, you are lucky. Usually it’s a fairly simple case, even if you use J2EE technologies where, say, 80% of the code running on the JRE isn’t written by your team (the container, various libraries, etc.). I say simple because you can reproduce it. Of course, there can be more difficult cases (JDK native code, lack of sources, etc.), but the ability to reproduce the problem is already a serious bid for victory.

But what should we do if there are no memory leaks during testing at all, yet they show up in the production environment under real user activity?

Why does high cpu utilization happen sometimes?

I’m going to tell you a story about one more emergency in our production environment. More precisely, cpu utilization rose to about 90% on some days, while it was around 20% on usual days.

Of course, the first thing to check is whether there is some extra load: if the number of requests sent to this server had increased, that could explain the high cpu utilization. I verified that and found that the server had no extra tasks.

So where did those extra 70% (90% – 20%) come from? The server had no extra tasks. What was it doing?

In general, this is a pretty trivial task in a test environment, when you know a way to reproduce the problem: just take a profiler and go ahead.

The difficulty in my case was that (a) running a profiler in the production environment is, perhaps, not the best idea, and (b) there was no known way to reproduce the problem. It happened randomly, and I had no idea how to provoke it.

Yes, but we could gather thread dumps. So I had by-the-minute thread dumps: 24*60 = 1440 files with thread dumps, approximately 600 thread stacks per file. That’s about 1440*600 = 864 000 thread stacks per day. Well, at least there would be time to think while the parsing was going on. 🙂
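Just to make the setup concrete, here is a minimal sketch of one way to collect by-the-minute thread dumps from inside the JVM with the standard ThreadMXBean. The class and file names are made up; in practice jstack or kill -3 is the more common route and gives fuller stack traces:

```java
import java.io.PrintWriter;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class MinuteThreadDumper {

    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

        scheduler.scheduleAtFixedRate(() -> {
            String file = "tdump-" + System.currentTimeMillis() + ".txt";
            try (PrintWriter out = new PrintWriter(file)) {
                // true/true also records held monitors and ownable synchronizers,
                // which is what makes the dumps useful for contention analysis.
                for (ThreadInfo info : threads.dumpAllThreads(true, true)) {
                    // Note: ThreadInfo.toString() truncates deep stacks,
                    // so real investigations usually rely on jstack output instead.
                    out.print(info);
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }, 0, 1, TimeUnit.MINUTES);
    }
}
```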

Ok, let’s start…

A couple of thoughts about Hibernate, caches and OQL

For my work I needed to clarify several things about Hibernate (what is the query cache? does it have regions? what is the update timestamps cache? does it have regions or not?).

In general, there are a lot of articles, papers, notes, manuals, etc. One excellent example is http://tech.puredanger.com/2009/07/10/hibernate-query-cache/.

So, after reading, we have some understanding of the query cache and the update timestamps cache: the query cache uses the query text plus its bound parameters as keys, while the update timestamps cache keeps a record for each table saying whether it was modified after the query result was cached.
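To make that concrete, a minimal sketch of how a query ends up in the query cache, assuming a Hibernate 3-style API; the Order entity and the region name are made up, and hibernate.cache.use_second_level_cache and hibernate.cache.use_query_cache must be enabled in the configuration:

```java
import org.hibernate.Query;
import org.hibernate.Session;

import java.util.List;

public class CachedQueryExample {

    public static List<?> loadOrders(Session session, long customerId) {
        Query query = session.createQuery("from Order o where o.customerId = :id");
        query.setParameter("id", customerId);
        // The query string together with the bound parameter values
        // forms the key under which the result is stored in the query cache.
        query.setCacheable(true);
        // Optional: keep these entries in a dedicated region instead of the default one.
        query.setCacheRegion("order.queries");
        return query.list();
    }
}
```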

Ok, I decided to see that with my own eyes.

ThreadLocal. Proceed with caution!

…Especially when you have large objects and a lot of class loaders.

And why I’m not joking 😉
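A minimal sketch of the pattern in question, with made-up names and sizes: a ThreadLocal holding a large object on pooled threads. If remove() is never called, every worker thread keeps its copy alive, and through the value’s class it also pins the defining class loader:

```java
public class HeavyContextHolder {

    // Made-up example: a 10 MB scratch buffer kept per thread.
    // Every pooled thread that ever calls buffer() holds a strong reference to its
    // copy until remove() is called or the thread dies; the value also keeps its
    // class (and therefore the class loader that defined it) from being unloaded.
    private static final ThreadLocal<byte[]> BUFFER = new ThreadLocal<byte[]>() {
        @Override
        protected byte[] initialValue() {
            return new byte[10 * 1024 * 1024];
        }
    };

    public static byte[] buffer() {
        return BUFFER.get();
    }

    public static void release() {
        // Forgetting this on application-server worker threads is the classic leak.
        BUFFER.remove();
    }
}
```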

Yet another hanging java.net.SocketInputStream.socketRead0…

I’m going to tell you about one more emergency in our production environment.

So, we had a server with Glassfish and 2 Oracle jdbc connection pools. One of these pools was in use, the other wasn’t; it was just kept around as a reminder of something ;). All external requests to this server were remote jdbc calls.

The symptoms of this emergency were:

  1. It was impossible to ping the pool in use (GF console -> Resources -> JDBC -> Connection Pools -> General -> ping); the ping request just hung;
  2. Oracle DBAs saw no problems with sessions, connections, etc.;
  3. The unused pool could be pinged easily;
  4. The servers that depended on this server hung in a peculiar way: their http request handler pools were gradually exhausted, then the servers appeared to hang for some time, then they were working again, and then the cycle repeated (exhausted, hung, working).

I had thread dumps, as usual.
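Not part of the story itself, but the usual precaution against calls sitting forever in socketRead0 is a read timeout on the connection. A minimal sketch, assuming the Oracle thin driver and the timeout property names it is commonly configured with (verify them against your driver version’s documentation):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;

public class TimeoutGuardedConnection {

    public static Connection open(String url, String user, String password) throws SQLException {
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        // Driver-specific properties, in milliseconds (names assumed for the Oracle
        // thin driver -- check your driver documentation):
        props.setProperty("oracle.net.CONNECT_TIMEOUT", "10000"); // connect phase
        props.setProperty("oracle.jdbc.ReadTimeout", "60000");    // socket reads
        // On JDBC 4.1+ drivers, Connection.setNetworkTimeout() is the standard
        // alternative for bounding how long a call may block in socketRead0.
        return DriverManager.getConnection(url, props);
    }
}
```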

Such a Nested Monitor Lockout

There is a very nice tutorial: http://tutorials.jenkov.com/java-concurrency/index.html.

I like it because it has pretty clear descriptions of lockouts, and each lockout has its own name (Nested Monitor Lockout, Deadlock, Reentrance Lockout). Before I read it, I simply called every situation with endlessly hanging threads a “Deadlock”. 🙂

It also covers methods of preventing deadlock. I remember failing to answer that question at one of my interviews (fortunately, it didn’t affect the result).

And it was quite funny to run into a Nested Monitor Lockout (http://tutorials.jenkov.com/java-concurrency/nested-monitor-lockout.html) in practice a couple of days later.

More precisely, it happened in the production environment.
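For reference, a minimal made-up sketch of the nested monitor lockout pattern the tutorial describes: one thread waits on an inner monitor while still holding an outer one, so the thread that should notify it can never get in.

```java
public class NestedMonitorLockout {

    private final Object outer = new Object();
    private final Object inner = new Object();
    private boolean ready = false;

    // Thread A: holds 'outer' while waiting on 'inner' -- 'outer' is never released.
    public void await() throws InterruptedException {
        synchronized (outer) {
            synchronized (inner) {
                while (!ready) {
                    inner.wait(); // releases 'inner' only; 'outer' stays locked
                }
            }
        }
    }

    // Thread B: must take 'outer' first, so it never reaches inner.notifyAll().
    public void signal() {
        synchronized (outer) {      // blocks forever while thread A is waiting
            synchronized (inner) {
                ready = true;
                inner.notifyAll();
            }
        }
    }
}
```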
