performance stories

technical essays about performance, life and everything ;)

Month: August, 2012

Be aware of Sun/Oracle networking properties, or how to prevent a hanging java.net.SocketInputStream.socketRead0…

There have been a lot of emergencies in my practice with a very simple cause: one of the threads hung. A single hanging thread isn’t so dangerous per se.

However
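The gist suggested by the title: a socket read can only hang forever when no timeout has been set anywhere. Below is a minimal sketch of mine (not code from the post), assuming a plain java.net.Socket and the standard sun.net.client.* default properties, with all values in milliseconds:

    import java.io.InputStream;
    import java.net.InetSocketAddress;
    import java.net.Socket;

    public class NoMoreHangingReads {
        public static void main(String[] args) throws Exception {
            // JVM-wide defaults for code that opens connections internally
            // (e.g. URLConnection); out of the box there is no read timeout at all.
            System.setProperty("sun.net.client.defaultConnectTimeout", "10000");
            System.setProperty("sun.net.client.defaultReadTimeout", "30000");

            // For sockets you open yourself, set the timeouts explicitly:
            // without setSoTimeout() a read can block in socketRead0 forever.
            Socket socket = new Socket();
            try {
                socket.connect(new InetSocketAddress("example.com", 80), 10000);
                socket.setSoTimeout(30000); // read() throws SocketTimeoutException after 30 s
                InputStream in = socket.getInputStream();
                int firstByte = in.read(); // can no longer hang indefinitely
                System.out.println("first byte: " + firstByte);
            } finally {
                socket.close();
            }
        }
    }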


A real-life example of finding memory leaks in Java

The theory of finding memory leaks is pretty simple. I would formulate it as follows:

  1. Make sure it really is a leak;
  2. Identify the leaking objects;
  3. Identify the lines of code that produce the leak;
  4. Fix the code.

There are a lot of articles about this. I remember a really great one: http://olex.openlogic.com/wazi/2009/how-to-fix-memory-leaks-in-java. I recommend it to everyone and thank its author.
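To make the steps concrete, here is a deliberately leaky class (a hypothetical illustration of mine, not code from the cases below): an unbounded static cache that never evicts anything, so successive jmap -histo snapshots show its entries growing until the heap is exhausted.

    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical example of a classic leak: a static, unbounded "cache".
    // Entries are added on every lookup and never removed, so the map and
    // everything it references can never be garbage collected.
    public class LeakyCache {
        private static final Map<String, byte[]> CACHE = new HashMap<String, byte[]>();

        public static byte[] lookup(String key) {
            byte[] value = CACHE.get(key);
            if (value == null) {
                value = new byte[1024];   // pretend this is an expensive result
                CACHE.put(key, value);    // step 3: this line is the leak
            }
            return value;
        }

        public static void main(String[] args) {
            long i = 0;
            while (true) {
                // Unique keys => unbounded growth => eventual OutOfMemoryError.
                lookup("request-" + i++);
            }
        }
    }

Running it with -XX:+HeapDumpOnOutOfMemoryError leaves a heap dump behind for step 2 when it finally dies.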

If you catch a memory leak during performance testing, you are lucky. Usually it’s quite simple, even with J2EE technologies, where, say, 80% of the code running on the JRE isn’t your team’s code (the container, various libraries, and so on). I say simple because you can reproduce it. Of course, there are harder cases (JDK native code, lack of sources, etc.), but being able to reproduce the problem is already a serious bid for victory.

But what do we do when there is no memory leak at all during testing, yet it shows up in production under real user activity?

Why does high CPU utilization happen sometimes?

I’m going to tell you a story about one more emergency in our production environment. On some days CPU utilization rose to 90%, while on normal days it was about 20%.

Of course, the first thing to check is whether there is some extra load: if the number of requests sent to this server had increased, that could explain the high CPU utilization. I verified it and saw that the server had no extra tasks.

Where did those 70% (90% – 20%) come from? The server had no extra tasks. What was it doing?

In a test environment, where you know how to reproduce the problem, this is a pretty trivial task: just take a profiler and go ahead.

The difficulty in my case was that (a) running a profiler in a production environment is probably not the best idea, and (b) there was no known way to reproduce the problem. It happened randomly, and I had no idea how to provoke it.

Yes, but we could gather thread dumps. So I took a thread dump every minute: 24*60 = 1,440 files, with roughly 600 thread stacks per file. That’s about 1,440*600 = 864,000 thread stacks per day. Well, at least there would be time to think while the parsing was going on. 🙂
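Parsing that many stacks by hand is hopeless, so the first step is a bit of automation. Here is a minimal sketch of my own (not the actual tooling from this story), assuming the dumps are plain jstack output files sitting in one directory; it counts the most frequent topmost stack frame across all threads in all files:

    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.DirectoryStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.Comparator;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Hypothetical helper: scan a directory of jstack output files and count
    // the topmost stack frame of every thread, to see where threads spend time.
    public class TopFrameCounter {
        public static void main(String[] args) throws IOException {
            Path dumpDir = Paths.get(args.length > 0 ? args[0] : "dumps");
            Map<String, Integer> counts = new HashMap<String, Integer>();

            DirectoryStream<Path> files = Files.newDirectoryStream(dumpDir);
            for (Path file : files) {
                boolean expectTopFrame = false;
                for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
                    if (line.startsWith("\"")) {
                        expectTopFrame = true;                   // thread header, e.g. "http-8080-12" ...
                    } else if (expectTopFrame && line.trim().startsWith("at ")) {
                        String frame = line.trim().substring(3); // first frame under the header
                        Integer old = counts.get(frame);
                        counts.put(frame, old == null ? 1 : old + 1);
                        expectTopFrame = false;
                    }
                }
            }
            files.close();

            // Print frames sorted by frequency, most common first.
            List<Map.Entry<String, Integer>> sorted =
                    new ArrayList<Map.Entry<String, Integer>>(counts.entrySet());
            Collections.sort(sorted, new Comparator<Map.Entry<String, Integer>>() {
                public int compare(Map.Entry<String, Integer> a, Map.Entry<String, Integer> b) {
                    return b.getValue().compareTo(a.getValue());
                }
            });
            for (Map.Entry<String, Integer> e : sorted) {
                System.out.println(e.getValue() + "\t" + e.getKey());
            }
        }
    }

Frames that dominate that list are good candidates for where the extra CPU went.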

Ok, let’s start…
