There are a lot of articles and presentations about Software Performance Engineering methodology. Just google, say, “Software Performance Engineering” and have fun.
So I won’t repeat everything the search engines can find for us ;). Let me just outline the points that are the most important from my point of view.
If you have read my previous posts, you may have noticed that each of them was devoted to a specific technical question or story. Moreover, each of those posts contains enough technical detail to help you resolve a similar issue. And, importantly, I consciously avoided general, unspecific talk without examples. Hopefully, this saved you time, as you can apply my solutions without repeating the steps I took to find them :).
However, technical advice alone doesn’t guarantee that your task will be resolved.
There have been a lot of emergencies in my practice with a very simple cause: one of the threads hung. And a single hanging thread isn’t so dangerous per se.
The theory of finding memory leaks is pretty simple. I would formulate it as follows:
There are a lot of articles about that. I remember a really great one: http://olex.openlogic.com/wazi/2009/how-to-fix-memory-leaks-in-java. I recommend it to everyone and thank its author.
If you caught a memory leak while doing performance testing, you are lucky. Usually, it’s a very simple case, even if you use J2EE technologies, where, say, 80% of the code running on the JRE isn’t your team’s code (I mean the container, various libraries, etc.). I said simple because you can reproduce it. Of course, there can be more difficult cases (JDK native code, lack of sources, etc.). But anyway, the opportunity to reproduce is a serious bid for victory.
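To make the “reproducible means simple” point concrete, here is a minimal sketch (my own illustration, not taken from any of the cases above) of the most common leak pattern in Java: a static, unbounded cache that pins objects forever. With a reproducible workload, a heap dump points straight at such a collection.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal illustration of a classic Java memory leak:
// a static cache that is only ever written to, never evicted.
public class LeakyCache {
    // Entries added here can never be garbage-collected,
    // because the static map keeps strong references to them.
    private static final Map<String, byte[]> CACHE = new HashMap<>();

    public static void handleRequest(String key) {
        // Each "request" pins another 1 KB buffer in the heap.
        CACHE.put(key, new byte[1024]);
    }

    public static int cacheSize() {
        return CACHE.size();
    }

    public static void main(String[] args) {
        for (int i = 0; i < 10_000; i++) {
            handleRequest("request-" + i);
        }
        // The cache grows linearly with the number of requests
        // and will keep growing until OutOfMemoryError.
        System.out.println("Cached entries: " + cacheSize());
    }
}
```

Run it under a load generator and the heap histogram will show `byte[]` retained by `LeakyCache.CACHE` growing monotonically — exactly the kind of signature that makes a reproducible leak easy to diagnose.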
I’m going to tell you a story about one more emergency in our production environment. More precisely, CPU utilization rose to 90% on some days, while it was about 20% on usual days.
Of course, the first thing to do is to check whether there is some extra load or not: if the number of requests sent to this server had increased, that could explain the high CPU utilization. I verified that and observed that the server had no extra tasks.
Where did those 70% (90% – 20%) come from? The server had no extra tasks. What was it doing?
In general, this is a pretty trivial task in a test environment, when you know a way to reproduce the problem. We could just take our profiler and go ahead.
The difficulty in my case was that (a) running a profiler in a production environment, perhaps, isn’t the best idea; and (b) there was no known way to reproduce the problem: it happened randomly, and I had no idea how to provoke it.
Yes, but we could gather thread dumps. So I had by-the-minute thread dumps: 24*60 = 1440 files, with approximately 600 thread stacks per file. That’s about 1440*600 = 864,000 thread stacks per day. Well, at least there would be time to think while the parsing was going on. 🙂
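With that many stacks, the only workable approach is aggregation: count how often each topmost frame appears across all dumps, and the hot methods float to the top. Here is a rough sketch of the counting step (a simplification of the jstack text format, with invented class names; a real dump has more noise to skip):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: aggregate the topmost stack frame of every thread
// across thread-dump text. In a jstack / "kill -3" dump, a thread
// starts with a quoted name line, and its frames start with "at ".
public class TopFrameCounter {

    public static Map<String, Integer> countTopFrames(String dumpText) {
        Map<String, Integer> counts = new HashMap<>();
        boolean expectTopFrame = false;
        for (String line : dumpText.split("\n")) {
            String t = line.trim();
            if (t.startsWith("\"")) {
                // New thread header: the next "at ..." line is its top frame.
                expectTopFrame = true;
            } else if (expectTopFrame && t.startsWith("at ")) {
                counts.merge(t.substring(3), 1, Integer::sum);
                expectTopFrame = false;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String dump =
            "\"worker-1\" prio=5 tid=0x1 RUNNABLE\n" +
            "   at com.example.Foo.spin(Foo.java:42)\n" +
            "   at com.example.Foo.run(Foo.java:10)\n" +
            "\"worker-2\" prio=5 tid=0x2 RUNNABLE\n" +
            "   at com.example.Foo.spin(Foo.java:42)\n";
        // spin(Foo.java:42) is the top frame of both threads here.
        System.out.println(countTopFrames(dump));
    }
}
```

Feed it all 1440 files, sort the map by value, and a thread that burns CPU in a tight loop shows up as an abnormally frequent top frame.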
For my work I needed to clarify several things about Hibernate: what is the query cache? does it have regions? what is the update timestamps cache? does it have regions or not?
In general, there are a lot of articles, papers, notes, manuals, etc. One excellent example is http://tech.puredanger.com/2009/07/10/hibernate-query-cache/.
So after reading we got some understanding of the query cache and the update timestamps cache: the query cache uses the query text plus its bound variables as keys, while the update timestamps cache keeps a record per table saying whether it was modified after the query result was cached.
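To check my own understanding, I find it useful to model the interaction of the two caches in a few lines: a cached result keyed by (query + parameters) is usable only if no table it touches has an update timestamp newer than the moment the result was cached. This is a toy model of the mechanism described above, not Hibernate’s actual code — class and method names below are invented:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of how Hibernate's query cache and update timestamps
// cache work together. Invented names, for illustration only.
public class QueryCacheModel {
    // Update timestamps cache: table name -> last modification time.
    private final Map<String, Long> updateTimestamps = new HashMap<>();
    // Query cache: (query text + bound variables) -> time the result was cached.
    private final Map<String, Long> cachedAt = new HashMap<>();

    public void tableModified(String table, long now) {
        updateTimestamps.put(table, now);
    }

    public void cacheResult(String queryWithParams, long now) {
        cachedAt.put(queryWithParams, now);
    }

    // A cached result is valid only if no referenced table
    // was modified at or after the moment the result was cached.
    public boolean isHit(String queryWithParams, List<String> tables) {
        Long at = cachedAt.get(queryWithParams);
        if (at == null) return false;
        for (String table : tables) {
            Long modified = updateTimestamps.get(table);
            if (modified != null && modified >= at) return false;
        }
        return true;
    }
}
```

In real Hibernate you enable this with `hibernate.cache.use_query_cache=true` and `setCacheable(true)` on each query; the two maps above correspond roughly to the standard query cache region and the update timestamps region.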
As you know, we need a memory dump for a lot of things: memory leak analysis, memory footprint analysis (a high memory footprint can be a problem sometimes), and checking or understanding some details of our application.
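Besides external tools like jmap, a HotSpot JVM can write its own heap dump via the diagnostic MBean. A minimal sketch (the file name here is just an example; `dumpHeap` fails if the file already exists):

```java
import java.lang.management.ManagementFactory;
import com.sun.management.HotSpotDiagnosticMXBean;

// Trigger a heap dump from inside the running JVM (HotSpot only).
public class HeapDumper {
    public static void dump(String filePath, boolean liveObjectsOnly) throws Exception {
        HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
                ManagementFactory.getPlatformMBeanServer(),
                "com.sun.management:type=HotSpotDiagnostic",
                HotSpotDiagnosticMXBean.class);
        // liveObjectsOnly=true triggers a full GC first and dumps only
        // reachable objects, which usually gives a noticeably smaller file.
        bean.dumpHeap(filePath, liveObjectsOnly);
    }

    public static void main(String[] args) throws Exception {
        dump("app-heap.hprof", true);
        System.out.println("Heap dump written to app-heap.hprof");
    }
}
```

The resulting `.hprof` file can be opened by the same tools as a jmap dump (Eclipse MAT, jhat, VisualVM).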
However, in some circumstances heap dump generation can take tens of minutes or even more, and all that time the application is not available (yes, we could set up a cluster, of course, but sometimes that isn’t done). That was my case: the server I had to get a memory dump from was not clustered, the heap size was several gigabytes, jmap was used (jmap -F -dump:format=b…..), and the dump was stored to disk.
The following simple ideas helped me speed up dump generation by more than 10 times: