Such a Nested Monitor Lockout

by perfstories

There is a very nice tutorial: http://tutorials.jenkov.com/java-concurrency/index.html.

I like it because it gives pretty clear descriptions of lockouts. Each kind of lockout has its own name (Nested Monitor Lockout, Deadlock, Reentrance Lockout). Before I read it, I simply called every situation with endlessly hanging threads a “Deadlock”. 🙂

It also covers methods of preventing deadlock. I remember that I failed to answer this question at one of my interviews (fortunately, it had no impact on the result).

And it was very funny to meet a Nested Monitor Lockout (http://tutorials.jenkov.com/java-concurrency/nested-monitor-lockout.html) in practice a couple of days later.

The symptom was very simple. Suddenly the thread pool of HTTP request handlers became exhausted, so the server could not handle any further HTTP requests.

One of the threads was hanging in wait(), which isn’t so bad per se, since waiting is exactly what wait() is for ;). The problem is that this thread didn’t release the lock on storage (see below).

public Item allocateItem(Key key, Item item) {
    …
    synchronized (storage) { // this lock is NOT released when the thread goes into wait()
        …
        synchronized (allocatingItem) { // this lock IS released by wait(), because wait() is called on the same allocatingItem
            while (allocatingItem.isAllocated()) {
                try {
                    allocatingItem.wait(); // the thread hangs here
                } catch (InterruptedException e) {
                    …
                }
            }
        }
    }
    …
}

And we hang in wait() because, according to the code, a thread has to take the lock on storage before it can call notifyAll(). But no thread can take the lock on storage, because the thread holding it is itself waiting for notifyAll() or notify(). And to call notifyAll() or notify(), some thread has to take the lock on storage first. But no thread can take the lock on storage…

public void deallocateItem(Item item) {
    …
    synchronized (storage) {
        synchronized (item) {
            …
            item.notifyAll();
            …
        }
    }
    …
}
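One possible way out of this kind of lockout, sketched here under assumptions rather than taken from the real fix: hold the lock on storage only long enough to look the item up, and then wait on the item alone. The ItemStorage class, the String key, the simplified Item and the demo() helper are all hypothetical stand-ins for the elided real code.

```java
import java.util.HashMap;
import java.util.Map;

public class ItemStorage {
    /** Hypothetical stand-in for the real Item class. */
    static final class Item {
        private boolean allocated;
        boolean isAllocated() { return allocated; }
        void setAllocated(boolean value) { allocated = value; }
    }

    private final Map<String, Item> storage = new HashMap<>();

    public Item allocateItem(String key) throws InterruptedException {
        Item item;
        synchronized (storage) {
            // Hold the storage lock only for the lookup.
            item = storage.computeIfAbsent(key, k -> new Item());
        }
        synchronized (item) {
            // wait() releases the item monitor, and no outer lock is held here.
            while (item.isAllocated()) {
                item.wait();
            }
            item.setAllocated(true);
        }
        return item;
    }

    public void deallocateItem(Item item) {
        synchronized (item) {
            item.setAllocated(false);
            item.notifyAll();
        }
    }

    /** Returns true if a waiter does not block other keys and is woken by deallocateItem. */
    public static boolean demo() {
        ItemStorage store = new ItemStorage();
        try {
            Item first = store.allocateItem("a");
            Thread second = new Thread(() -> {
                try {
                    store.allocateItem("a");
                } catch (InterruptedException ignored) {
                    // the demo thread simply exits
                }
            }, "second");
            second.setDaemon(true);
            second.start();
            while (second.getState() != Thread.State.WAITING) {
                Thread.sleep(10); // let the second thread park in wait()
            }
            store.allocateItem("b");     // succeeds: the waiter does not hold storage
            store.deallocateItem(first); // wakes the waiter
            second.join(5000);
            return !second.isAlive();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("waiter released: " + demo());
    }
}
```

The point of the design is that a thread parked in wait() owns nothing that other threads need. Whether the real code could drop the outer lock this way depends on what the elided parts do under it.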

Diagnostics were very simple, as I had thread stacks and could track how the threads came to hang. It was obvious exactly which line each thread was hanging at. So I just needed to look at the code and thank the developer for such an obvious situation.

To clarify regarding the thread stacks: when the server was unavailable (the pool was exhausted, see above), all threads except one (the one holding the lock on storage) were hanging, waiting for the lock on storage. And the threads stayed in this state until the server was restarted.
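The picture described above (one thread waiting while it owns a monitor, everyone else blocked on that monitor) can also be collected from inside the JVM. Below is a minimal sketch using the standard ThreadMXBean. The MonitorReport class and its output format are made up for illustration, and this is not necessarily how the thread stacks in this story were taken.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MonitorInfo;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class MonitorReport {

    /** Builds one line per thread: state, lock it waits on, lock owner, monitors it holds. */
    public static String describe() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        // true = include locked monitors, false = skip ownable synchronizers
        for (ThreadInfo info : threads.dumpAllThreads(true, false)) {
            out.append(info.getThreadName())
               .append(" state=").append(info.getThreadState());
            if (info.getLockName() != null) {
                out.append(" on=").append(info.getLockName());
            }
            if (info.getLockOwnerName() != null) {
                out.append(" ownedBy=").append(info.getLockOwnerName());
            }
            for (MonitorInfo monitor : info.getLockedMonitors()) {
                out.append(" holds=").append(monitor.getClassName())
                   .append('@').append(Integer.toHexString(monitor.getIdentityHashCode()));
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe());
    }
}
```

In a lockout like this one, the suspicious line would be a thread in state WAITING that still lists a "holds=" entry, with many BLOCKED threads reporting that same thread as "ownedBy".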

The dynamics are also pretty obvious: once one thread has called wait(), another one tries to take the lock on storage. So we have 2 hanging threads. From there it is a question of time: any thread that tries to take the lock on storage will hang. So how long does it take to exhaust the whole pool? It depends on the system load, in particular on how often this scenario is invoked. In my case, it took about 5 hours. I saw it clearly in the thread stacks (the time from 2 hanging threads until all threads were hanging).

Yeah, of course, in a test environment we could have reproduced it more quickly, I guess in seconds rather than hours. Unfortunately, we missed this issue at the testing stage because the case was pretty rare and fell outside our load model.
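For illustration, here is a minimal sketch of how such a lockout can be reproduced in well under a second. It mirrors the lock nesting from allocateItem and deallocateItem, but everything else (the class name, the plain Object locks, the boolean flag, the 5-second timeout) is an assumption made for the demo, not the real code.

```java
public class NestedMonitorLockoutDemo {

    /** Returns true if the lockout was observed: waiter WAITING, releaser BLOCKED. */
    public static boolean reproduce() {
        final Object storage = new Object();
        final Object item = new Object();
        final boolean[] allocated = {true};

        Thread waiter = new Thread(() -> {
            synchronized (storage) {          // kept while waiting
                synchronized (item) {         // released by wait()
                    while (allocated[0]) {
                        try {
                            item.wait();
                        } catch (InterruptedException e) {
                            return;           // used below to end the demo
                        }
                    }
                }
            }
        }, "waiter");

        Thread releaser = new Thread(() -> {
            synchronized (storage) {          // cannot be taken while the waiter waits
                synchronized (item) {
                    allocated[0] = false;
                    item.notifyAll();
                }
            }
        }, "releaser");

        waiter.setDaemon(true);
        releaser.setDaemon(true);
        waiter.start();
        boolean waiting = awaitState(waiter, Thread.State.WAITING);
        releaser.start();
        boolean blocked = awaitState(releaser, Thread.State.BLOCKED);
        waiter.interrupt();                   // break the lockout so both threads can finish
        return waiting && blocked;
    }

    /** Polls the thread state for up to 5 seconds. */
    private static boolean awaitState(Thread thread, Thread.State expected) {
        long deadline = System.nanoTime() + 5_000_000_000L;
        while (System.nanoTime() < deadline) {
            if (thread.getState() == expected) {
                return true;
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("lockout reproduced: " + reproduce());
    }
}
```

In a real load test the hard part is not the mechanics but making sure the rare scenario is part of the load model at all.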
