Yet another hanging java.net.SocketInputStream.socketRead0…

by perfstories

I’m going to tell you about one more emergency in our production environment.

So, we have a server with Glassfish, with 2 jdbc Oracle connection pools. One of these pools was usable, another wasn’t. Just was kept as a reminder of something ;). All external requests to this server were remote jdbc calls.

The symptoms of this emergency were:

  1. It was impossible to ping usable jdbc pool (GF console -> Resources -> JDBC -> Connection Pools -> General -> ping). I mean ping request was just hanging;
  2. Oracle DBAs saw no problems with sessions, connections, etc.;
  3. Unusable pool could be pinged easily;
  4. The servers, which were using this server, hung up in a special way: their http request handlers pools exhausted gradually, then the servers appeared hanging for some time, then were in working state again, then again from the beginning (exhausted, hung, working).

I had thread dumps as usual, so I could easily restore what was going on inside Glassfish on the basic level:

  1. First of all, one of the threads started to hang while requesting Oracle (approximately, at emergency start time). I mean trace was the same till server restart (dozens of minutes). See a trace part important for my further story in [1];
  2. Other thread, which was pinging a pool (I don’t see from thread stacks whether it is the usable one or not), hung awaiting for lock, which was acquired by thread from (1), see trace in [2];
  3. Threads which were handling requests from other servers were hanging awaiting Connection Pool lock, which was acquired by thread from (2), see above. Something like [3].

To make sure, we could look into Glassfish to see all these synchronizations :). I did that and everything was confirmed.

Even more, we could reproduce the emergency symptoms by stopping one of the threads (in moment like [1]) and pinging the usable pool. I did that and reproduced the problem. So emergency eyewitnesses confirmed those are emergency symptoms exactly.

So, looks like the main problem is hanging thread (see [1], that’s some request to Oracle). Of course, there could be some issues while sending this request to Oracle like network instability (and it really was according to logs: java.net.SocketTimeoutException: connect timed out), etc. Even this is not so important. The important thing is that thread hangs without any timeout.

But it’s strange, as we had timeout:(GF console -> Resources -> JDBC -> Connection Pools -> Advanced -> Statement Timeout).

And if it’s strange let’s go to debug: we need to understand where this timeout is set in the code, and whether or not it is set on our ([1]) trace.

This timeout turned out to be query timeout, as I understood after debug (oracle.jdbc.driver.OracleStatement, queryTimeout). So it was set for user queries indeed. BUT! It wasn’t set for Oracle connection validation!

It’s obvious if you look at com.sun.gjc.spi.ManagedConnectionFactory.java:

protected void isValidByTableQuery(java.sql.Connection con, 
                                   String tableName) 
                            throws ResourceException {
    …
    try {
        stmt = con.prepareStatement("SELECT COUNT(*) FROM " 
                                    + tableName);
        rs = stmt.executeQuery();
     } catch (Exception sqle) {
    …
}

I.e. you see no stmt.setQueryTimeout.

By the way, I looked at Glassfish v3 (http://java.net/projects/glassfish/sources/svn/content/trunk/main/appserver/jdbc/jdbc-ra/jdbc-core/src/main/java/com/sun/gjc/spi/ManagedConnectionFactory.java?rev=53670), no timeout as well, so my story could be useful to someone.

Okey, so there is no query timeout for connection validation. But will this timeout resolve our problem with hanging socketRead0?
This question turned out to be the most difficult to me in this story:)

As far as I understood, query timeout works that way: There is a very special thread (“OracleTimeoutPollingThread”), which is almost always sleeping, but it wakes up sometimes to run through a map (Statement->timeout). So, if it’s time, this thread will call Statement.cancel() (http://docs.oracle.com/javase/6/docs/api/java/sql/Statement.html#cancel() – “Cancels this Statement object if both the DBMS and driver support aborting an SQL statement. This method can be used by one thread to cancel a statement that is being executed by another thread.”).

I understand it as follows: If the driver knows about this method, it will think out something to handle this. But no guarantee at all. And also, we know nothing about connection state, I mean if it is really possible to cancel or not.

So, to summarize, query timeout is very good (I wrote a patch for Glassfish, to set it for connection validation), but, perhaps, besides that, we need to tell something to socket, especially, as it has a special method for this!:
(java.net.Socket.setSoTimeout(int): “….With this option set to a non-zero timeout, a read() call on the InputStream associated with this Socket will block for only this amount of time. If the timeout expires, a java.net.SocketTimeoutException is raised, though the Socket is still valid.”)

Fortunately, we can tell to Oracle JDBC driver to set this timeout (-Doracle.jdbc.ReadTimeout=65432, in milliseconds).

By the way, it proves indirectly, that query timeout isn’t enough… As far as I understand, the logic around statement cancellation is pretty difficult in Oracle driver. More code and logic, more bugs. Plus we know nothing about connection state…

That’s all!

Take care of your network to avoid this kind of problems! :) Joke. Lack of timeout is terrible of course.

[1]

 
Thread "p: thread-pool-1; w: 2" thread-id 278 thread-stateRUNNABLERunning in native
         at: java.net.SocketInputStream.socketRead0(Native Method)
         at: java.net.SocketInputStream.read(SocketInputStream.java:129)
         at: oracle.net.ns.Packet.receive(Packet.java:240)
         at: oracle.net.ns.DataPacket.receive(DataPacket.java:92)
         at: oracle.net.ns.NetInputStream.getNextPacket(NetInputStream.java:172)
         at: oracle.net.ns.NetInputStream.read(NetInputStream.java:117)
         at: oracle.net.ns.NetInputStream.read(NetInputStream.java:92)
         at: oracle.net.ns.NetInputStream.read(NetInputStream.java:77)
         at: oracle.jdbc.driver.T4CMAREngine.unmarshalUB1(T4CMAREngine.java:1034)
         at: oracle.jdbc.driver.T4CMAREngine.unmarshalSB1(T4CMAREngine.java:1010)
         at: oracle.jdbc.driver.T4C8Oall.receive(T4C8Oall.java:588)
         at: oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:194)
         at: oracle.jdbc.driver.T4CPreparedStatement.executeForRows(T4CPreparedStatement.java:953)
         at: oracle.jdbc.driver.T4CPreparedStatement.executeMaybeDescribe(T4CPreparedStatement.java:897)
         at: oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1186)
         at: oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:3387)
         at: oracle.jdbc.driver.OraclePreparedStatement.executeQuery(OraclePreparedStatement.java:3431)
         at: oracle.jdbc.driver.OraclePreparedStatementWrapper.executeQuery(OraclePreparedStatementWrapper.java:1491)
         at: com.sun.gjc.spi.ManagedConnectionFactory.isValidByTableQuery(ManagedConnectionFactory.java:383)
         at: com.sun.gjc.spi.ManagedConnectionFactory.isValid(ManagedConnectionFactory.java:297)
         at: com.sun.gjc.spi.ManagedConnectionFactory.getInvalidConnections(ManagedConnectionFactory.java:246)
         at: com.sun.enterprise.resource.AbstractConnectorAllocator.getInvalidConnections(AbstractConnectorAllocator.java:99)
         at: com.sun.enterprise.resource.AbstractConnectorAllocator.isConnectionValid(AbstractConnectorAllocator.java:111)
         at: com.sun.enterprise.resource.AbstractResourcePool.isConnectionValid(AbstractResourcePool.java:719)
         at: com.sun.enterprise.resource.AbstractResourcePool.getUnenlistedResource(AbstractResourcePool.java:665)
         at: com.sun.enterprise.resource.AbstractResourcePool.internalGetResource(AbstractResourcePool.java:606)
         at: com.sun.enterprise.resource.AbstractResourcePool.getResource(AbstractResourcePool.java:455)
         at: com.sun.enterprise.resource.PoolManagerImpl.getResourceFromPool(PoolManagerImpl.java:248)
         …
         at: com.sun.corba.ee.impl.orbutil.threadpool.ThreadPoolImpl$WorkerThread.run(ThreadPoolImpl.java:555)

[2]

 
Thread "httpWorkerThread-4848-2" thread-id 91 thread-stateBLOCKEDWaiting on lock: oracle.jdbc.driver.T4CConnection@24027d6d
         Owned by: p: thread-pool-1; w: 2 Id: 278        at: oracle.jdbc.driver.PhysicalConnection.close(PhysicalConnection.java:3484)
         at: oracle.jdbc.pool.OraclePooledConnection.close(OraclePooledConnection.java:245)
         at: com.sun.gjc.spi.ManagedConnection.destroy(ManagedConnection.java:276)
         at: com.sun.enterprise.resource.LocalTxConnectorAllocator.destroyResource(LocalTxConnectorAllocator.java:115)
         at: com.sun.enterprise.resource.AbstractResourcePool.destroyResource(AbstractResourcePool.java:983)
         at: com.sun.enterprise.resource.AbstractResourcePool.emptyPool(AbstractResourcePool.java:1585)
         at: com.sun.enterprise.resource.PoolManagerImpl.killPool(PoolManagerImpl.java:560)
         at: com.sun.enterprise.connectors.ConnectorConnectionPoolAdminServiceImpl.killPool(ConnectorConnectionPoolAdminServiceImpl.java:1145)
         at: com.sun.enterprise.connectors.ConnectorConnectionPoolAdminServiceImpl.unloadAndKillPool(ConnectorConnectionPoolAdminServiceImpl.java:493)
         at: com.sun.enterprise.connectors.ConnectorConnectionPoolAdminServiceImpl.getUnpooledConnection(ConnectorConnectionPoolAdminServiceImpl.java:1219)
         at: com.sun.enterprise.connectors.ConnectorConnectionPoolAdminServiceImpl.testConnectionPool(ConnectorConnectionPoolAdminServiceImpl.java:551)
         at: com.sun.enterprise.connectors.ConnectorRuntime.testConnectionPool(ConnectorRuntime.java:520)
         at: com.sun.enterprise.admin.mbeans.ResourcesMBean.pingConnectionPool(ResourcesMBean.java:2387)
         at: sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
         at: sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
         at: sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
         at: java.lang.reflect.Method.invoke(Method.java:597)
         at: com.sun.enterprise.admin.MBeanHelper.invokeOperationInBean(MBeanHelper.java:381)
         at: com.sun.enterprise.admin.MBeanHelper.invokeOperationInBean(MBeanHelper.java:364)
         at: com.sun.enterprise.admin.config.BaseConfigMBean.invoke(BaseConfigMBean.java:477)
         at: com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:836)
         at: com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:761)
         at: sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
         at: sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
         at: java.lang.reflect.Method.invoke(Method.java:597)
         at: com.sun.enterprise.admin.util.proxy.ProxyClass.invoke(ProxyClass.java:90)
         at: $Proxy1.invoke(Unknown Source)
         at: com.sun.enterprise.admin.server.core.jmx.SunoneInterceptor.invoke(SunoneInterceptor.java:304)
         at: com.sun.enterprise.interceptor.DynamicInterceptor.invoke(DynamicInterceptor.java:174)
         at: com.sun.enterprise.admin.jmx.remote.server.callers.InvokeCaller.call(InvokeCaller.java:69)
         at: com.sun.enterprise.admin.jmx.remote.server.MBeanServerRequestHandler.handle(MBeanServerRequestHandler.java:155)
         at: com.sun.enterprise.admin.jmx.remote.server.servlet.RemoteJmxConnectorServlet.processRequest(RemoteJmxConnectorServlet.java:122)
         at: com.sun.enterprise.admin.jmx.remote.server.servlet.RemoteJmxConnectorServlet.doPost(RemoteJmxConnectorServlet.java:193)
         at: javax.servlet.http.HttpServlet.service(HttpServlet.java:754)
         at: javax.servlet.http.HttpServlet.service(HttpServlet.java:847)
         at: org.apache.catalina.core.ApplicationFilterChain.servletService(ApplicationFilterChain.java:427)
         at: org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:315)
         at: org.apache.catalina.core.StandardContextValve.invokeInternal(StandardContextValve.java:287)
         at: org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:218)
         at: org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:648)
         at: org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:593)
         at: com.sun.enterprise.web.WebPipeline.invoke(WebPipeline.java:94)
         at: com.sun.enterprise.web.PESessionLockingStandardPipeline.invoke(PESessionLockingStandardPipeline.java:98)
         at: org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:222)
         at: org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:648)
         at: org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:593)
         at: org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:587)
         at: org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:1096)
         at: org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:166)
         at: org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:648)
         at: org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:593)
         at: org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:587)
         at: org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:1096)
         at: org.apache.coyote.tomcat5.CoyoteAdapter.service(CoyoteAdapter.java:288)
         at: com.sun.enterprise.web.connector.grizzly.DefaultProcessorTask.invokeAdapter(DefaultProcessorTask.java:647)
         at: com.sun.enterprise.web.connector.grizzly.DefaultProcessorTask.doProcess(DefaultProcessorTask.java:579)
         at: com.sun.enterprise.web.connector.grizzly.DefaultProcessorTask.process(DefaultProcessorTask.java:831)
         at: com.sun.enterprise.web.connector.grizzly.DefaultReadTask.executeProcessorTask(DefaultReadTask.java:341)
         at: com.sun.enterprise.web.connector.grizzly.DefaultReadTask.doTask(DefaultReadTask.java:263)
         at: com.sun.enterprise.web.connector.grizzly.DefaultReadTask.doTask(DefaultReadTask.java:214)
         at: com.sun.enterprise.web.connector.grizzly.TaskBase.run(TaskBase.java:265)
         at: com.sun.enterprise.web.connector.grizzly.WorkerThreadImpl.run(WorkerThreadImpl.java:116)

[3]

 
Thread "p: thread-pool-1; w: 1" thread-id 49 thread-stateBLOCKEDWaiting on lock: com.sun.enterprise.resource.SJSASResourcePool@420204e5
         Owned by: httpWorkerThread-4848-2 Id: 91        at: com.sun.enterprise.resource.AbstractResourcePool.getResourceFromPool(AbstractResourcePool.java:770)
         at: com.sun.enterprise.resource.AbstractResourcePool.getUnenlistedResource(AbstractResourcePool.java:664)
         at: com.sun.enterprise.resource.AbstractResourcePool.internalGetResource(AbstractResourcePool.java:606)
         at: com.sun.enterprise.resource.AbstractResourcePool.getResource(AbstractResourcePool.java:455)
         at: com.sun.enterprise.resource.PoolManagerImpl.getResourceFromPool(PoolManagerImpl.java:248)
         at: com.sun.enterprise.resource.PoolManagerImpl.getResource(PoolManagerImpl.java:176)
         at: com.sun.enterprise.connectors.ConnectionManagerImpl.internalGetConnection(ConnectionManagerImpl.java:327)
         at: com.sun.enterprise.connectors.ConnectionManagerImpl.allocateConnection(ConnectionManagerImpl.java:189)
         at: com.sun.enterprise.connectors.ConnectionManagerImpl.allocateConnection(ConnectionManagerImpl.java:165)
         at: com.sun.enterprise.connectors.ConnectionManagerImpl.allocateConnection(ConnectionManagerImpl.java:158) 
         …
         at: com.sun.corba.ee.impl.protocol.CorbaServerRequestDispatcherImpl.dispatch(CorbaServerRequestDispatcherImpl.java:227)
         at: com.sun.corba.ee.impl.protocol.CorbaMessageMediatorImpl.handleRequestRequest(CorbaMessageMediatorImpl.java:1846)
         at: com.sun.corba.ee.impl.protocol.CorbaMessageMediatorImpl.handleRequest(CorbaMessageMediatorImpl.java:1706)
         at: com.sun.corba.ee.impl.protocol.CorbaMessageMediatorImpl.handleInput(CorbaMessageMediatorImpl.java:1088)
         at: com.sun.corba.ee.impl.protocol.giopmsgheaders.RequestMessage_1_2.callback(RequestMessage_1_2.java:223)
         at: com.sun.corba.ee.impl.protocol.CorbaMessageMediatorImpl.handleRequest(CorbaMessageMediatorImpl.java:806)
         at: com.sun.corba.ee.impl.protocol.CorbaMessageMediatorImpl.dispatch(CorbaMessageMediatorImpl.java:563)
         at: com.sun.corba.ee.impl.protocol.CorbaMessageMediatorImpl.doWork(CorbaMessageMediatorImpl.java:2567)
         at: com.sun.corba.ee.impl.orbutil.threadpool.ThreadPoolImpl$WorkerThread.run(ThreadPoolImpl.java:555)

About these ads