So this problem has been driving me completely bonkers.
CF10 update 18, RedHat 6.7 64 bit. Java 1.7.093
We noticed a similar problem with ColdFusion 10 update 14 last year and rolled back to update 13. Recently we pushed out update 18, which fixed some issues but did not fix CF restarting into a hung or dead state.
I have gone as far as uninstalling CF10 completely, reinstalling a fresh CF10, applying the mandatory update and then update 18, creating two fresh instances and a new cluster, and configuring the Apache connector to use that cluster. It's about as out of the box as I can make it.
With this physical box quiesced in my hardware load balancer (not taking requests), I can restart the two CF instances all day long without much trouble (maybe the occasional hang, but rare). The trouble starts when I put the box back into production, let it take requests, and then restart a CF instance: at least one CF instance will always come up in a hung or dead state. I cannot reach Tomcat on port 8501, and mod_jk shows the worker as down. The coldfusion-out log shows "PM Information [localhost-startStop-2] - ColdFusion stopped" as the last entry. Running ps aux shows that the instance process is indeed running. Issuing a kill -3 <pid> does not log anything to stderr, to coldfusion-out, or anywhere else for that matter. When I try to stop or restart the instance again, I get the following message, and again the instance starts in a hung state:
[root]# /opt/coldfusion10/web4cf1/bin/coldfusion restart
Restarting ColdFusion 10 server instance named web4cf1 ...
Stopping ColdFusion 10 server instance named web4cf1, please wait
Mar 22, 2016 2:24:09 PM com.adobe.coldfusion.launcher.Launcher stopServer
SEVERE: Shutdown Port 8008is not active. Stop the server only after it is started.
ColdFusion 10 server instance named web4cf1 has been stopped
Starting ColdFusion 10 server instance named web4cf1 ...
The ColdFusion 10 server instance named web4cf1 is starting up and will be available shortly.
nohup: appending output to `nohup.out'
======================================================================
ColdFusion 10 server instance named web4cf1 has been started.
ColdFusion 10 will write logs to /opt/coldfusion10/web4cf1/logs/coldfusion-out.log
======================================================================
I have tried various timeouts and settings in the server.xml files as well as workers.properties, with no luck. Here are my current files, workers.properties first and then the server.xml for web4cf1:
worker.list=web4cluster,web4cf1,web4cf2
worker.web4cluster.type=lb
worker.web4cluster.balance_workers=web4cf1,web4cf2
worker.web4cluster.sticky_session=true
worker.web4cf1.type=ajp13
worker.web4cf1.host=localhost
worker.web4cf1.port=8041
worker.web4cf1.connect_timeout=250
worker.web4cf1.max_reuse_connections=250
worker.web4cf1.connection_pool_size = 400
worker.web4cf1.connection_pool_minsize= 200
worker.web4cf1.connection_pool_timeout = 600
worker.web4cf1.lbfactor=1
worker.web4cf1.route=web4cf1
worker.web4cf2.type=ajp13
worker.web4cf2.host=localhost
worker.web4cf2.port=8042
worker.web4cf2.connect_timeout=250
worker.web4cf2.max_reuse_connections=250
worker.web4cf2.connection_pool_size = 400
worker.web4cf2.connection_pool_minsize= 200
worker.web4cf2.connection_pool_timeout = 600
worker.web4cf2.lbfactor=1
worker.web4cf2.route=web4cf2
<Server port="8008" shutdown="SHUTDOWN">
<Listener className="org.apache.catalina.core.AprLifecycleListener" SSLEngine="on">
</Listener>
<Listener className="org.apache.catalina.core.JasperListener">
</Listener>
<Listener className="org.apache.catalina.core.JreMemoryLeakPreventionListener">
</Listener>
<Listener className="org.apache.catalina.mbeans.GlobalResourcesLifecycleListener">
</Listener>
<GlobalNamingResources>
<Resource description="User database that can be updated and saved" name="UserDatabase" pathname="conf/tomcat-users.xml" factory="org.apache.catalina.users.MemoryUserDatabaseFactory" type="org.apache.catalina.UserDatabase" auth="Container">
</Resource>
</GlobalNamingResources>
<Service name="Catalina">
<Executor name="tomcatThreadPool" minSpareThreads="4" maxThreads="150" namePrefix="catalina-exec-">
</Executor>
<Connector port="8501" protocol="org.apache.coyote.http11.Http11Protocol" connectionTimeout="2000" redirectPort="8446" executor="tomcatThreadPool" maxThreads="50">
</Connector>
<Connector port="8041" protocol="AJP/1.3" redirectPort="8446" tomcatAuthentication="false">
</Connector>
<Engine jvmRoute="web4cf1" name="Catalina" defaultHost="localhost">
<Realm className="org.apache.catalina.realm.LockOutRealm">
<Realm className="org.apache.catalina.realm.UserDatabaseRealm" resourceName="UserDatabase">
</Realm>
</Realm>
<Host name="localhost" autoDeploy="false" unpackWARs="true" appBase="webapps">
<Valve pattern="%h %l %u %t &quot;%r&quot; %s %b" directory="logs" prefix="localhost_access_log." className="org.apache.catalina.valves.AccessLogValve" suffix=".txt" resolveHosts="false">
</Valve>
</Host>
<Cluster className="org.apache.catalina.ha.tcp.SimpleTcpCluster" channelSendOptions="8">
<Manager notifyListenersOnReplication="true" expireSessionsOnShutdown="false" className="org.apache.catalina.ha.session.DeltaManager">
</Manager>
<Channel className="org.apache.catalina.tribes.group.GroupChannel">
<Membership port="45564" dropTime="3000" address="228.0.240.104" className="org.apache.catalina.tribes.membership.McastService" frequency="500">
</Membership>
<Receiver port="4001" autoBind="100" address="auto" selectorTimeout="2000" maxThreads="6" className="org.apache.catalina.tribes.transport.nio.NioReceiver">
</Receiver>
<Sender className="org.apache.catalina.tribes.transport.ReplicationTransmitter">
<Transport className="org.apache.catalina.tribes.transport.nio.PooledParallelSender">
</Transport>
</Sender>
<Interceptor className="org.apache.catalina.tribes.group.interceptors.TcpFailureDetector">
</Interceptor>
<Interceptor className="org.apache.catalina.tribes.group.interceptors.MessageDispatch15Interceptor">
</Interceptor>
</Channel>
<Valve className="org.apache.catalina.ha.tcp.ReplicationValve" filter="">
</Valve>
<Valve className="org.apache.catalina.ha.session.JvmRouteBinderValve">
</Valve>
<ClusterListener className="org.apache.catalina.ha.session.JvmRouteSessionIDBinderListener">
</ClusterListener>
<ClusterListener className="org.apache.catalina.ha.session.ClusterSessionListener">
</ClusterListener>
</Cluster>
</Engine>
</Service>
</Server>
Now if I stop Apache, or if I remove the load by taking the physical box out of the hardware load balancer, the ColdFusion instance will happily restart (after the port 8008 not active error).
If I start the ColdFusion instance manually from the command line, I do get some error messages when it fails:
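For anyone who wants to reproduce that workaround, here is a rough sketch of it as a bash helper. This is only an illustration under a few assumptions: `wait_for_port` is a made-up helper name (it is not part of ColdFusion, Tomcat, or mod_jk), it relies on bash's `/dev/tcp` redirection, and the `service httpd` commands in the usage comment assume the stock RHEL Apache service name. The instance path and port 8501 are the ones from my setup.

```shell
#!/usr/bin/env bash
# Sketch only: poll a TCP port until it accepts a connection or a timeout expires.
# wait_for_port is a hypothetical helper written for this post.
wait_for_port() {
    local host=$1 port=$2 timeout=${3:-60} elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        # bash treats /dev/tcp/HOST/PORT as a TCP connect; the subshell
        # closes the descriptor again as soon as it exits.
        if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1
}

# Example usage (as root; the httpd service name is an assumption):
#   service httpd stop
#   /opt/coldfusion10/web4cf1/bin/coldfusion restart
#   wait_for_port localhost 8501 120 && service httpd start
```

The idea is simply to keep Apache from sending AJP traffic at the instance until its internal HTTP port answers, which matches what I see when I take the load away by hand. It does not explain why the instance hangs under load in the first place.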
[root]# /opt/coldfusion10/jre/bin/java -classpath /opt/coldfusion10/web4cf2/runtime/bin/tomcat-juli.jar:/opt/coldfusion10/web4cf2/bin/cf-bootstrap.jar:/opt/coldfusion10/web4cf2/lib/oosdk/lib:/opt/coldfusion10/web4cf2/lib/oosdk/lib/*:/opt/coldfusion10/web4cf2/lib/oosdk/classes:/opt/coldfusion10/web4cf2/lib/oosdk/classes/*: -server -Djava.awt.headless=true -Xms512m -Xmx2048m -XX:MaxPermSize=256m -XX:+UseParallelGC -Xbatch -Dcoldfusion.home=/opt/coldfusion10/web4cf2 -Djava.security.egd=file:/dev/./urandom -Dcoldfusion.rootDir=/opt/coldfusion10/web4cf2 -Dcoldfusion.libPath=/opt/coldfusion10/web4cf2/lib -Dorg.apache.coyote.USE_CUSTOM_STATUS_MSG_IN_HEADER=true -Dcoldfusion.jsafe.defaultalgo=FIPS186Random -Dhttps.protocols=TLSv1,TLSv1.1,TLSv1.2 -Dcoldfusion.classPath=/opt/coldfusion10/web4cf2/lib/updates,/opt/coldfusion10/web4cf2/lib,/opt/coldfusion10/web4cf2/lib/axis2,/opt/coldfusion10/web4cf2/gateway/lib/,/opt/coldfusion10/web4cf2/wwwroot/WEB-INF/flex/jars,/opt/coldfusion10/web4cf2/wwwroot/WEB-INF/cfform/jars,/var/www/html/topaz/SigPlus2_29.jar,/var/www/html/comm.jar com.adobe.coldfusion.bootstrap.Bootstrap -start
Mar 22, 2016 3:45:36 PM org.apache.catalina.core.AprLifecycleListener lifecycleEvent
INFO: The APR based Apache Tomcat Native library which allows optimal performance in production environments was not found on the java.library.path: /opt/coldfusion10/jre/lib/amd64/server:/opt/coldfusion10/jre/lib/amd64:/opt/coldfusion10/jre/../lib/amd64:/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
Mar 22, 2016 3:45:36 PM org.apache.coyote.AbstractProtocol init
INFO: Initializing ProtocolHandler ["http-bio-8502"]
Mar 22, 2016 3:45:36 PM org.apache.coyote.AbstractProtocol init
INFO: Initializing ProtocolHandler ["ajp-bio-8042"]
Mar 22, 2016 3:45:36 PM org.apache.catalina.core.StandardService startInternal
INFO: Starting service Catalina
Mar 22, 2016 3:45:36 PM org.apache.catalina.core.StandardEngine startInternal
INFO: Starting Servlet Engine: Apache Tomcat/7.0.64
Mar 22, 2016 3:45:36 PM org.apache.catalina.ha.tcp.SimpleTcpCluster startInternal
INFO: Cluster is about to start
Mar 22, 2016 3:45:36 PM org.apache.catalina.tribes.transport.ReceiverBase bind
INFO: Receiver Server Socket bound to:/10.10.240.104:4002
Mar 22, 2016 3:45:36 PM org.apache.catalina.tribes.membership.McastServiceImpl setupSocket
INFO: Setting cluster mcast soTimeout to 500
Mar 22, 2016 3:45:36 PM org.apache.catalina.tribes.membership.McastServiceImpl waitForMembers
INFO: Sleeping for 1000 milliseconds to establish cluster membership, start level:4
Mar 22, 2016 3:45:36 PM org.apache.catalina.ha.tcp.SimpleTcpCluster memberAdded
INFO: Replication member added:org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 10, 240, 104}:4001,{10, 10, 240, 104},4001, alive=3337299, securePort=-1, UDP Port=-1, id={-65 -3 47 75 8 -4 74 2 -123 -42 -41 -22 84 88 60 82 }, payload={}, command={}, domain={}, ]
Mar 22, 2016 3:45:37 PM org.apache.catalina.tribes.membership.McastServiceImpl waitForMembers
INFO: Done sleeping, membership established, start level:4
Mar 22, 2016 3:45:37 PM org.apache.catalina.tribes.membership.McastServiceImpl waitForMembers
INFO: Sleeping for 1000 milliseconds to establish cluster membership, start level:8
Mar 22, 2016 3:45:37 PM org.apache.catalina.tribes.io.BufferPool getBufferPool
INFO: Created a buffer pool with max size:104857600 bytes of type:org.apache.catalina.tribes.io.BufferPool15Impl
Mar 22, 2016 3:45:38 PM org.apache.catalina.ha.session.ClusterSessionListener messageReceived
WARNING: Context manager doesn't exist:localhost#
Mar 22, 2016 3:45:38 PM org.apache.catalina.ha.session.ClusterSessionListener messageReceived
WARNING: Context manager doesn't exist:localhost#
Mar 22, 2016 3:45:38 PM org.apache.catalina.ha.session.ClusterSessionListener messageReceived
WARNING: Context manager doesn't exist:localhost#
Mar 22, 2016 3:45:38 PM org.apache.catalina.tribes.membership.McastServiceImpl waitForMembers
INFO: Done sleeping, membership established, start level:8
Mar 22, 2016 3:45:38 PM org.apache.catalina.ha.session.ClusterSessionListener messageReceived
WARNING: Context manager doesn't exist:localhost#
Mar 22, 2016 3:45:39 PM org.apache.catalina.ha.session.ClusterSessionListener messageReceived
WARNING: Context manager doesn't exist:localhost#
Mar 22, 2016 3:45:39 PM org.apache.catalina.ha.session.ClusterSessionListener messageReceived
WARNING: Context manager doesn't exist:localhost#
Mar 22, 2016 3:45:39 PM org.apache.catalina.ha.session.ClusterSessionListener messageReceived
WARNING: Context manager doesn't exist:localhost#
Mar 22, 2016 3:45:39 PM org.apache.catalina.ha.session.ClusterSessionListener messageReceived
WARNING: Context manager doesn't exist:localhost#
Mar 22, 2016 3:45:39 PM org.apache.catalina.ha.session.ClusterSessionListener messageReceived
WARNING: Context manager doesn't exist:localhost#
Mar 22, 2016 3:45:39 PM org.apache.catalina.ha.session.ClusterSessionListener messageReceived
WARNING: Context manager doesn't exist:localhost#
Mar 22, 2016 3:45:39 PM org.apache.catalina.ha.session.ClusterSessionListener messageReceived
WARNING: Context manager doesn't exist:localhost#
Mar 22, 2016 3:45:39 PM org.apache.catalina.ha.session.ClusterSessionListener messageReceived
WARNING: Context manager doesn't exist:localhost#
Mar 22, 2016 3:45:39 PM org.apache.catalina.ha.session.ClusterSessionListener messageReceived
WARNING: Context manager doesn't exist:localhost#
Mar 22, 2016 3:45:39 PM org.apache.catalina.ha.session.ClusterSessionListener messageReceived
WARNING: Context manager doesn't exist:localhost#
Mar 22, 2016 3:45:39 PM org.apache.catalina.ha.session.ClusterSessionListener messageReceived
WARNING: Context manager doesn't exist:localhost#
Mar 22, 2016 3:45:39 PM org.apache.catalina.ha.session.ClusterSessionListener messageReceived
WARNING: Context manager doesn't exist:localhost#
Mar 22, 2016 3:45:39 PM org.apache.catalina.ha.session.ClusterSessionListener messageReceived
WARNING: Context manager doesn't exist:localhost#
Mar 22, 2016 3:45:39 PM org.apache.catalina.ha.session.ClusterSessionListener messageReceived
WARNING: Context manager doesn't exist:localhost#
Mar 22, 2016 3:45:40 PM org.apache.catalina.ha.session.ClusterSessionListener messageReceived
WARNING: Context manager doesn't exist:localhost#
Mar 22, 2016 3:45:40 PM org.apache.catalina.ha.session.ClusterSessionListener messageReceived
WARNING: Context manager doesn't exist:localhost#
Mar 22, 2016 3:45:40 PM org.apache.catalina.ha.session.ClusterSessionListener messageReceived
WARNING: Context manager doesn't exist:localhost#
Mar 22, 2016 3:45:40 PM org.apache.catalina.ha.session.ClusterSessionListener messageReceived
WARNING: Context manager doesn't exist:localhost#
Mar 22, 2016 3:45:40 PM org.apache.catalina.ha.session.ClusterSessionListener messageReceived
WARNING: Context manager doesn't exist:localhost#
Mar 22, 2016 3:45:40 PM org.apache.catalina.ha.session.ClusterSessionListener messageReceived
WARNING: Context manager doesn't exist:localhost#
Mar 22, 2016 3:45:40 PM org.apache.catalina.ha.session.ClusterSessionListener messageReceived
WARNING: Context manager doesn't exist:localhost#
Mar 22, 2016 3:45:40 PM org.apache.catalina.ha.session.ClusterSessionListener messageReceived
WARNING: Context manager doesn't exist:localhost#
Mar 22, 2016 3:45:40 PM org.apache.catalina.ha.session.ClusterSessionListener messageReceived
WARNING: Context manager doesn't exist:localhost#
Mar 22, 2016 3:45:40 PM org.apache.catalina.ha.session.ClusterSessionListener messageReceived
WARNING: Context manager doesn't exist:localhost#
Mar 22, 2016 3:45:40 PM org.apache.catalina.ha.session.ClusterSessionListener messageReceived
WARNING: Context manager doesn't exist:localhost#
Mar 22, 2016 3:45:40 PM org.apache.catalina.ha.session.ClusterSessionListener messageReceived
WARNING: Context manager doesn't exist:localhost#
Mar 22, 2016 3:45:40 PM org.apache.catalina.ha.session.ClusterSessionListener messageReceived
WARNING: Context manager doesn't exist:localhost#
Mar 22, 2016 3:45:40 PM org.apache.catalina.ha.session.ClusterSessionListener messageReceived
WARNING: Context manager doesn't exist:localhost#
Mar 22, 2016 3:45:40 PM org.apache.catalina.ha.session.ClusterSessionListener messageReceived
WARNING: Context manager doesn't exist:localhost#
Mar 22, 2016 3:45:40 PM org.apache.catalina.startup.TldConfig execute
INFO: At least one JAR was scanned for TLDs yet contained no TLDs. Enable debug logging for this logger for a complete list of JARs that were scanned but no TLDs were found in them. Skipping unneeded JARs during scanning can improve startup time and JSP compilation time.
Mar 22, 2016 3:45:41 PM org.apache.catalina.session.StandardSession tellNew
SEVERE: Session event listener threw exception
java.lang.NullPointerException
at coldfusion.bootstrap.HttpFlexSessionBootstrap.getListener(HttpFlexSessionBootstrap.java:154)
at coldfusion.bootstrap.HttpFlexSessionBootstrap.sessionCreated(HttpFlexSessionBootstrap.java:69)
at org.apache.catalina.session.StandardSession.tellNew(StandardSession.java:422)
at org.apache.catalina.session.StandardSession.setId(StandardSession.java:394)
at org.apache.catalina.ha.session.DeltaSession.setId(DeltaSession.java:275)
at org.apache.catalina.ha.session.DeltaManager.handleSESSION_CREATED(DeltaManager.java:1317)
at org.apache.catalina.ha.session.DeltaManager.messageReceived(DeltaManager.java:1195)
at org.apache.catalina.ha.session.DeltaManager.messageDataReceived(DeltaManager.java:944)
at org.apache.catalina.ha.session.ClusterSessionListener.messageReceived(ClusterSessionListener.java:91)
at org.apache.catalina.ha.tcp.SimpleTcpCluster.messageReceived(SimpleTcpCluster.java:936)
at org.apache.catalina.ha.tcp.SimpleTcpCluster.messageReceived(SimpleTcpCluster.java:917)
at org.apache.catalina.tribes.group.GroupChannel.messageReceived(GroupChannel.java:278)
at org.apache.catalina.tribes.group.ChannelInterceptorBase.messageReceived(ChannelInterceptorBase.java:82)
at org.apache.catalina.tribes.group.interceptors.TcpFailureDetector.messageReceived(TcpFailureDetector.java:117)
at org.apache.catalina.tribes.group.ChannelInterceptorBase.messageReceived(ChannelInterceptorBase.java:82)
at org.apache.catalina.tribes.group.ChannelInterceptorBase.messageReceived(ChannelInterceptorBase.java:82)
at org.apache.catalina.tribes.group.ChannelCoordinator.messageReceived(ChannelCoordinator.java:252)
at org.apache.catalina.tribes.transport.ReceiverBase.messageDataReceived(ReceiverBase.java:287)
at org.apache.catalina.tribes.transport.nio.NioReplicationTask.drainChannel(NioReplicationTask.java:210)
at org.apache.catalina.tribes.transport.nio.NioReplicationTask.run(NioReplicationTask.java:99)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Mar 22, 2016 3:45:41 PM org.apache.catalina.session.StandardSession tellNew
SEVERE: Session event listener threw exception
java.lang.NullPointerException
at coldfusion.bootstrap.HttpFlexSessionBootstrap.getListener(HttpFlexSessionBootstrap.java:154)
at coldfusion.bootstrap.HttpFlexSessionBootstrap.sessionCreated(HttpFlexSessionBootstrap.java:69)
at org.apache.catalina.session.StandardSession.tellNew(StandardSession.java:422)
at org.apache.catalina.session.StandardSession.setId(StandardSession.java:394)
at org.apache.catalina.ha.session.DeltaSession.setId(DeltaSession.java:275)
at org.apache.catalina.ha.session.DeltaManager.handleSESSION_CREATED(DeltaManager.java:1317)
at org.apache.catalina.ha.session.DeltaManager.messageReceived(DeltaManager.java:1195)
at org.apache.catalina.ha.session.DeltaManager.messageDataReceived(DeltaManager.java:944)
at org.apache.catalina.ha.session.ClusterSessionListener.messageReceived(ClusterSessionListener.java:91)
at org.apache.catalina.ha.tcp.SimpleTcpCluster.messageReceived(SimpleTcpCluster.java:936)
at org.apache.catalina.ha.tcp.SimpleTcpCluster.messageReceived(SimpleTcpCluster.java:917)
at org.apache.catalina.tribes.group.GroupChannel.messageReceived(GroupChannel.java:278)
at org.apache.catalina.tribes.group.ChannelInterceptorBase.messageReceived(ChannelInterceptorBase.java:82)
at org.apache.catalina.tribes.group.interceptors.TcpFailureDetector.messageReceived(TcpFailureDetector.java:117)
at org.apache.catalina.tribes.group.ChannelInterceptorBase.messageReceived(ChannelInterceptorBase.java:82)
at org.apache.catalina.tribes.group.ChannelInterceptorBase.messageReceived(ChannelInterceptorBase.java:82)
at org.apache.catalina.tribes.group.ChannelCoordinator.messageReceived(ChannelCoordinator.java:252)
at org.apache.catalina.tribes.transport.ReceiverBase.messageDataReceived(ReceiverBase.java:287)
at org.apache.catalina.tribes.transport.nio.NioReplicationTask.drainChannel(NioReplicationTask.java:210)
at org.apache.catalina.tribes.transport.nio.NioReplicationTask.run(NioReplicationTask.java:99)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Mar 22, 2016 3:45:41 PM org.apache.catalina.ha.session.DeltaManager startInternal
INFO: Register manager localhost# to cluster element Engine with name Catalina
Mar 22, 2016 3:45:41 PM org.apache.catalina.ha.session.DeltaManager startInternal
INFO: Starting clustering manager at localhost#
Mar 22, 2016 3:45:41 PM org.apache.catalina.ha.session.DeltaManager getAllClusterSessions
INFO: Manager [localhost#], requesting session state from org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 10, 240, 104}:4001,{10, 10, 240, 104},4001, alive=3341800, securePort=-1, UDP Port=-1, id={-65 -3 47 75 8 -4 74 2 -123 -42 -41 -22 84 88 60 82 }, payload={}, command={}, domain={}, ]. This operation will timeout if no session state has been received within 60 seconds.
Mar 22, 2016 3:45:47 PM org.apache.catalina.ha.session.DeltaManager waitForSendAllSessions
INFO: Manager [localhost#]; session state send at 3/22/16 3:45 PM received in 5,614 ms.
Does anyone have any suggestions on what I could try?