I recently set up a new system running most of the software my business produces. Unlike our other sites, right from the beginning all of the Java servers (a mixture of Tomcat and GlassFish) will be accessed via Apache using the Apache Tomcat Connector, mod_jk. Typically we start off with one server and add Apache in later, but this is a service move so it makes sense to start with Apache in place.
Through some careful planning and a big dose of hard work the move went smoothly right up to Saturday morning (of course it had to be a weekend) when the wheels came off one of our processes. One of our applications running on GlassFish exposes data via a web service to another one of our applications, which runs on the desktop. This process has worked flawlessly from the start, which was at least 18 months ago, but all of a sudden this weekend it decided to break.
The application consuming the data started complaining that the XML being returned by the server was invalid (a closing tag was missing). To confuse the picture further, some queries worked flawlessly while others always failed. The failures were completely repeatable though, which was at least a small blessing.
As the web service was under GlassFish, which I remembered has a web service testing facility, I logged into the admin console and tried testing the service… It worked flawlessly. I was both pleased and slightly annoyed that it worked. Pleased because it looked like the software was probably correct, annoyed because it meant the problem was somewhere else.
One of the problems with web services is they can be a bit of a pig to test. The IDE and Maven get together to generate most of the classes required to access the web service by reading the WSDL file. This is good insofar as calling the web service is much like calling any other method, but bad because when things go wrong you just have an impenetrable black box to work with. Since I was getting messages about invalid XML I really wanted to see the XML returned by GlassFish, but trying to get at it through our application was a non-starter.
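One way to get at the raw XML without the generated client classes might look like the sketch below. It uses only the Python standard library, posts a hand-written SOAP envelope and reports whether the body that comes back is well-formed. The function names are hypothetical, and the endpoint URL and envelope would have to come from your own WSDL; none of them are taken from our service.

```python
import http.client
import urllib.request
import xml.etree.ElementTree as ET


def check_well_formed(xml_bytes):
    """Return (True, None) if the payload parses as XML, else (False, message)."""
    try:
        ET.fromstring(xml_bytes)
        return True, None
    except ET.ParseError as exc:
        return False, str(exc)


def post_soap(url, envelope, soap_action=""):
    """POST a SOAP 1.1 envelope (a str) and return the raw response body as bytes.

    `url` and `soap_action` are placeholders for whatever the WSDL specifies.
    """
    request = urllib.request.Request(
        url,
        data=envelope.encode("utf-8"),  # supplying data makes this a POST
        headers={
            "Content-Type": "text/xml; charset=utf-8",
            "SOAPAction": '"%s"' % soap_action,
        },
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        try:
            return response.read()
        except http.client.IncompleteRead as exc:
            # The connection closed before the full body arrived:
            # keep whatever did arrive so it can be inspected.
            return exc.partial


if __name__ == "__main__":
    # A deliberately truncated document, to show the kind of error reported.
    print(check_well_formed(b"<items><item>1</item>"))
```

Saving the bytes returned by `post_soap` to a file and passing them to `check_well_formed` gives both the raw XML and the parser's view of where it goes wrong.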
I decided to get really primitive on it and attack it with Ethereal. This turned out to be a dead loss: I’d forgotten I was connecting over a VPN so everything was encrypted, and I didn’t fancy dragging down the server to my local machine at the weekend. The output from Ethereal was too low level anyway; I wanted to see just the HTTP response.
After a bit of digging I came across SoapUI, which I initially dismissed because it looked like a pay-for-only tool. Turns out they have a feature-rich free version which would do what I wanted. SoapUI is a tool dedicated to testing web services: it can read a WSDL, generate requests and show you the response. I pointed it at our WSDL and within a few moments I had some requests ready to run. I banged in the parameters for one of the calls that wasn’t working and… nothing. There was no response from the server. I gave up and went to bed.
Thinking that SoapUI was a pile of the proverbial, but having few other choices left, I decided to give it another shot on a much simpler request. Guess what? It worked. I then tried it on one of the blocks of data I knew downloaded correctly in our application and that worked too. I switched back to the block causing trouble and, you guessed it, no response.
It was then that I remembered that I’d been seeing a lot of stack traces in GlassFish about abnormally closed connections and header serialization issues. It suddenly dawned on me that in the past we have never tried to run a web service over Apache and mod_jk like this. I quickly opened the relevant port on the server firewall to let me test directly against GlassFish and, would you believe it, everything started working perfectly.
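To pin down what the proxy path does to a response, the same request can be sent both ways and the two bodies compared byte for byte. The sketch below is only an illustration: the function names are made up, and it assumes you have already captured the two bodies as bytes (for example by saving the responses to files). Bear in mind that two responses can differ legitimately if the service embeds timestamps or similar.

```python
def first_difference(left, right):
    """Return the offset of the first differing byte, or None if identical."""
    for offset, (a, b) in enumerate(zip(left, right)):
        if a != b:
            return offset
    if len(left) != len(right):
        # One body is a prefix of the other: they diverge where the shorter ends.
        return min(len(left), len(right))
    return None


def summarise(via_apache, direct):
    """Describe how the body fetched via Apache compares with the direct one."""
    offset = first_difference(via_apache, direct)
    if offset is None:
        return "identical (%d bytes)" % len(direct)
    return "%d bytes via Apache, %d bytes direct, first difference at offset %d" % (
        len(via_apache),
        len(direct),
        offset,
    )
```

If the body via Apache turns out to be a strict prefix of the direct one, that points at truncation somewhere in the proxy path rather than corruption.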
So, I have a solution, but clearly there is a problem somewhere in the Apache ↔ mod_jk ↔ GlassFish area and I don’t know where. I know that mod_jk wasn’t specifically written for GlassFish but it’s widely used, so I suspect this is more of a configuration issue than an out-and-out bug. The data I’m returning is a fair size at about 500kB, which is hardly breaking the bank, but I wonder if this is causing the problem. I’m slightly suspicious of Apache as that is my weakest area in terms of knowledge, and I wonder if one of the other modules is causing the problem. I will update you if I ever find out what was causing the issues.
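If response size really is the trigger, one way to test the theory would be to bisect on size. The sketch below assumes a lot: a hypothetical `fails(size)` check that requests a response of roughly `size` bytes through Apache and reports whether it came back broken, and that failure is monotonic (everything above some threshold fails, everything below it works). Neither assumption has been confirmed for this setup.

```python
def smallest_failing_size(fails, low, high):
    """Binary search for the smallest size in (low, high] where fails(size) is true.

    Assumes fails(low) is False, fails(high) is True, and that failure is
    monotonic in size. `fails` is a caller-supplied, hypothetical check.
    """
    if fails(low) or not fails(high):
        raise ValueError("need a passing lower bound and a failing upper bound")
    while high - low > 1:
        middle = (low + high) // 2
        if fails(middle):
            high = middle  # threshold is at or below middle
        else:
            low = middle  # threshold is above middle
    return high
```

A threshold that lands on a suspiciously round number would suggest a buffer or packet size setting somewhere in the chain; no clean threshold at all would suggest size is not the cause.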