The application was a standard ASP.NET application which used a Windows Forms client, hosting Internet Explorer. The problem was on the ASP.NET server side and manifested itself by the server completely locking up – to the point that the only way to recover was to power off the server. Thinking it might have been a hardware problem, the client deployed the application on a different server and was able to duplicate the issue. Wintellect also learned on the initial call that the problem only occurred in production and could not be duplicated in a test environment.
As Wintellect began scouring the source code, we asked the client to get a minidump of the ASP.NET process from the production server. A minidump is essentially a photograph of the process that can be mined in a debugger to determine the state of the application. Wintellect worked with the client’s IT department to support them in the creation of the minidumps. Coordinating with the customer, we set up open phones in the call center area, since the call center representatives “could tell when there were problems because everything got slow.” Our hope was to catch the minidump right after the call center reported slowdowns so that the dump file could be written just before the server locked up.
As the IT rep was sitting at the server waiting to be told to create the minidump, we also had them running Process Explorer, the free tool from www.sysinternals.com. With Process Explorer, we could obtain performance counter data as well as Windows handle usage for the ASP.NET worker process. We instructed the IT rep to save the Process Explorer output to a text file every 30 seconds.
The application was turned on and calls were routed to the idle call center representatives. For the first 30 minutes or so, the application performed flawlessly. At approximately 45 minutes the call center reported the application was starting to slow down. We instructed the IT rep to create the dump. Right after the command to create the dump was issued, the server’s UI hung; however, the disk usage light indicated disk activity. We decided to let the machine sit for a couple of hours in the hope that the dump would eventually get written. As we were waiting for the dump to finish, we dove into the testing environment and worked hard at trying to create a scenario that would duplicate this bug. Remember, Wintellect did not have direct access to the testing environment, so we were guiding the client over the phone. We had the QA department crank their automated tests to 10,000 users, hoping to stress the ASP.NET application sufficiently. Wintellect continued to read the source code, but that was proving a dead end: our customer had followed every best practice in the book and the code was beautiful.
After waiting two hours for the minidump to complete, we requested a reboot of the hung server and asked to see the state of the minidump. We got some seriously bad news. The minidump was zero bytes, which meant that it didn’t get written. Fortunately, the Process Explorer text files were saved until minute 46 so at least we had something to work with.
As soon as we opened up the mail with the Process Explorer files, we were shocked at what we saw. The ASP.NET process was obviously leaking handles like crazy, and they were all security tokens. We immediately dove back into the code to look for anything working with security tokens and found nothing. There were a couple of third-party components, but they were written in .NET, so we scanned those components’ code with .NET Reflector; however, they weren’t doing anything obvious that would use security tokens. What was even stranger was that all the security tokens were named with actual user names.
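To illustrate the kind of comparison that made the leak stand out, here is a minimal Python sketch that looks at periodic handle-count samples and flags any handle type whose count never drops and keeps climbing. It is not the tooling used on the engagement: the function name `find_leaking_handle_types`, the `min_growth` threshold, the snapshot layout, and the sample numbers are all hypothetical, and reading the counts out of the saved Process Explorer text files is left out.

```python
from collections import defaultdict


def find_leaking_handle_types(snapshots, min_growth=100):
    """Flag handle types that only ever grow across the snapshots.

    snapshots: list of (elapsed_seconds, {handle_type: count}) tuples,
    ordered by time. This layout is an assumption for illustration.
    Returns {handle_type: total_growth} for suspicious types.
    """
    # Build one time-ordered series of counts per handle type.
    series = defaultdict(list)
    for _, counts in snapshots:
        for handle_type, count in counts.items():
            series[handle_type].append(count)

    leaking = {}
    for handle_type, counts in series.items():
        growth = counts[-1] - counts[0]
        # A healthy count wanders up and down; a leak never comes back down.
        never_drops = all(later >= earlier
                          for earlier, later in zip(counts, counts[1:]))
        if never_drops and growth >= min_growth:
            leaking[handle_type] = growth
    return leaking


# Made-up samples taken 30 seconds apart (not data from the engagement).
samples = [
    (0, {"Token": 40, "File": 210, "Event": 95}),
    (30, {"Token": 310, "File": 214, "Event": 93}),
    (60, {"Token": 590, "File": 209, "Event": 96}),
    (90, {"Token": 905, "File": 212, "Event": 94}),
]

print(find_leaking_handle_types(samples))
```

With these invented numbers only the token handles are reported, since the file and event counts fluctuate without growing. The threshold is arbitrary and would need tuning against a real workload.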
It was time to step back and look at everything about the application, from deployment on down. In examining the top level WEB.CONFIG file, we noticed the customer had set up impersonation but had used “*” instead of the usual account name. We set up a quick test with ten different user accounts on one of Wintellect’s servers and a “Hello World” ASP.NET application. Within 10 minutes we were seeing the same security token handle leak the client had.
We called our customer, explained what we were seeing, and asked why they had set the impersonation to “*”. They explained that in a future version of the product they wanted to do role-based security. We directed them to remove the impersonation for this version and re-release the application. It ran like a dream for the rest of the day.