Fixed textburst page freezes
We released a textburst update this morning that should fix a page freeze problem encountered by a small number of users over the last few weeks.
A few days before Christmas we first noticed that textburst pages would occasionally freeze, and we released a temporary fix that solved the problem for 99% of users.
Unfortunately, as with most temporary fixes, it wasn’t 100% effective, so we’ve been investigating the root cause since getting back from the holidays.
After many, many hours digging through tens of thousands of lines of log files (it’s amazing how much logging a debug mode can create) and tweaking code here and there, we at last found the cause yesterday afternoon. Two lines of code later, it’s fixed.
For the technically inclined, here’s a little more information:
Textburst runs on four web servers, set up as two load-balanced pairs: one in Derby and one in Manchester. To ensure users stay logged in when switching between servers, we store our session data in a MySQL database.
Our MySQL-based session store is a slightly tweaked version of the MySQL session state provider, which is in turn based on the Microsoft sample session state provider.
Unfortunately, it turns out both of these have a bug when handling concurrent connections.
How things are supposed to work
When a page is requested, it should check whether the session is currently locked by another page and, if so, wait until it’s free. It then obtains the session data, along with a lock if the page wants write access. Finally, it increments an internal Lock ID by one. The Lock ID is used to make sure that the request unlocking the session is the same one that locked it.
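For anyone who prefers code to prose, here is a minimal Python sketch of that flow. It is an illustration only, not our provider code: the real store is a MySQL table, and the names SessionRow, acquire and release are made up for this example. Waiting is modelled by returning None so the caller can retry.

```python
from dataclasses import dataclass, field


@dataclass
class SessionRow:
    """Hypothetical in-memory stand-in for one row of the session table."""
    data: dict = field(default_factory=dict)
    locked: bool = False
    lock_id: int = 0


def acquire(row: SessionRow, want_write: bool):
    """Return (data, lock_id), or None if the caller must wait and retry."""
    if row.locked:
        return None  # held by another request: every caller waits
    if want_write:
        row.locked = True  # only read/write requests take the lock
    row.lock_id += 1
    return dict(row.data), row.lock_id


def release(row: SessionRow, lock_id: int) -> bool:
    """Unlock only when lock_id matches the one issued at acquire time."""
    if row.locked and row.lock_id == lock_id:
        row.locked = False
        return True
    return False


row = SessionRow(data={"user": "alice"})
data, writer_lock_id = acquire(row, want_write=True)
blocked = acquire(row, want_write=False)   # None: the reader has to wait
released = release(row, writer_lock_id)    # True: the IDs still match
```

The important detail is that the locked check comes first for both kinds of request, so the Lock ID can only change while nobody holds the session.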
What was actually happening
Requests that wanted read/write access to the session worked exactly as expected. Read-only requests, however, missed the all-important step of waiting until the session was free and simply incremented the Lock ID. This meant that if a read-only request was processed between the lock and unlock phases of a read/write request, the read/write request’s Lock ID no longer matched when it tried to unlock. The session was therefore never unlocked, causing the next read/write request to freeze.
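The sequence is easier to see in a small simulation. This is a hedged Python sketch of the failure, not the provider's actual code; the Session class and the function names are hypothetical, and an in-memory object stands in for the MySQL row.

```python
class Session:
    """Hypothetical stand-in for one session row."""

    def __init__(self):
        self.locked = False
        self.lock_id = 0


def get_exclusive(session):
    """Read/write path: take the lock, or return None if it is already held."""
    if session.locked:
        return None
    session.locked = True
    session.lock_id += 1
    return session.lock_id


def get_read_only_buggy(session):
    """Read-only path with the bug: no check on session.locked."""
    session.lock_id += 1
    return session.lock_id


def release_exclusive(session, lock_id):
    """Unlock only if the caller's Lock ID is still the current one."""
    if session.lock_id != lock_id:
        return False  # ID changed underneath us: the lock stays in place
    session.locked = False
    return True


session = Session()
writer_id = get_exclusive(session)                # writer locks, Lock ID is 1
get_read_only_buggy(session)                      # reader bumps Lock ID to 2
released = release_exclusive(session, writer_id)  # False: 1 != 2
next_writer = get_exclusive(session)              # None: still locked
```

After the last line the session is still marked as locked even though no request is using it, which is the state our users ran into.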
Luckily, the ASP.NET session code is designed for this eventuality, and after 110 seconds the stuck lock is forcibly removed. Unfortunately, our users were stuck waiting for this to happen while everything appeared to be frozen.
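The safety net amounts to an age check on the lock. Here is a rough Python sketch of the idea, assuming the lock time is recorded alongside the lock; lock_is_stale and LOCK_TIMEOUT are made-up names, the dates are arbitrary, and the real check lives inside ASP.NET rather than in our code.

```python
from datetime import datetime, timedelta

# The 110-second figure comes from the behaviour described above.
LOCK_TIMEOUT = timedelta(seconds=110)


def lock_is_stale(lock_date, now, timeout=LOCK_TIMEOUT):
    """True once a lock has been held for longer than the timeout."""
    return now - lock_date > timeout


locked_at = datetime(2024, 1, 1, 12, 0, 0)  # arbitrary example timestamp
still_waiting = lock_is_stale(locked_at, locked_at + timedelta(seconds=60))
force_unlock = lock_is_stale(locked_at, locked_at + timedelta(seconds=111))
```

Until the check passes, every request that needs the session just keeps waiting, which is why the page looked frozen rather than failing outright.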
Our fix
We’ve updated the session provider to always wait until the session is free. To help out others with similar problems we’ve added this as a comment on the Microsoft sample and will be filing a bug report with MySQL shortly.