Being a control freak isn't such a bad thing.

If you've worked at a company with both a marketing and an IT department, then you've probably witnessed the classic battle over how much control marketing gets over website content. Where I work, the marketing department will occasionally get one over on us (IT), and we are forced to live with the decision despite our protests. I will say that most of the time these little skirmishes result in a much better product, but today I have a tale about how giving away too much control can go very, very wrong.

For us, the interdepartmental war largely hinges on which areas of the site marketing is allowed to control and how much control they get over those areas. Of course, marketing wants the company's entire application to be a giant WordPress-style content management system (CMS) while those of us back in IT are forced to hold back laughter. Our product is a fairly large data-driven application, so we know that a full-blown CMS hardly makes sense for our little four-person team to build; we have had to learn patience when explaining things like this to those outside our department.

Recently someone on the marketing team decided to fight what most thought was a small battle, and won. What did they want? The ability to inject content into the master page of the site via a third-party service. Installing this service was as easy as adding a small script to the master page in our application. One of our developers added this script and closed the support ticket. Nobody in IT wanted to implement such a feature, but we were far too busy working on other projects to put up much resistance. That was a huge mistake on our part.

This new script sat there in our application for weeks until someone in marketing finally decided to log into the third-party service and create a banner to advertise an ongoing event. They created a scrolling marquee banner and clicked submit. A '90s-style marquee banner on our modern web application would have been a scary prospect on its own, but nobody could have guessed just how scary it would truly be.

We maintain two web applications at our company. One is our public-facing website and the other is our internal application where employees do their work. After marketing injected this terrible-looking banner into our public-facing site, all hell broke loose. Suddenly we were receiving a flood of support tickets from employees using our internal application. The database was intermittently rejecting connections. Our error logs were going crazy with tens of thousands of errors every minute, and the database was beginning to buckle under the load.

At first we had no idea what was going on. Our server was working just well enough to let employees do their work with intermittent errors, so they suffered through it for most of the day while we investigated. After several hours of investigation with no real progress, I overheard someone mention that marketing had put up a banner the night before. This was the first I had heard of marketing having such an ability, so naturally I began to ask questions.

To make a long story short, marketing had grabbed this marquee script from some random page on the internet and stuffed it into our site. It was running some sort of continuous loop and throwing an error on every iteration, which meant at least hundreds of errors a second. So why did a client-side script bring the database to its knees, you ask? A couple of months earlier we had installed a client-side error monitoring script. Every time an unhandled exception was thrown, this script would catch it and forward it to the server to be logged in the database, just like all our server-side errors. Needless to say, hundreds of Ajax requests a second from every open browser were not so great for the client, the server, or the database.

We immediately released a fix disabling the client-side error logging, the third-party script, and the server-side action method that was listening for those Ajax calls. This took most of the load off our servers, but not all of it. You wouldn't believe the number of people who leave their browsers open. Pages loaded before the fix kept running the old scripts, so we continued to get error-logging Ajax calls for two days afterward. It wasn't a huge problem now that those calls were receiving 404 responses from the server, but it was still keeping the server pretty busy.

In addition to the web server staying busy, the database never seemed to recover completely. It was still intermittently rejecting connections. We were never able to identify a pattern, so at the end of the third day we rebooted the database server, and that seemed to fix the issue. When everything was back to normal we had a meeting to review what had happened and how we could prevent it from ever happening again.

The entire ordeal was a perfect storm of minor things combining to cause major problems, but the obvious infraction was giving someone other than IT the ability to inject code and markup into our live web application. There were other things that should have been done as well, such as ensuring the client-side logging script had some sort of fail-safe for cases like continuously looping errors. However, if the only script changes making it to our site had been made by IT and released as part of our deployment process, it definitely would not have taken us an entire day to hunt down the issue and put a stop to it. We can revert a deployment with the click of a single button thanks to some awesome Git deployment software we installed a while back. But since we hadn't deployed any code and the current deployment had been fine for over two weeks, we had no clue where to begin.
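
For what it's worth, here is roughly the kind of fail-safe I have in mind for the client-side logger. This is only a sketch in TypeScript, not the script we actually run: the `ThrottledErrorReporter` class, its `send` callback, and the `maxPerWindow` and `windowMs` settings are all names made up for illustration. The idea is to drop repeats of the same error message and to cap how many reports a single page can forward in a given time window.

```typescript
// Hypothetical sketch: none of these names come from our real logging script.
type Send = (message: string) => void;

class ThrottledErrorReporter {
  private send: Send;
  private maxPerWindow: number;
  private windowMs: number;
  private now: () => number;
  private windowStart = 0;
  private sentInWindow = 0;
  private seen = new Set<string>();

  constructor(
    send: Send,
    maxPerWindow = 5,
    windowMs = 60_000,
    now: () => number = Date.now,
  ) {
    this.send = send;
    this.maxPerWindow = maxPerWindow;
    this.windowMs = windowMs;
    this.now = now;
  }

  // Returns true if the error was forwarded, false if it was dropped.
  report(message: string): boolean {
    const t = this.now();
    // Start a fresh window once the current one has expired.
    if (t - this.windowStart >= this.windowMs) {
      this.windowStart = t;
      this.sentInWindow = 0;
      this.seen.clear();
    }
    // Drop repeats of an error already sent in this window.
    if (this.seen.has(message)) return false;
    // Drop anything beyond the per-window cap.
    if (this.sentInWindow >= this.maxPerWindow) return false;
    this.seen.add(message);
    this.sentInWindow++;
    this.send(message);
    return true;
  }
}
```

In a browser, `send` would be whatever function posts the error to the server, and `report` would be called from the page's `error` event handler. With something like this in place, a script that throws the same error hundreds of times a second would cost one request per window per open page instead of hundreds per second.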

In a way, it was a small win for IT because now we have something to point to when marketing asks for similar features in the future. It's also a reminder that it's not such a bad thing to want complete control over your code. It's important to find a balance between controlling your application and accommodating your users' needs. Never give people enough rope to hang themselves with unless you know they can handle it.

