Chip Anderson

More Power, Mr. Scott! What Happened This Morning and What We Are Doing About It

Ever have a power outage that wasn't a power outage? Me neither, at least not until this morning at 8:16am Eastern (5:16am Pacific time). At that moment, for reasons that have not been explained, there was a local power outage on the north side of Redmond that only affected "large" equipment like chillers, air conditioning units and server rooms. Basically, the stuff that we use here to keep StockCharts.com working (of course!).

As we learned later, the outage only affected one of the three "phases" of power going to all of the buildings in our area. That meant all of the regular things that normal humans use, like lights, laptop computers and coffee machines, still worked fine, because they only use one or two of the "power phases" in the building. This situation led to a TON of confusion.

Initially, we thought it was just a problem with our chiller, the big A/C unit on the side of the building that we use to keep our server room cool. Because the lights were still on, we assumed the power problem had been fixed. It hadn't.

While waiting for the A/C maintenance people to arrive, we tried to keep the server room cool with fans and to make sure the website would work correctly when the market opened. Aside from the heat, things were looking "doable" until...

When the power in the server room cut off completely, my first thought was "Am I dead? This is never supposed to happen. We have lots of systems in place to make sure this never happens!" Then it hit me. Without anyone noticing, we had been running on backup power for several hours, and we had finally exhausted the backup system's capacity.

Now, some of you might rightly be asking, "How did you not notice you were on backup power?" Again, you need to understand that, due to the bizarre nature of the outage (losing only one of the three power phases), things seemed almost completely normal to us: the lights were on, the computer monitors worked, the servers were working, the fans were working. Literally everything was working except the chiller. There was no visible reason to suspect that we were running on backup power until it just died.

At that point, we were offline. Off-off-off-off-offline. We were as offline as we could possibly get until the third power phase came back. After some frantic calls to the local power company, we finally learned what had happened, and everything fell into place: we understood how we'd gotten ourselves into this mess. Soon after that, I was able to get a cell signal again and started sending out tweets to let people know what had happened. I know that not everyone follows our Twitter feed, but that was the only communication method available to us at that point. I strongly encourage everyone to follow us on Twitter for this very reason.

Power was fully restored about an hour later. It took us another hour or so to get all of our internal equipment restarted. Unfortunately, the outage also affected our datafeed machines, which took longer to recover; that is why intraday data is missing from today's charts.

Some of the lessons we learned today:

  • It is possible to lose some-but-not-all power to a building.
  • We need better backup cooling equipment in case the chiller fails.
  • We need more visible alarms for when the system switches over to the backup power.
  • We need a better way to communicate with users in these circumstances.
  • The majority of our users are awesome people who understand that sometimes these things happen.

It is also a good time for me to clearly state that we cannot and do not guarantee our service will be up 24/7. We try REALLY HARD to keep things up continually, and we generally do a better job than many websites that are a lot bigger than us. That said, we can't guarantee that this won't ever happen again. To put this in perspective, this is the second time in 16 years that we've had a power problem (the last one was in 2003). Again, that is two times too many, but it is not "common" or "frequent" by any stretch.

Experienced online traders know that they need backup solutions for their trading platforms. It is the nature of the beast. For those of you who were inconvenienced by this outage, I offer my sincerest apologies and my pledge that we will continue to learn from this event and improve. For those of you who were severely impacted, I strongly encourage you to have a backup charting platform configured and available. Many people use their online brokerage firm's website as a backup. Make sure you know how to use that platform in an emergency.

As I mentioned on the website, if you are a StockCharts.com member, we've extended your subscription by one day because of this morning's outage.

Thanks again for your understanding and support,
Chip Anderson
President, StockCharts.com, Inc.
