Cloud Outages from GoDaddy, Amazon, and Microsoft

What is cloud computing? An excerpt from Wikipedia: "Cloud computing is the use of computing resources (hardware and software) that are delivered as a service over a network (typically the Internet)".

It is marketed as providing 24x7 service and 99.9999% uptime. That will never be true, and here's why:
  • Cloud hardware is managed by system engineers.
  • Software maintenance, updates, and security patches are also managed by system engineers. A system engineer may be a programmer, a software architect, or hold some other information technology title, depending on where you work.
  • Documentation and procedures. Have you ever read an IT procedure? They are detailed and very long, and it takes time to read through one.
  • All of it is managed by human beings.
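To put the marketed figure in perspective, here is a rough sketch of how little downtime each uptime percentage leaves in a year. The function name and the 365-day year are my own assumptions for illustration, not something any vendor publishes.

```python
# Assumption: a non-leap year of 365 days.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def allowed_downtime_seconds(availability_percent: float,
                             period_seconds: float = SECONDS_PER_YEAR) -> float:
    """Return the downtime, in seconds, that an availability level leaves over a period."""
    return (1 - availability_percent / 100) * period_seconds


# Compare a few commonly quoted availability levels.
for pct in (99.9, 99.999, 99.9999):
    seconds = allowed_downtime_seconds(pct)
    print(f"{pct}% uptime leaves {seconds:.1f} seconds of downtime per year")
```

By this arithmetic, 99.9999% leaves roughly half a minute of downtime a year, and 99.999% leaves a little over five minutes.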
Here are some recent outages at Microsoft Office365, Amazon Web Services, and GoDaddy. I have only seen a statement from Scott Wagner, CEO of GoDaddy, and nothing from Amazon Web Services or from Microsoft about Office365 (outage on Nov. 13, 2012).

From the GoDaddy website, a statement about the GoDaddy outage of September 10, 2012:
"Go Daddy Site Outage Investigation Completed

Yesterday, GoDaddy.com and many of our customers experienced intermittent service outages starting shortly after 10 a.m. PDT. Service was fully restored by 4 p.m. PDT.

The service outage was not caused by external influences. It was not a "hack" and it was not a denial of service attack (DDoS). We have determined the service outage was due to a series of internal network events that corrupted router data tables. Once the issues were identified, we took corrective actions to restore services for our customers and GoDaddy.com. We have implemented measures to prevent this from occurring again.

At no time was any customer data at risk or were any of our systems compromised.

Throughout our history, we have provided 99.999% uptime in our DNS infrastructure. This is the level our customers expect from us and the level we expect of ourselves. We have let our customers down and we know it.

We take our business and our customers' businesses very seriously. We apologize to our customers for these events and thank them for their patience."

 - Scott Wagner, Go Daddy CEO

TechCrunch also wrote an article about the GoDaddy outage.

Amazon Web Services outage of October 22, 2012. One of our sister projects uses Amazon Web Services (AWS) Elastic Compute Cloud (EC2). Pingdom's online monitoring recorded 4 hours and 19 minutes of downtime. The monitoring tool was not pretty to look at that day.
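For a sense of scale, the 4 hours and 19 minutes can be turned into an availability percentage. This is only a rough sketch: the 30-day and 365-day windows and the function name are assumptions of mine, and it treats this outage as the only downtime in the window.

```python
from datetime import timedelta


def availability_percent(downtime: timedelta, period: timedelta) -> float:
    """Return the percentage of the period during which the service was up."""
    return 100 * (1 - downtime / period)


# Downtime as recorded by the monitoring for the October 22, 2012 outage.
outage = timedelta(hours=4, minutes=19)

# Assumed measurement windows: one 30-day month and one 365-day year.
print(f"Over 30 days:  {availability_percent(outage, timedelta(days=30)):.4f}%")
print(f"Over 365 days: {availability_percent(outage, timedelta(days=365)):.4f}%")
```

Even spread over a full year, that one outage alone leaves the service well short of the 99.9999% that gets marketed.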

Read "Amazon Web Services outage once again shows reality behind 'the cloud'". An excerpt: "Amazon's Elastic Block Store ("EBS") service, an underpinning component of Amazon's extremely popular Elastic Compute Cloud ("EC2"), experienced a substantial service interruption this afternoon. Amazon EC2 has become such a ubiquitous feature in the cloud computing landscape that it's difficult to throw a rock without hitting a large company with a public Web offering that uses it. So today's service interruption bit deeply: among the sites knocked partially or totally offline were reddit, Imgur, and developer favorite Heroku."

The article also points out a much more serious EC2 outage in April 2011.


And today, Nov. 13, 2012, Microsoft Office365 was down for more than 5 hours; some organizations reported losing a day of work. One tech news site didn't blog any news about the Office365 outage today, though I can see that it posted "Microsoft brings Internet Explorer 10 preview to Windows 7 PCs". Maybe they are not using Office365. And maybe I should start my own blog dedicated to "Cyber Down", why not! :)

According to the Microsoft Service Health updates, the outage ran from 12:08 PM to 5:44 PM. Here are the details from the Service Health website:

Nov 13, 2012 5:44 PM Service restored 
Closure Summary: On 11/13/2012 at approximately 5:00 PM UTC, an issue in one of Microsoft's datacenters caused users to encounter errors when accessing their email. The problem affected users in North America and Latin America. Microsoft engineers identified the root cause of the issue and initiated a failover to restore service. A complete post-incident report will be available on the Service Health Dashboard within five business days.

Nov 13, 2012 4:53 PM Restoring service The last few steps to restore the service are still in progress. We will resolve this incident as soon as we have verified availability.

Nov 13, 2012 3:56 PM Restoring service Service restoration in the last two sites is still in progress.

Nov 13, 2012 2:59 PM Restoring service Service should be restored for most customers right now in the impacted forests. We continue to work on restoring service for all customers as quickly as possible.

Nov 13, 2012 2:37 PM Restoring service We are still working to restore service health. We will provide additional information when it becomes available.

Nov 13, 2012 1:42 PM Restoring service We are resolving the service incident and working to restore service health. We will provide additional information when it becomes available.

Nov 13, 2012 1:18 PM Service interruption A few users are unable to access their email at this time.

Nov 13, 2012 12:55 PM Service interruption A few users are unable to access their email at this time.

Nov 13, 2012 12:08 PM Investigating We are investigating a potential issue. At this time we don't have enough information to identify whether this is an actual service incident. We will provide more information shortly.
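The first and last timestamps in the log above give the length of the incident. A small sketch, assuming both timestamps are in the same time zone (the log does not state one) and an English locale for parsing the month name and AM/PM; the variable names are mine.

```python
from datetime import datetime

# Format of the timestamps as they appear in the service health log.
LOG_FORMAT = "%b %d, %Y %I:%M %p"

# First ("Investigating") and last ("Service restored") entries of the log.
started = datetime.strptime("Nov 13, 2012 12:08 PM", LOG_FORMAT)
restored = datetime.strptime("Nov 13, 2012 5:44 PM", LOG_FORMAT)

# Subtracting two datetimes yields a timedelta.
duration = restored - started
print(f"Incident lasted {duration}")
```

That works out to 5 hours and 36 minutes between the first notice and the "Service restored" entry.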

If you have blogged about a cloud services outage, please post it here, or visit the website to share your experience.

