Cloud World

  • Subscribe to our RSS feed.
  • Twitter
  • StumbleUpon
  • Reddit
  • Facebook
  • Digg

Friday, 26 October 2012

About today's App Engine outage

Posted on 17:47 by Unknown
This morning we failed to live up to our promise, and Google App Engine applications experienced increased latencies and time-out errors.

We know you rely on App Engine to create applications that are easy to develop and manage without having to worry about downtime. App Engine is not supposed to go down, and our engineers work diligently to ensure that it doesn’t. However, from approximately 7:30 to 11:30 AM US/Pacific, about 50% of requests to App Engine applications failed.

Here’s what happened, from what we know today:

Summary



  • 4:00 am - Load begins increasing on traffic routers in one of the App Engine datacenters.

  • 6:10 am - The load on traffic routers in the affected datacenter passes our paging threshold.

  • 6:30 am - We begin a global restart of the traffic routers to address the load in the affected datacenter.

  • 7:30 am - The global restart plus additional load unexpectedly reduces the count of healthy traffic routers below the minimum required for reliable operation. This causes overload in the remaining traffic routers, spreading to all App Engine datacenters. Applications begin consistently experiencing elevated error rates and latencies.

  • 8:28 am - google-appengine-downtime-notify@googlegroups.com is updated with notification that we are aware of the incident and working to repair it.

  • 11:10 am - We determine that App Engine’s traffic routers are trapped in a cascading failure, and that we have no option other than to perform a full restart with gradual traffic ramp-up to return to service.

  • 11:45 am - Traffic ramp-up completes, and App Engine returns to normal operation.



In response to this incident, we have increased our traffic routing capacity and adjusted our configuration to reduce the possibility of another cascading failure. Multiple projects have been in progress to allow us to further scale our traffic routers, reducing the likelihood of cascading failures in the future.  

During this incident, no application data was lost and application behavior was restored without any manual intervention by developers. There is no need to make any code or configuration changes to your applications.

We will proactively issue credits to all paid applications for ten percent of their usage for the month of October to cover any SLA violations. This will appear on applications’ November bills. There is no need to take any action to receive this credit.

We apologize for this outage, and in particular for its duration and severity. Since launching the High Replication Datastore in January 2011, App Engine has not experienced a widespread system outage.  We know that hundreds of thousands of developers rely on App Engine to provide a stable, scalable infrastructure for their applications, and we will continue to improve our systems and processes to live up to this expectation.

- Posted by Peter S. Magnusson, Engineering Director, Google App Engine
Email ThisBlogThis!Share to XShare to Facebook
Posted in | No comments
Newer Post Older Post Home

0 comments:

Post a Comment

Subscribe to: Post Comments (Atom)

Popular Posts

  • Tutorial: Adding a cloud backend to your application with Android Studio
    Android Studio lets you easily add a cloud backend to your application, right from your IDE. A backend allows you to implement functionality...
  • A Day in the Cloud, new articles on scaling, and fresh open source projects for App Engine
    The latest release of Python SDK 1.2.3, which introduced the Task Queue API and integrated support for Django 1.0, may have received a lot ...
  • New Admin Console Release
    Posted by Marzia Niccolai, App Engine Team Today we've released some new features in our Admin Console to make it easier for you to mana...
  • JPA/JDO Java Persistence Tips - The Year In Review
    If you’re developing a Java application on App Engine you probably already know that you can use JPA and JDO, both standard Java persistence...
  • The new Cloud Console: designed for developers
    In June, we unveiled the new Google Cloud Console , bringing together all of Google’s APIs, Services, and Infrastructure in a single interfa...
  • Best practices for App Engine: memcache and eventual vs. strong consistency
    We have published two new articles about best practices for App Engine. Are you aware of the best ways to keep Memcache and Datastore in syn...
  • Pushing Updates with the Channel API
    If you've been watching Best Buy closely, you already know that Best Buy is constantly trying to come up with new and creative ways to...
  • Outfit 7’s Talking Friends built on Google App Engine, recently hit one billion downloads
    Today’s guest blogger is Igor Lautar, senior director of technology at Outfit7 (Ekipa2 subsidiary), one of the fastest-growing media enterta...
  • Bridging Mobile Backend as a Service to Enterprise Systems with Google App Engine and Kinvey
    The following post was contributed by Ivan Stoyanov , VP of Engineering for Kinvey, a mobile Backend as a Service provider and Google Cloud ...
  • Easy Performance Profiling with Appstats
    Since App Engine debuted 2 years ago, we’ve written extensively about best practices for writing scalable apps on App Engine. We make writ...

Categories

  • 1.1.2
  • agile
  • android
  • Announcements
  • api
  • app engine
  • appengine
  • batch
  • bicycle
  • bigquery
  • canoe
  • casestudy
  • cloud
  • Cloud Datastore
  • cloud endpoints
  • cloud sql
  • cloud storage
  • cloud-storage
  • community
  • Compute Engine
  • conferences
  • customer
  • datastore
  • delete
  • developer days
  • developer-insights
  • devfests
  • django
  • email
  • entity group
  • events
  • getting started
  • google
  • googlenew
  • gps
  • green
  • Guest Blog
  • hadoop
  • html5
  • index
  • io2010
  • IO2013
  • java
  • kaazing
  • location
  • mapreduce
  • norex
  • open source
  • partner
  • payment
  • paypal
  • pipeline
  • put
  • python
  • rental
  • research project
  • solutions
  • support
  • sustainability
  • taskqueue
  • technical
  • toolkit
  • twilio
  • video
  • websockets
  • workflows

Blog Archive

  • ►  2013 (143)
    • ►  December (33)
    • ►  November (15)
    • ►  October (17)
    • ►  September (13)
    • ►  August (4)
    • ►  July (15)
    • ►  June (12)
    • ►  May (15)
    • ►  April (4)
    • ►  March (4)
    • ►  February (9)
    • ►  January (2)
  • ▼  2012 (43)
    • ►  December (2)
    • ►  November (2)
    • ▼  October (8)
      • Developer Insights: Teaching thousands of students...
      • Developer Insights: Teaching thousands of students...
      • About today's App Engine outage
      • App Engine 1.7.3 Released
      • Developer Insights: Building scalable social games...
      • Developer Insights: Streak brings CRM to the inbox...
      • Jenkins, meet Google App Engine
      • New Google BigQuery Launch includes Datastore Impo...
    • ►  September (2)
    • ►  August (3)
    • ►  July (4)
    • ►  June (2)
    • ►  May (3)
    • ►  April (4)
    • ►  March (5)
    • ►  February (3)
    • ►  January (5)
  • ►  2011 (46)
    • ►  December (3)
    • ►  November (4)
    • ►  October (4)
    • ►  September (5)
    • ►  August (3)
    • ►  July (4)
    • ►  June (3)
    • ►  May (8)
    • ►  April (2)
    • ►  March (5)
    • ►  February (3)
    • ►  January (2)
  • ►  2010 (38)
    • ►  December (2)
    • ►  October (2)
    • ►  September (1)
    • ►  August (5)
    • ►  July (5)
    • ►  June (6)
    • ►  May (3)
    • ►  April (5)
    • ►  March (5)
    • ►  February (2)
    • ►  January (2)
  • ►  2009 (47)
    • ►  December (4)
    • ►  November (3)
    • ►  October (6)
    • ►  September (5)
    • ►  August (3)
    • ►  July (3)
    • ►  June (4)
    • ►  May (3)
    • ►  April (5)
    • ►  March (3)
    • ►  February (7)
    • ►  January (1)
  • ►  2008 (46)
    • ►  December (4)
    • ►  November (3)
    • ►  October (10)
    • ►  September (5)
    • ►  August (6)
    • ►  July (4)
    • ►  June (2)
    • ►  May (5)
    • ►  April (7)
Powered by Blogger.

About Me

Unknown
View my complete profile