Cloud World

  • Subscribe to our RSS feed.
  • Twitter
  • StumbleUpon
  • Reddit
  • Facebook
  • Digg

Wednesday, 31 October 2012

Developer Insights: Teaching thousands of students to program on Udacity with App Engine (part 2)

Posted on 09:59 by Unknown


This post is the second of our two-part series discussing how Udacity uses Google App Engine.

Today’s guest blogger is Chris Chew, senior software engineer at Udacity, which offers free online courses in programming and other subjects.  Chris shares how Udacity itself is built using App Engine.

Steve Huffman blogged yesterday about how App Engine enables the project-based learning that makes his web development course so powerful.  People are often surprised to learn that Udacity itself is built on App Engine.

The choice to use App Engine originally came from Mike Sokolsky, our CTO and cofounder, after his experience keeping the original version of our extremely popular AI course running on a series of virtual machines.  Mike found App Engine’s operational simplicity extremely compelling after weeks of endlessly spinning up additional servers and administering MySQL replication in order to meet the crazy scale patterns we experience.

Close to a year later, with ten months of live traffic on App Engine, we continue to be satisfied customers.  While there are a few things we do outside App Engine, our choice to continue using App Engine for our core application is clear:  We prefer to spend our time figuring out how to scale personalized education, not memcached.  App Engine’s infrastructure is better than what we could build ourselves, and it frees us to focus on behavior rather than operations.

How Udacity Uses App Engine

The App Engine features we use most include a pretty broad swath of the platform:



  • High Replication Datastore with NDB

  • Memcache

  • Task Queues - Deferred execution, MapReduce, batch jobs

  • App Engine Search API -- Indexing both course content and student résumés

  • Blobstore API -- Lecture videos, résumés, data exportation

  • Image API - Thumbnail generation

  • MapReduce API - Daily usage analytics, data migrations, data maintenance



A high-level representation of our “stack” looks something like this:












Trails and Trove are two libraries developed in-house mainly by Piotr Kaminski.  Trails supplies very clean semantics for creating families of RESTful endpoints on top of a webapp2.RequestHandler with automagic marshalling.  Trove is a wrapper around NDB that adds common property types (e.g. efficient dereferencing of key properties), yet another layer of caching for entities with relations (both in-process and memcache), and an event “watcher” framework for reliably triggering out-of-band processing when data changes.

Something notable that is not represented in the drawing above is a specific set of monkey patches from Trove we apply to NDB to create better hooks similar to the existing pre/post-put/delete hooks.  These custom hooks power a “watcher” abstraction that provides targeted pieces of code the opportunity to react to changes in the data layer.  Execution of each watcher is deferred and runs outside the scope of the request so as to not increase response times.

Latency

During our first year of scaling on App Engine we learned its performance is a complex thing to understand.  Response time is a function of several factors both inside and outside our control.  App Engine’s ability to “scale-out” is undeniable, but we have observed high variance in response times for a given request, even during periods with low load on the system.  As a consequence we have learned to do a number of things to minimize the impact of latency variance:





  • Converting usage of the old datastore API to the new NDB API

  • Using NDB.tasklet coroutines as much as possible to enable parallelism during blocking RPC operations

  • Not indexing fields by default and adding an index only when we need it for a query

  • Carefully avoiding index hotspots by indexing fields with predictable values only when necessary (i.e. auto-now DateTime and enumerated “choices” String properties).

  • Materializing data views very aggressively so we can limit each request to the fewest datastore queries possible



This last point is obvious in the sense that naturally you get faster responses when you do less work.  But we have taken pre-materializing views to an extreme level by denormalizing several aspects of our domain into read-optimized records.  For example, the read-optimized version of a user’s profile record might contain standard profile information, plus privacy configuration, course enrollment information, course progress, and permissions -- all things a data modeler would normally want to store separately.  We pile it together into the equivalent of a materialized view so we can fetch it all in one query.


Conclusion

App Engine is an amazingly complete and reliable platform that works astonishingly well for a huge number of use cases.  It is very apparent the services and APIs have been designed by people who know how to scale web applications, and we feel lucky to have the opportunity to ride on the shoulders of such giants.  It is trivial to whip together a proof-of-concept for almost any idea, and the subsequent work to scale your app is significantly less than if you had rolled your own infrastructure.

As with any platform, there are tradeoffs.  The tradeoff with App Engine is that you get an amazing suite of scale-ready services at the cost of relentlessly optimizing to minimize latency spikes.  This is an easy tradeoff for us because App Engine has served us well through several exciting usage spikes and there is no question the progress we have already made towards our mission is significantly more than if we were also building our own infrastructure.  Like most choices in life, this choice can be boiled down to a bumper sticker:










Editor’s note: Chris Chew and Steve Huffman will be participating in a Google Developers Live Hangout tomorrow, Thursday, November 1st, check it out here and submit your questions for them to answer live on air.

-Contributed by Chris Chew, Senior Software Engineer, Udacity



Posted by Zafir Khan, Product Marketing Manager, Google App Engine
Email ThisBlogThis!Share to XShare to Facebook
Posted in developer-insights | No comments
Newer Post Older Post Home

0 comments:

Post a Comment

Subscribe to: Post Comments (Atom)

Popular Posts

  • Bridging Mobile Backend as a Service to Enterprise Systems with Google App Engine and Kinvey
    The following post was contributed by Ivan Stoyanov , VP of Engineering for Kinvey, a mobile Backend as a Service provider and Google Cloud ...
  • Tutorial: Adding a cloud backend to your application with Android Studio
    Android Studio lets you easily add a cloud backend to your application, right from your IDE. A backend allows you to implement functionality...
  • 2013 Year in review: topping 100,000 requests-per-second
    2013 was a busy year for Google Cloud Platform. Watch this space: each day, a different Googler who works on Cloud Platform will be sharing ...
  • Easy Performance Profiling with Appstats
    Since App Engine debuted 2 years ago, we’ve written extensively about best practices for writing scalable apps on App Engine. We make writ...
  • TweetDeck and Google App Engine: A Match Made in the Cloud
    I'm Reza and work in London, UK for a startup called TweetDeck . Our vision is to develop the best tools to manage and filter real time ...
  • Scaling with the Kindle Fire
    Today’s blog post comes to us from Greg Bayer of Pulse , a popular news reading application for iPhone, iPad and Android devices. Pulse has ...
  • Who's at Google I/O: Mojo Helpdesk
    This post is part of Who's at Google I/O , a series of guest blog posts written by developers who are appearing in the Developer Sandbox...
  • A Day in the Cloud, new articles on scaling, and fresh open source projects for App Engine
    The latest release of Python SDK 1.2.3, which introduced the Task Queue API and integrated support for Django 1.0, may have received a lot ...
  • SendGrid gives App Engine developers a simple way of sending transactional email
    Today’s guest post is from Adam DuVander, Developer Communications Director at SendGrid. SendGrid is a cloud-based email service that deliv...
  • Qubole helps you run Hadoop on Google Compute Engine
    This guest post comes form Praveen Seluka, Software Engineer at Qubole, a leading provider of Hadoop-as-a-service.  Qubole is a leading pr...

Categories

  • 1.1.2
  • agile
  • android
  • Announcements
  • api
  • app engine
  • appengine
  • batch
  • bicycle
  • bigquery
  • canoe
  • casestudy
  • cloud
  • Cloud Datastore
  • cloud endpoints
  • cloud sql
  • cloud storage
  • cloud-storage
  • community
  • Compute Engine
  • conferences
  • customer
  • datastore
  • delete
  • developer days
  • developer-insights
  • devfests
  • django
  • email
  • entity group
  • events
  • getting started
  • google
  • googlenew
  • gps
  • green
  • Guest Blog
  • hadoop
  • html5
  • index
  • io2010
  • IO2013
  • java
  • kaazing
  • location
  • mapreduce
  • norex
  • open source
  • partner
  • payment
  • paypal
  • pipeline
  • put
  • python
  • rental
  • research project
  • solutions
  • support
  • sustainability
  • taskqueue
  • technical
  • toolkit
  • twilio
  • video
  • websockets
  • workflows

Blog Archive

  • ►  2013 (143)
    • ►  December (33)
    • ►  November (15)
    • ►  October (17)
    • ►  September (13)
    • ►  August (4)
    • ►  July (15)
    • ►  June (12)
    • ►  May (15)
    • ►  April (4)
    • ►  March (4)
    • ►  February (9)
    • ►  January (2)
  • ▼  2012 (43)
    • ►  December (2)
    • ►  November (2)
    • ▼  October (8)
      • Developer Insights: Teaching thousands of students...
      • Developer Insights: Teaching thousands of students...
      • About today's App Engine outage
      • App Engine 1.7.3 Released
      • Developer Insights: Building scalable social games...
      • Developer Insights: Streak brings CRM to the inbox...
      • Jenkins, meet Google App Engine
      • New Google BigQuery Launch includes Datastore Impo...
    • ►  September (2)
    • ►  August (3)
    • ►  July (4)
    • ►  June (2)
    • ►  May (3)
    • ►  April (4)
    • ►  March (5)
    • ►  February (3)
    • ►  January (5)
  • ►  2011 (46)
    • ►  December (3)
    • ►  November (4)
    • ►  October (4)
    • ►  September (5)
    • ►  August (3)
    • ►  July (4)
    • ►  June (3)
    • ►  May (8)
    • ►  April (2)
    • ►  March (5)
    • ►  February (3)
    • ►  January (2)
  • ►  2010 (38)
    • ►  December (2)
    • ►  October (2)
    • ►  September (1)
    • ►  August (5)
    • ►  July (5)
    • ►  June (6)
    • ►  May (3)
    • ►  April (5)
    • ►  March (5)
    • ►  February (2)
    • ►  January (2)
  • ►  2009 (47)
    • ►  December (4)
    • ►  November (3)
    • ►  October (6)
    • ►  September (5)
    • ►  August (3)
    • ►  July (3)
    • ►  June (4)
    • ►  May (3)
    • ►  April (5)
    • ►  March (3)
    • ►  February (7)
    • ►  January (1)
  • ►  2008 (46)
    • ►  December (4)
    • ►  November (3)
    • ►  October (10)
    • ►  September (5)
    • ►  August (6)
    • ►  July (4)
    • ►  June (2)
    • ►  May (5)
    • ►  April (7)
Powered by Blogger.

About Me

Unknown
View my complete profile