Cloud World

  • Subscribe to our RSS feed.
  • Twitter
  • StumbleUpon
  • Reddit
  • Facebook
  • Digg

Thursday, 19 February 2009

Back to the Future for Data Storage

Posted on 11:07 by Unknown

Building a massive, distributed datastore which can service requests at an extremely high throughput is something that we've focused on at Google. We created something called Bigtable that underlies the datastore in App Engine. The design for Bigtable focused on scalability across a distributed system so it may operate a bit differently than databases you've worked with before, such as not supporting joins. This isn't an accident -- when you build a system that can scale to the size that Bigtable can there's no way to do a general purpose join on data sets that size and still have them be performant.

Google isn't alone in offering an non-Relational datastore to enable scaling. For example, Amazon has SimpleDB:

A traditional, clustered relational database requires a sizable upfront capital outlay, is complex to design, and often requires a DBA to maintain and administer. Amazon SimpleDB is dramatically simpler, requiring no schema, automatically indexing your data and providing a simple API for storage and access.

There are also a range of non-relational open source datastores now available such as CouchDB and Hypertable. Those are just two examples, there are many more.

While you might think this is all new, it's actually a bit of a return to the past. You see, there was a time when "RDBMS" wasn't always the answer regardless of what the question was. At the time Codd published his paper, "A Relational Model of Data for Large Shared Data Banks," there were many different approaches to datastores. It was only in the '80s that relational databases won the majority of the mindshare. Having settled on a single metaphor the industry has developed many tools and techniques to make developing on a relational database easier.

Unfortunately that majority mindshare is also a problem because while RDBMS' are useful in many situations, they are not useful in all situations. Their dominance in the mindshare means that useful alternatives aren't used, and huge amounts of time and money can be wasted trying to force non-relational problems into a relational model.

We are in the middle of a renaissance in data storage with the application of many new ideas and techniques; there's huge potential for breaking out of thinking about data storage in just one way. Michael Stonebraker pointed out in his paper, "One Size Fits All": An Idea Whose Time Has Come and Gone, that there are common datastore use cases, such as Data Warehousing and Stream Processing that are not well served by a general purpose RDBMS and that abandoning the general purpose RDBMS can give you a performance increase of one or two orders of magnitude.

It's an exciting time, and the takeaway here isn't to abandon the relational database, which is a very mature technology that works great in its domain, but instead to be willing to look outside the RDBMS box when looking for storage solutions.



Posted by Joe Gregorio, Google App Engine Team
Email ThisBlogThis!Share to XShare to Facebook
Posted in | No comments
Newer Post Older Post Home

0 comments:

Post a Comment

Subscribe to: Post Comments (Atom)

Popular Posts

  • A Day in the Cloud, new articles on scaling, and fresh open source projects for App Engine
    The latest release of Python SDK 1.2.3, which introduced the Task Queue API and integrated support for Django 1.0, may have received a lot ...
  • Tutorial: Adding a cloud backend to your application with Android Studio
    Android Studio lets you easily add a cloud backend to your application, right from your IDE. A backend allows you to implement functionality...
  • Outfit 7’s Talking Friends built on Google App Engine, recently hit one billion downloads
    Today’s guest blogger is Igor Lautar, senior director of technology at Outfit7 (Ekipa2 subsidiary), one of the fastest-growing media enterta...
  • Pushing Updates with the Channel API
    If you've been watching Best Buy closely, you already know that Best Buy is constantly trying to come up with new and creative ways to...
  • DataStax Enterprise feels right at home in Google Compute Engine
    Today’s guest post comes from Martin Van Ryswyk, Vice President of Engineering at DataStax. The cloud promises many things for database user...
  • Google App Engine takes the pain out of sending iOS push notifications
    Delivering scalable, reliable mobile push notifications when hundreds of thousands of users have installed your app on their phones can be a...
  • Jump-start your data pipelining into Google BigQuery
    Once you get your data into Google BigQuery , you don’t have to worry about running out of machine capacity, because you use Google’s machin...
  • Bridging Mobile Backend as a Service to Enterprise Systems with Google App Engine and Kinvey
    The following post was contributed by Ivan Stoyanov , VP of Engineering for Kinvey, a mobile Backend as a Service provider and Google Cloud ...
  • New in Google Cloud Storage: auto-delete, regional buckets and faster uploads
    We’ve launched new features in Google Cloud Storage that make it easier to manage objects, and faster to access and upload data. With a tiny...
  • TweetDeck and Google App Engine: A Match Made in the Cloud
    I'm Reza and work in London, UK for a startup called TweetDeck . Our vision is to develop the best tools to manage and filter real time ...

Categories

  • 1.1.2
  • agile
  • android
  • Announcements
  • api
  • app engine
  • appengine
  • batch
  • bicycle
  • bigquery
  • canoe
  • casestudy
  • cloud
  • Cloud Datastore
  • cloud endpoints
  • cloud sql
  • cloud storage
  • cloud-storage
  • community
  • Compute Engine
  • conferences
  • customer
  • datastore
  • delete
  • developer days
  • developer-insights
  • devfests
  • django
  • email
  • entity group
  • events
  • getting started
  • google
  • googlenew
  • gps
  • green
  • Guest Blog
  • hadoop
  • html5
  • index
  • io2010
  • IO2013
  • java
  • kaazing
  • location
  • mapreduce
  • norex
  • open source
  • partner
  • payment
  • paypal
  • pipeline
  • put
  • python
  • rental
  • research project
  • solutions
  • support
  • sustainability
  • taskqueue
  • technical
  • toolkit
  • twilio
  • video
  • websockets
  • workflows

Blog Archive

  • ►  2013 (143)
    • ►  December (33)
    • ►  November (15)
    • ►  October (17)
    • ►  September (13)
    • ►  August (4)
    • ►  July (15)
    • ►  June (12)
    • ►  May (15)
    • ►  April (4)
    • ►  March (4)
    • ►  February (9)
    • ►  January (2)
  • ►  2012 (43)
    • ►  December (2)
    • ►  November (2)
    • ►  October (8)
    • ►  September (2)
    • ►  August (3)
    • ►  July (4)
    • ►  June (2)
    • ►  May (3)
    • ►  April (4)
    • ►  March (5)
    • ►  February (3)
    • ►  January (5)
  • ►  2011 (46)
    • ►  December (3)
    • ►  November (4)
    • ►  October (4)
    • ►  September (5)
    • ►  August (3)
    • ►  July (4)
    • ►  June (3)
    • ►  May (8)
    • ►  April (2)
    • ►  March (5)
    • ►  February (3)
    • ►  January (2)
  • ►  2010 (38)
    • ►  December (2)
    • ►  October (2)
    • ►  September (1)
    • ►  August (5)
    • ►  July (5)
    • ►  June (6)
    • ►  May (3)
    • ►  April (5)
    • ►  March (5)
    • ►  February (2)
    • ►  January (2)
  • ▼  2009 (47)
    • ►  December (4)
    • ►  November (3)
    • ►  October (6)
    • ►  September (5)
    • ►  August (3)
    • ►  July (3)
    • ►  June (4)
    • ►  May (3)
    • ►  April (5)
    • ►  March (3)
    • ▼  February (7)
      • New! Grow your app beyond the free quotas!
      • Back to the Future for Data Storage
      • Web App Wednesday, Mashup, Backup, and Decay
      • The sky's (almost) the limit! "High CPU" is no more.
      • SDK version 1.1.9 Released
      • A roadmap update!
      • Best Buy's Giftag on App Engine
    • ►  January (1)
  • ►  2008 (46)
    • ►  December (4)
    • ►  November (3)
    • ►  October (10)
    • ►  September (5)
    • ►  August (6)
    • ►  July (4)
    • ►  June (2)
    • ►  May (5)
    • ►  April (7)
Powered by Blogger.

About Me

Unknown
View my complete profile