Cloud World

  • Subscribe to our RSS feed.
  • Twitter
  • StumbleUpon
  • Reddit
  • Facebook
  • Digg

Wednesday, 18 September 2013

Google BigQuery goes real-time with streaming inserts, time-based queries, and more

Posted on 08:00 by Unknown
Google BigQuery is designed to make it easy to analyze large amounts of data quickly. This year we've seen great updates: big scale JOINs and GROUP BYs, unlimited result sizes, smarter functions, bigger quotas, as well as multiple improvements to the web UI. Today we've gone even further, announcing several updates that give BigQuery the ability to work in real-time, query subsets of the latest data, more functions and browser tool improvements.



Real-time analytics

BigQuery is now able to load and analyze data in real time through a simple API call, the new tabledata().insertAll() endpoint. This enables you to store data as it comes in, rather than building and maintaining systems just to cache and upload in batches. The best part? The new data is available for querying instantaneously. This feature is great for time sensitive use cases like log analysis and alerts generation.



Using it is as easy as calling the new endpoint with your data in a JSON object (with a single row or multiple rows of data).



Here's a Python example:

body = {"rows":[
{"json": {"column_name":7.7,}}
]}
response = bigquery.tabledata().insertAll(
projectId=PROJECT_ID,
datasetId=DATASET_ID,
tableId=TABLE_ID,
body=body).execute()



Streaming data into BigQuery is free for an introductory period until January 1st, 2014. After that it will be billed at a flat rate of 1 cent per 10,000 rows inserted. The traditional jobs().insert() method will continue to be free. When choosing which import method to use, check for the one that best matches your use case. Keep using the jobs().insert() endpoint for bulk and big data loading. Switch to the new tabledata().insertAll() endpoint if your use case calls for a constantly and instantly updating stream of data.



Table decorators for recent imported data

This release introduces a new syntax for "table decorators". Now you can define queries that only scan a range or spot in the previous 24 hours. Traditionally BigQuery has always done a "full column scan" when querying data, while the new syntax will allow you to focus only on a specific subset of the latest data. Querying a partial subset of data with these decorators will result in lower querying costs — proportional to the size of the subset of the queried data.



With the new capabilities you can query only the last hour of inserted data, or query only what was inserted before that hour, or get a snapshot of the table at a specific time. The table decorators also work in all table related operations (list, export, copy, and so on), giving you the ability to do operations like copying a table as it was before a bad import job.



New window and statistical functions

To the previously announced window functions, we've added SUM(), COUNT(), AVG(), MIN(), MAX(), FIRST_VALUE, and LAST_VALUE(). Refer to the previous release announcement for a run through of how window functions work.



We also recently introduced the new Pearson correlation function, and showed how to find interesting and anomalous patterns in ambient sensors, or how to predict tomorrow's flight delays using 70 million flights dataset. Today's release adds new statistical functions, for even richer capabilities: COVAR_POP(), COVAR_SAMP(), STDDEV_POP(), STDDEV_SAMP(), VAR_POP() and VAR_SAMP().



BigQuery browser tool: Improved history retrieval

You can now query your history faster in the browser tool using the Query History panel: More information about the queries is being surfaced and convenient buttons have been added for common tasks.





You can try out the new UI features, table decorators, and new functions at https://bigquery.cloud.google.com/. You can post questions and get quick answers about BigQuery usage and development on Stack Overflow. We love your feedback and comments. Join the discussion on +Google Cloud Platform using the hashtag #BigQuery.



- Posted by Felipe Hoffa, Developer Programs Engineer
Email ThisBlogThis!Share to XShare to Facebook
Posted in Announcements, bigquery | No comments
Newer Post Older Post Home

0 comments:

Post a Comment

Subscribe to: Post Comments (Atom)

Popular Posts

  • Bridging Mobile Backend as a Service to Enterprise Systems with Google App Engine and Kinvey
    The following post was contributed by Ivan Stoyanov , VP of Engineering for Kinvey, a mobile Backend as a Service provider and Google Cloud ...
  • Tutorial: Adding a cloud backend to your application with Android Studio
    Android Studio lets you easily add a cloud backend to your application, right from your IDE. A backend allows you to implement functionality...
  • 2013 Year in review: topping 100,000 requests-per-second
    2013 was a busy year for Google Cloud Platform. Watch this space: each day, a different Googler who works on Cloud Platform will be sharing ...
  • Easy Performance Profiling with Appstats
    Since App Engine debuted 2 years ago, we’ve written extensively about best practices for writing scalable apps on App Engine. We make writ...
  • TweetDeck and Google App Engine: A Match Made in the Cloud
    I'm Reza and work in London, UK for a startup called TweetDeck . Our vision is to develop the best tools to manage and filter real time ...
  • Scaling with the Kindle Fire
    Today’s blog post comes to us from Greg Bayer of Pulse , a popular news reading application for iPhone, iPad and Android devices. Pulse has ...
  • Who's at Google I/O: Mojo Helpdesk
    This post is part of Who's at Google I/O , a series of guest blog posts written by developers who are appearing in the Developer Sandbox...
  • A Day in the Cloud, new articles on scaling, and fresh open source projects for App Engine
    The latest release of Python SDK 1.2.3, which introduced the Task Queue API and integrated support for Django 1.0, may have received a lot ...
  • SendGrid gives App Engine developers a simple way of sending transactional email
    Today’s guest post is from Adam DuVander, Developer Communications Director at SendGrid. SendGrid is a cloud-based email service that deliv...
  • Qubole helps you run Hadoop on Google Compute Engine
    This guest post comes form Praveen Seluka, Software Engineer at Qubole, a leading provider of Hadoop-as-a-service.  Qubole is a leading pr...

Categories

  • 1.1.2
  • agile
  • android
  • Announcements
  • api
  • app engine
  • appengine
  • batch
  • bicycle
  • bigquery
  • canoe
  • casestudy
  • cloud
  • Cloud Datastore
  • cloud endpoints
  • cloud sql
  • cloud storage
  • cloud-storage
  • community
  • Compute Engine
  • conferences
  • customer
  • datastore
  • delete
  • developer days
  • developer-insights
  • devfests
  • django
  • email
  • entity group
  • events
  • getting started
  • google
  • googlenew
  • gps
  • green
  • Guest Blog
  • hadoop
  • html5
  • index
  • io2010
  • IO2013
  • java
  • kaazing
  • location
  • mapreduce
  • norex
  • open source
  • partner
  • payment
  • paypal
  • pipeline
  • put
  • python
  • rental
  • research project
  • solutions
  • support
  • sustainability
  • taskqueue
  • technical
  • toolkit
  • twilio
  • video
  • websockets
  • workflows

Blog Archive

  • ▼  2013 (143)
    • ►  December (33)
    • ►  November (15)
    • ►  October (17)
    • ▼  September (13)
      • We're building a community for the cloud-minded
      • How to mix and match web services, batch processin...
      • App Engine 1.8.5 released – featuring Search API a...
      • AppScale brings choice to App Engine developers
      • How Streak makes debugging easy in App Engine Data...
      • Sendgrid now supports Google Compute Engine for se...
      • Google BigQuery goes real-time with streaming inse...
      • Check out technical solutions and sample apps to g...
      • Where the cloud meets the road
      • Running Apache Hadoop on Google Cloud Platform
      • Cloud Platform updates: App Engine 1.8.4 and new C...
      • Introducing CORR() to Google BigQuery
      • Time to show what you can do with Google Cloud Pla...
    • ►  August (4)
    • ►  July (15)
    • ►  June (12)
    • ►  May (15)
    • ►  April (4)
    • ►  March (4)
    • ►  February (9)
    • ►  January (2)
  • ►  2012 (43)
    • ►  December (2)
    • ►  November (2)
    • ►  October (8)
    • ►  September (2)
    • ►  August (3)
    • ►  July (4)
    • ►  June (2)
    • ►  May (3)
    • ►  April (4)
    • ►  March (5)
    • ►  February (3)
    • ►  January (5)
  • ►  2011 (46)
    • ►  December (3)
    • ►  November (4)
    • ►  October (4)
    • ►  September (5)
    • ►  August (3)
    • ►  July (4)
    • ►  June (3)
    • ►  May (8)
    • ►  April (2)
    • ►  March (5)
    • ►  February (3)
    • ►  January (2)
  • ►  2010 (38)
    • ►  December (2)
    • ►  October (2)
    • ►  September (1)
    • ►  August (5)
    • ►  July (5)
    • ►  June (6)
    • ►  May (3)
    • ►  April (5)
    • ►  March (5)
    • ►  February (2)
    • ►  January (2)
  • ►  2009 (47)
    • ►  December (4)
    • ►  November (3)
    • ►  October (6)
    • ►  September (5)
    • ►  August (3)
    • ►  July (3)
    • ►  June (4)
    • ►  May (3)
    • ►  April (5)
    • ►  March (3)
    • ►  February (7)
    • ►  January (1)
  • ►  2008 (46)
    • ►  December (4)
    • ►  November (3)
    • ►  October (10)
    • ►  September (5)
    • ►  August (6)
    • ►  July (4)
    • ►  June (2)
    • ►  May (5)
    • ►  April (7)
Powered by Blogger.

About Me

Unknown
View my complete profile