Once you get your data into Google BigQuery, you don’t have to worry about running out of machine capacity, because you use Google’s machines as if they were your own. But what if you want to transform your source data before putting it into BigQuery, and you don’t have the server capacity to handle the transformation? In that case, consider using Google Compute Engine to run your Extract, Transform and Load (ETL) processing. To learn how, read our paper Getting Started With Google BigQuery, and download the sample ETL tool for Google Compute Engine from GitHub.
The sample ETL tool is an application that automates the steps of getting a Google Compute Engine instance up and running and installing the software you need to rapidly design, create and execute the ETL workflow. The application includes a sample workflow, built with KNIME, that walks you through the entire process.
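To make the "instance up and running, software installed" step concrete, here is a minimal sketch of the kind of request body such an automation tool might send to the Compute Engine API's instance-insert method. The project ID, zone, machine type, image, and the contents of the startup script are all hypothetical; the key idea is that a `startup-script` metadata item tells the new instance to install its ETL software on first boot.

```python
import json

PROJECT = "my-project"   # hypothetical project ID
ZONE = "us-central1-a"   # hypothetical zone

# Illustrative startup script: install software needed by the ETL workflow.
STARTUP_SCRIPT = """#!/bin/bash
apt-get update
apt-get install -y default-jre
"""

def build_instance_config(name):
    """Build an instance-insert request body with a first-boot install script."""
    base = "https://www.googleapis.com/compute/v1/projects"
    return {
        "name": name,
        "machineType": "zones/%s/machineTypes/n1-standard-1" % ZONE,
        "disks": [{
            "boot": True,
            "initializeParams": {
                "sourceImage": "%s/debian-cloud/global/images/family/debian-12" % base,
            },
        }],
        "networkInterfaces": [{"network": "global/networks/default"}],
        "metadata": {
            # The startup-script metadata key is run by the instance on boot.
            "items": [{"key": "startup-script", "value": STARTUP_SCRIPT}],
        },
    }

config = build_instance_config("etl-worker-1")
print(json.dumps(config, indent=2))
```

In a real deployment this body would be POSTed (with authorization) to the `instances.insert` endpoint for the project and zone; the sketch only shows the shape of the configuration, not the authenticated call.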
If you already have an established ETL process for preparing your data and loading it into Google Cloud Storage, but need a reliable way to load the data from there into BigQuery, we have another sample application to help you. The Automated File Loader for BigQuery sample app demonstrates how to automate data loading from Google Cloud Storage to BigQuery.
This application uses the Cloud Storage Object Change Notification API to receive notifications that files have been uploaded to a bucket in Google Cloud Storage, then uses the BigQuery API to load the data from the bucket into BigQuery. Download it now from GitHub.
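The two steps the app performs can be sketched as pure data transformations: turn an Object Change Notification into a `gs://` source URI, then build the BigQuery load-job request body that loads that object into a table. This is a minimal sketch, not the sample app's actual code; the payload fields shown (`bucket`, `name`), the CSV format, and the project, dataset, and table names are assumptions for illustration.

```python
import json

def source_uri(notification):
    """Build the Cloud Storage URI for the object named in a notification.

    Assumes the notification payload carries the object's 'bucket' and 'name'.
    """
    return "gs://%s/%s" % (notification["bucket"], notification["name"])

def build_load_job(uri, project, dataset, table):
    """Build a BigQuery jobs.insert request body for a CSV load."""
    return {
        "configuration": {
            "load": {
                "sourceUris": [uri],
                "destinationTable": {
                    "projectId": project,
                    "datasetId": dataset,
                    "tableId": table,
                },
                "sourceFormat": "CSV",
                "writeDisposition": "WRITE_APPEND",  # append each new file
            }
        }
    }

# Example: a hypothetical notification for a freshly uploaded CSV file.
note = {"bucket": "my-uploads", "name": "sales/2013-10-01.csv"}
uri = source_uri(note)
job = build_load_job(uri, "my-project", "etl_demo", "sales")
print(json.dumps(job, indent=2))
```

The real application submits a body like this, with authorization, to the BigQuery jobs endpoint each time a notification arrives, so newly uploaded files flow into the destination table without manual intervention.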
Both of these sample applications accompany the article Getting Started With Google BigQuery, which provides an overview of the end-to-end process, from loading data into BigQuery through to visualization, along with design practices to consider when using BigQuery.
-Posted by Wally Yau, Solutions Architect
Tuesday, 1 October 2013
Jump-start your data pipelining into Google BigQuery