Cloud World

Thursday, 12 December 2013

Qubole helps you run Hadoop on Google Compute Engine

Posted on 12:00 by Unknown
This guest post comes from Praveen Seluka, Software Engineer at Qubole, a leading provider of Hadoop-as-a-service.



Qubole is a leading provider of Hadoop as a service. Its mission is to provide a simple, integrated, high-performance big data stack that businesses can use to derive actionable insights from their data sources quickly. The Qubole Data Service offers self-managed and auto-scaled Hadoop in the cloud, along with an integrated library of data connectors and an easy-to-use GUI. Together, these let users focus on their data and transformations while enabling data teams to provide a superior service to the consumers of analysis. Now, Qubole is partnering with Google Compute Engine to provide a fully elastic Hadoop service on Compute Engine, with several advantages.



Auto-scaling and self-managed Hadoop

This elasticity is particularly useful for big data workloads because they are inherently bursty: a 10-node cluster may be sufficient at certain times of the day, while the peak workload may require a 1000-node cluster. With the Qubole Data Service's auto-scaling, clusters scale up and down dynamically. This leads to better resource utilization, so users pay only for the resources they actually need.
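
To make the utilization argument concrete, here is a minimal sketch that compares node-hours for a cluster permanently sized for peak load against one that is resized every hour. The hourly demand profile and the function names are hypothetical, chosen only to echo the 10-node and 1000-node figures above. They are not Qubole measurements, and the sketch ignores scaling lag and minimum billing increments.

```python
def fixed_cluster_node_hours(hourly_demand):
    """Node-hours used by a cluster permanently sized for the peak hour."""
    return max(hourly_demand) * len(hourly_demand)


def autoscaled_node_hours(hourly_demand):
    """Node-hours used when the cluster is resized to match each hour's demand."""
    return sum(hourly_demand)


# Hypothetical 24-hour profile: 10 nodes for 22 hours, 1000 nodes for a 2-hour peak.
demand = [10] * 22 + [1000] * 2

fixed = fixed_cluster_node_hours(demand)
scaled = autoscaled_node_hours(demand)
savings = 1 - scaled / fixed

print(f"fixed: {fixed} node-hours, autoscaled: {scaled} node-hours")
print(f"node-hours saved by autoscaling: {savings:.1%}")
```

The burstier the profile, the larger the gap between the two figures. With a flat demand curve the two functions return the same value and autoscaling saves nothing.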



Performance and reliability

By taking advantage of Compute Engine's fast virtual machine startup and consistent performance, the Qubole Data Service brings increased data processing throughput to Hadoop workloads. A strong, performant infrastructure further amplifies the already superior performance of the Apache Hadoop distribution provided as part of the Qubole Data Service.



Fully integrated tools for Big Data

The Qubole Data Service offers an integrated set of query tools, data pipeline and workflow tools, and resource monitoring and management tools to enable a large number of analytic use cases. By simplifying common analytics-related tasks, it makes data usable by a larger set of users in an organization. It can also take advantage of the same cloud and datacenter infrastructure that powers Google’s services to handle large and ever-increasing workloads.



We present our findings from running the Qubole Data Service and Hadoop on Compute Engine versus a leading cloud provider (CloudX). In these performance experiments, we used the popular TPC-H benchmark. We generated a 75 GB TPC-H dataset using the dbgen utility. The data was in delimited text format and was uploaded to both CloudX’s object store and Google Cloud Storage.



We created external Hive tables against these datasets and used Hadoop’s filesystem implementations to access the files in the object stores. Because Hive does not support the original form of the TPC-H queries, we ran a modified form of the queries sequentially against both clusters. The complete set of DDLs and Hive queries used is available in our public Bitbucket repository via the following git command:

git clone 'https://bitbucket.org/qubole/tpch.git'
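
For readers who have not used external tables, the sketch below shows the general shape of such a statement. It is not the DDL from the repository. It is a small Python helper that renders a Hive CREATE EXTERNAL TABLE statement for the TPC-H nation table, assuming dbgen's pipe-delimited output. The helper name and the bucket path are hypothetical.

```python
def external_table_ddl(table, columns, location, delimiter="|"):
    """Render a Hive CREATE EXTERNAL TABLE statement for delimited text files."""
    column_lines = ",\n".join(f"  {name} {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE {table} (\n"
        f"{column_lines}\n"
        f")\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'\n"
        f"STORED AS TEXTFILE\n"
        f"LOCATION '{location}';"
    )


# Columns of the TPC-H nation table, mapped to Hive types.
NATION_COLUMNS = [
    ("n_nationkey", "INT"),
    ("n_name", "STRING"),
    ("n_regionkey", "INT"),
    ("n_comment", "STRING"),
]

# Hypothetical bucket path, used only for illustration.
ddl = external_table_ddl("nation", NATION_COLUMNS, "gs://example-bucket/tpch/nation/")
print(ddl)
```

Because the table is external, Hive reads the files in place at the given location rather than copying them into its own warehouse directory, so the same table definition can point at either object store by changing only the LOCATION URI.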



In the graph above, speedup is calculated as the ratio of a query's execution time on CloudX to its execution time on Compute Engine. A value greater than 1 therefore indicates that Compute Engine was faster. On average, Compute Engine was 1.21x faster than CloudX, and most queries consistently performed better on Compute Engine.
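
The speedup calculation can be sketched in a few lines of Python. The timings below are hypothetical placeholders, not the measurements behind the graph, and the sketch assumes the reported average is an arithmetic mean of the per-query ratios, which the post does not state.

```python
from statistics import mean


def speedup(cloudx_seconds, compute_engine_seconds):
    """Ratio of CloudX time to Compute Engine time; > 1 means Compute Engine was faster."""
    return cloudx_seconds / compute_engine_seconds


# Hypothetical timings in seconds: {query: (CloudX, Compute Engine)}.
timings = {
    "q1": (120.0, 100.0),
    "q3": (90.0, 100.0),
    "q6": (150.0, 100.0),
}

speedups = {query: speedup(cx, gce) for query, (cx, gce) in timings.items()}
average = mean(list(speedups.values()))

for query, ratio in speedups.items():
    print(f"{query}: {ratio:.2f}x")
print(f"average speedup: {average:.2f}x")
```

Note that an average of ratios can exceed 1 even when some individual queries are slower, as q3 is in this made-up example.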



In conclusion, Qubole is bringing the Qubole Data Service (QDS) to Compute Engine. Users looking for big data solutions can combine Compute Engine’s high-performance, reliable and scalable infrastructure with QDS’ auto-scaling, self-managing, integrated Hadoop-as-a-service offering, and reduce the time and effort required to gain insights into their business.



Are you interested in running Hadoop on Google Compute Engine? Apply for our beta program.



Note: Hadoop is a trademark of the Apache Software Foundation.



-Contributed by Praveen Seluka, Software Engineer, Qubole
Posted in Compute Engine, partner