Facebook has open sourced Presto, the interactive SQL-on-Hadoop engine the company first discussed in June. Presto is Facebook’s take on Cloudera’s Impala or Google’s Dremel, and it already has some big-name fans in Dropbox and Airbnb.
Technologically, Presto and other query engines of its ilk can be viewed as faster versions of Hive, the data warehouse framework for Hadoop that Facebook created several years ago. Facebook and many other Hadoop users still rely heavily on Hive for batch-processing jobs such as regular reporting, but there has been demand for something that lets users run ad hoc, exploratory queries on Hadoop data, much as they would with a massively parallel relational database.
Presto is 10 times faster than Hive for most queries, according to Facebook software engineer Martin Traverso in a blog post detailing today’s news.
Facebook uses Presto for interactive queries against several internal data stores, including its 300PB data warehouse. Over 1,000 Facebook employees use Presto daily to run more than 30,000 queries that together scan more than a petabyte each day.
Leading internet companies including Airbnb and Dropbox are using Presto.
Presto is amazing. Lead engineer Andy Kramolisch got it into production in just a few days. It’s an order of magnitude faster than Hive in most of our use cases. It reads directly from HDFS, so unlike Redshift, there isn’t a lot of ETL before you can use it. It just works.
Christopher Gutierrez, Manager of Online Analytics, Airbnb
We’re really excited about Presto. We’re planning on using it to quickly gain insight about the different ways our users use Dropbox, as well as diagnosing problems they encounter along the way. In our tests so far it’s been rock solid and extremely fast when applied to some of our most important ad hoc use cases.
Fred Wulff, Software Engineer, Dropbox
It will be interesting to watch how, if at all, Presto affects adoption of Cloudera’s Impala, Hortonworks’ Stinger project, Pivotal’s HAWQ or any other of the myriad SQL-on-Hadoop engines currently fighting for mindshare. The fact that Presto is open source and ready to use certainly has to be a big draw for some users, and could help it establish a solid user base while other technologies are still maturing.