Seattle Scalability Meetup @ zulily: Google, Hortonworks, zulily

We are looking forward to meeting everyone attending the scalability meetup at our office. It is going to be a great event, with a good overview of how zulily leverages big data and a deep dive into Google BigQuery and Apache Optiq in Hive.

Agenda

Topic: Building zulily’s Data Platform using Hadoop and Google BigQuery

Speakers: Sudhir Hasbe, Director of big data, data services and BI at zulily (https://www.linkedin.com/in/shasbe), and Paul Newson (https://www.linkedin.com/profile/view?id=971812)

Abstract: zulily, with 4.1 million customers and projected 2014 revenues of over 1 billion dollars, is one of the largest e-commerce companies in the U.S. “Data-driven decision making” is part of our DNA. Growth in the business has triggered exponential growth in data, which required us to redesign our data platform. The zulily data platform is the backbone for all analytics and reporting, as well as for the data service APIs consumed by various teams in the organization. This session provides a technical deep dive into our data platform and shares key learnings, including our decision to build a Hadoop cluster in the cloud.

Topic: Delivering personalization and recommendations using Hadoop in the cloud

Speakers: Steve Reed is a principal engineer at zulily, the author of dropship, and former Geek of the Week. Dylan Carney is a senior software engineer at zulily. They both work on personalization, recommendations and improving your shopping experience.

Abstract: Working on personalization and recommendations at zulily, we have come to lean heavily on on-premise Hadoop clusters to get real work done. Hadoop is a robust and fascinating system, with a myriad of knobs to turn and settings to tune. Knowing the ins and outs of obscure Hadoop properties is crucial for the health and performance of your Hadoop cluster. (To wit: How big is your fsimage? Is your secondary namenode daemon running? Did you know it’s not really a secondary namenode at all?)

But what if it didn’t have to be this way? Google Compute Engine (GCE) and other cloud platforms promise Hadoop installations that are easier, faster and simpler to maintain. Join us as we describe what we have learned from our years of Hadoop use, and give an overview of what we’ve been able to adapt, learn and unlearn while moving to GCE.

Topic: Apache Optiq in Hive

Speaker: Julian Hyde, Principal, Hortonworks

Abstract: Tez is making Hive faster, and now cost-based optimization (CBO) is making it smarter. A new initiative in Hive introduces cost-based optimization for the first time, based on the Optiq framework. Optiq’s lead developer Julian Hyde shows the improvements that CBO is bringing to Hive. For those interested in Hive internals, he gives an overview of the Optiq framework and shows some of the improvements that are coming to future versions of Hive.

Our format is flexible: we usually have 2 speakers who each talk for ~30 minutes and then take Q+A plus discussion (about 45 minutes per talk), finishing by 8:45.

There will be beer afterwards, of course!

After-beer Location:

Paddy Coyne’s:  http://www.paddycoynes.com/

Doors open 30 minutes ahead of show-time. 

This entry was posted in Big Data by Sudhir Hasbe.

About Sudhir Hasbe

Sudhir is an accomplished product management leader with over 16 years of experience building industry-leading products at startup and blue-chip companies. He has a proven record of leading product teams across engineering and marketing to deliver business results through customer insights and innovation.