Practical A/B Testing

Introduction

A/B testing is essential to how we operate a data-driven business at zulily. We use it to assess the impact of new features and programs before we roll them out. This blog post focuses on some of the more practical aspects of A/B testing. It is divided into four parts. It begins with an introduction to A/B testing and how we measure long-term impact. Then, it moves into the A/B splitting mechanism. Next, it turns to Decima, our in-house A/B test analysis platform. Finally, it goes behind the scenes and describes the architecture of Decima.

A/B Testing

A/B Testing Basics

In A/B testing, the classic example is changing the color of a button. Say a button is blue, but a PM comes along with a great idea: What would happen if we make it green instead? The blue button is version A, the current version, the control. The green button is version B, the new version, the test. We want to know: Is the green button as awesome as we think? Is it a better experience for our users? Does it lead to better outcomes for our business? To find out, we run an A/B test. We randomly assign some users to see version A and some to see version B. Then we measure a few key outcome metrics for the users in each group. Finally, we use statistical analysis to compare those metrics between the two groups and determine whether the results are significant.

Statistical significance is a formal way of measuring whether a result is interesting. We know that there is natural variability in our users. Not everyone behaves exactly the same way. So, we want to check if the difference between A and B could just be due to chance. Pretend we ran an A/A test instead. We randomly split the users into two groups, but everyone gets the blue button. There is a range of differences (close to zero) that we could reasonably expect to see. When the results of the A/B test are statistically significant, it means they would be highly unusual to see under an A/A test. In that case, we would conclude that the green button did make a difference.


Figure 1. A/B testing – Split users and assign to version A or B. Measure behavior of each group. Use statistical analysis to compare.

Cumulative Outcome Metrics

To shop on zulily, users have to create an account. Requiring our users to be signed in is great for A/B testing, and for analytics in general. It means we can tie together all of a user’s actions via their account id, even if they switch browsers or devices. This makes it easy to measure long-term behaviors, well beyond a single session. And, since we can measure them, we can A/B test for them.

One of the common outcomes we measure at zulily is purchasing. A short-term outcome would be: How much did this user spend in the session when they saw the blue or green button? A long-term outcome would be: How much did the user spend during the A/B test? Whenever a user sees the control or test experience, we say they were exposed. A user can be exposed repeatedly over the course of a test. We accumulate outcome metrics from the first exposure through the end of the test. By measuring cumulative outcomes, we can better understand long-term impact and not be distracted by novelty effects.


Figure 2. Cumulative outcome metrics – Measure each user’s behaviors from their first exposure forward. Users can be exposed multiple times – focus on the first time. Do not count the user’s behaviors before their first exposure.

Lift

Usually, A/B test analysis measures the difference between version B and version A. For an outcome metric x, the difference between test and control is xB – xA. This difference, especially for cumulative outcomes, can increase over time. Consider the example of spend per exposed user. As the A/B test goes on, both groups keep purchasing and accumulating more spend. Version B is a success if the test group’s spend increases faster than the control’s.

Instead of difference, we measure the lift of B over A. Lift scales the difference by the baseline value. For an outcome metric x, the lift of test over control is (xB – xA) / xA * 100%. We have found that lift for cumulative metrics tends to be stable over time.


Figure 3. Lift over time – Cumulative behavior increases over time for both A and B, so the difference between them grows too. The lift tends to stay constant, making it a better summary of the results.

Power Analysis

Before starting an A/B test, it is good to ask two questions: What percent of users should get test versus control? And how long will the test need to run? The formal statistical way of answering these questions is a power analysis. First, we need to know the smallest difference (or lift) that would be meaningful to the business. This is called the effect size. Second, we need to know how much the outcome metric typically fluctuates. The power analysis calculates the sample size: the number of users needed to detect an effect of this size with statistical significance.

Two practical questions follow from the sample size. The split is the fraction of users assigned to test versus control, and it affects the sample size needed. The duration of the test is however long it takes to expose that many users. Since users can come back and be exposed again, the cumulative number exposed grows more slowly as time goes on. Purely mathematically, the more unbalanced the split (the further from 50-50 in either direction), the longer the test. Likewise, the smaller the effect size, the longer the test.
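
As a rough illustration (not our production tooling), here is how such a sample-size calculation might look in Python with statsmodels for a conversion-style metric; the baseline rate, minimum lift, split, and power below are made-up numbers:

# Illustrative power analysis for a binary outcome; all inputs are hypothetical.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline_rate = 0.10                    # assumed control conversion rate
minimum_lift = 0.05                     # smallest lift worth detecting: +5% relative
test_rate = baseline_rate * (1 + minimum_lift)
effect_size = proportion_effectsize(test_rate, baseline_rate)

# ratio = n_test / n_control; 1.0 means a 50-50 split
n_control = NormalIndPower().solve_power(effect_size=effect_size, alpha=0.05,
                                         power=0.8, ratio=1.0,
                                         alternative='two-sided')
print("roughly %.0f users per group at a 50-50 split" % n_control)

Dividing the required counts by the expected number of newly exposed users per day gives a rough test duration, which is where the practical considerations below come in.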

Size + Time – Practical Considerations

Often the power analysis doesn’t tell the whole story. For example, at zulily we have a strong weekly cycle – people shop differently on weekends from weekdays. We always recommend running A/B tests for at least one week, and ideally in multiples of seven days. Of course, if the results look dramatically negative after the first day or two, it is fine to turn off the test early.

The balance of the split affects the length of the test run, but we also consider the level of risk. If we have a big program with lots of moving parts, we might start with 90% control, 10% test. On the flip side, if we want to make sure an important feature keeps providing lift, we might maintain a holdout with 5% control, 95% test. But, if we have a low risk test, such as a small UI change, a split at 50% control, 50% test will mean shorter testing time.

A/B Split

Goals for the A/B Split

There are three key properties that any splitting strategy should have. First, the users should be randomly assigned to treatments. That way, all other characteristics of the users will be approximately the same for each of the treatment groups. The only difference going in is the treatment, so we can conclude that any differences coming out were caused by the treatment. Second, the treatments for each A/B test should be assigned independently from all other A/B tests. That way, we can run many A/B tests simultaneously and not worry about them interfering with each other’s results. Of course, it wouldn’t make sense to apply two conflicting tests to the same feature at the same time. Third, the split should be reproducible. The same user should always be assigned to the same treatment of a test. The treatment shouldn’t vary randomly from page to page or from visit to visit.

Our Strategy

At zulily, our splitting strategy is to combine the user id with the test name and apply a hash function. The result is a bucket number for that user in that test. We often set up more buckets than treatments. This provides the flexibility to start with a small test group and later increase it by moving some of the buckets from control to test.

Our splitting strategy has all three key properties. First, the hash produces pseudo-random bucketing. Second, by including the test name, the user will get independent buckets for different tests. Third, the bucket is reproducible because the hash function is deterministic.

The hash is very fast to compute, so developers don’t have to worry about the A/B split slowing down their code. To implement a test, at the decision point in the code the developer places a call to our standard test lookup function with the test name and user id. It returns the bucket number and treatment name, so the user can be directed to version A or version B. Behind the scenes, the test lookup function generates a clickstream log with the test name, user id, timestamp, and bucket. We on the Data Science team use the clickstream records to know exactly who was exposed to which test when and which treatment they were assigned.
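
The exact hash we use isn't important; the sketch below illustrates the general approach, with MD5 standing in for the hash function and an illustrative 90/10 bucket assignment:

import hashlib

def assign_bucket(test_name, user_id, n_buckets=100):
    """Deterministically map (test name, user id) to a bucket in [0, n_buckets)."""
    key = ("%s:%s" % (test_name, user_id)).encode("utf-8")
    return int(hashlib.md5(key).hexdigest(), 16) % n_buckets

def lookup_treatment(test_name, user_id, treatment_buckets):
    """treatment_buckets maps each treatment to the buckets it owns,
    e.g. {"control": range(0, 90), "test": range(90, 100)}."""
    bucket = assign_bucket(test_name, user_id)
    for treatment, buckets in treatment_buckets.items():
        if bucket in buckets:
            return bucket, treatment
    raise ValueError("bucket not assigned to any treatment")

Because the hash is deterministic, the same user always lands in the same bucket for a given test, and including the test name in the key makes assignments across tests effectively independent. Ramping a test up is just a matter of reassigning buckets from control to test.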

Audience v. Exposure

There are two main ways to assign users to an A/B test: using an audience or exposure. In an audience-based test, before the test launches we create an audience – a group of users who should be in the test – and randomly split them into control and test. Then we measure all of those users’ behavior for the entire test period. This is straightforward but imprecise. Not everyone in the audience will actually be touched by the A/B test. The results are statistically valid, but it will be more difficult to detect an effect due to the extra noise.

Instead, we prefer exposure-based testing. The user is only assigned to a treatment when they reach the feature being tested. The number of exposed users increases as the test runs. The only users in the analysis are those who could have been impacted by the A/B test, so it is easier to detect a lift. In addition, we only measure the cumulative outcomes starting from each user’s first exposure. This further refines the results by excluding anything a user might have done before they had a chance to be influenced by the test.


Figure 4. Audience v exposure – While both statistically valid, exposure-based tests avoid sources of noise and can detect smaller effects.

Decima UI

A Bit of Roman Mythology

The ancient Romans had a concept of the Three Fates. These were three women who control each mortal’s thread of life. First, Nona spins the thread, then Decima measures it, and finally Morta cuts it when the life is over. We named our A/B test analysis system Decima because it measures all of the live tests at zulily.


Figure 5. Three Fates – In ancient Roman mythology the Three Fates control the thread of life. Decima’s role is to measure it.

Decima UI

The Decima UI is the face of the system to internal users. These include PMs, analysts, developers, and anyone else interested in the results of an A/B test. It has two main sections: the navigation and information panel and the results panel. Figure 6 shows a screenshot of Decima displaying a demo A/B test.


Figure 6. Decima UI – At zulily, Decima displays the results of A/B tests. The left panel is for navigation and information. The main panel shows the results for each outcome metric.

Navigation + Information

The navigation and information panel is on the left. A/B tests are organized by Namespace or area of the business. Within a namespace, the Experiment drop-down lists the names of all live tests. The Platform drills down to just exposures and outcomes that occurred on that platform or group of platforms (all-apps, mobile-web, etc). The Segmentation drills down to users in a particular segment (new vs existing, US vs international, etc).

The date information shows the analysis_start_date and analysis_end_date. The results are for exposures and outcomes that occurred in this date range, inclusive. The n_days shows the length of the date range. The analysis_run_date shows the timestamp when the results were computed. For live tests, the end date is always yesterday and the run date is always this morning.

Results

The main panel displays the results for each outcome metric. We analyze whether the lift is zero or statistically significantly different from zero. If a lift is significant and positive, it is colored green. If it is significant and negative, it is colored orange. If it is flat, it is left gray. The plot shows the estimated lift and its 95% confidence interval. It is easy to see whether or not the confidence interval contains zero.

The table shows the average (or proportion for a binary outcome), standard deviation, and sample size for each treatment group. Based on the statistical analysis, it shows the estimated lift, confidence interval bounds, and p-value for comparing each test group to the control.
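
The actual analysis is done by the gauss module described later; purely as an illustration of the idea, here is one way to approximate the lift, its confidence interval, and a p-value from per-group summary statistics using a normal approximation and the delta method:

import math
from scipy import stats

def lift_summary(mean_a, sd_a, n_a, mean_b, sd_b, n_b, alpha=0.05):
    """Approximate the lift of B over A with a delta-method confidence interval."""
    lift = (mean_b - mean_a) / mean_a
    se_a2 = sd_a ** 2 / n_a                      # variance of the control mean
    se_b2 = sd_b ** 2 / n_b                      # variance of the test mean
    # Delta method: variance of (mean_b - mean_a) / mean_a
    se_lift = math.sqrt(se_b2 / mean_a ** 2 + (mean_b ** 2 / mean_a ** 4) * se_a2)
    z = stats.norm.ppf(1 - alpha / 2)
    ci = (lift - z * se_lift, lift + z * se_lift)
    # Two-sided p-value for the difference in means (normal approximation)
    z_diff = (mean_b - mean_a) / math.sqrt(se_a2 + se_b2)
    p_value = 2 * (1 - stats.norm.cdf(abs(z_diff)))
    return lift, ci, p_value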


Figure 7. Single metric results – A zoomed-in view of one metric in the Decima UI. The plot shows the 95% confidence interval for lift. The table shows summary numbers and statistical results.

Common Metrics

We use a variety of outcome metrics depending on the goal of the new feature being tested. Our core metrics include purchasing and visiting behaviors. Specifically, spend per exposed approximates the impact of the test on our top line. For each exposed user, we measure the cumulative spend (possibly zero) between their first exposure date and the analysis end date. Then we average this across all users for each treatment group. Spend per exposed can be broken down into two components: chance of purchase and spend per purchaser. Sometimes a test might cause more users to purchase but spend lower amounts, or vice versa. Spend per exposed combines the two to capture the overall impact. Revisit rate measures the impact of the test on repeat engagement. For each exposed user, we count the number of days they came back after their first exposure date. We have found that visit frequency is a strong predictor of future behaviors, months down the road.


Figure 8. Common outcome metrics. Spend per exposed can be broken into chance of purchase and demand per purchaser. Revisit rate is a proxy for long-term behavior.
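
For illustration only, here is roughly how these metrics could be computed from a user-level exposure table with pandas; the column names and numbers are hypothetical:

import pandas as pd

# Hypothetical user-level dataset: one row per exposed user.
# spend is cumulative spend from first exposure; revisit_days counts
# the days the user came back after their first exposure.
df = pd.DataFrame({
    "treatment":    ["control", "control", "test", "test"],
    "spend":        [0.0, 50.0, 20.0, 0.0],
    "revisit_days": [1, 4, 3, 0],
})

summary = df.groupby("treatment").agg(
    n_exposed=("spend", "size"),
    spend_per_exposed=("spend", "mean"),
    chance_of_purchase=("spend", lambda s: (s > 0).mean()),
    spend_per_purchaser=("spend", lambda s: s[s > 0].mean()),
    revisit_rate=("revisit_days", "mean"),
)
print(summary)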

Decima Architecture

Three Modules of Decima

Decima comprises three main modules. Each is named after a famous contributor to the field that corresponds to its role. Codd invented the relational database model, so the codd module assembles the user-level dataset from our data warehouse. Gauss was an influential statistician (the Gaussian or Normal distribution is named after him), so the gauss module performs the statistical analysis. Tufte is considered a pioneer in data visualization, so the tufte module displays the results in the Decima UI. Decima runs in Google Compute Engine (GCE), with a separate Docker container for each module.

Codd

The codd module is in charge of assembling the dataset. It is written in Python. It uses recursive formatting to compose the query out of parameterized query components, filling values for the dates, test name, etc. Then it submits the query to the data warehouse in Google BigQuery and exports the resulting dataset to Google Cloud Storage (GCS).
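
The real query pieces live in the decima-meta module described below; the following is a simplified sketch of the composition idea, with made-up table names and parameters:

# Illustrative only: fill parameterized pieces, then fill the outer template.
EXPOSURE_PIECE = (
    "SELECT user_id, MIN(event_time) AS first_exposure "
    "FROM {exposure_table} "
    "WHERE test_name = '{test_name}' "
    "AND DATE(event_time) BETWEEN '{start_date}' AND '{end_date}' "
    "GROUP BY user_id")

OUTCOME_PIECE = (
    "SELECT user_id, SUM(spend) AS spend "
    "FROM {outcome_table} "
    "WHERE DATE(order_time) BETWEEN '{start_date}' AND '{end_date}' "
    "GROUP BY user_id")

TEMPLATE = (
    "SELECT e.user_id, e.first_exposure, IFNULL(o.spend, 0) AS spend "
    "FROM ({exposure_query}) e LEFT JOIN ({outcome_query}) o "
    "ON e.user_id = o.user_id")

params = {"exposure_table": "clickstream.ab_exposures",   # hypothetical names
          "outcome_table": "orders.demand",
          "test_name": "demo_test",
          "start_date": "2017-01-01", "end_date": "2017-01-14"}

query = TEMPLATE.format(exposure_query=EXPOSURE_PIECE.format(**params),
                        outcome_query=OUTCOME_PIECE.format(**params))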


Figure 9. Codd – The codd module of Decima does data assembly.

Gauss

The gauss module takes care of the statistical analysis. It is written in R. It imports the dataset produced by codd from GCS into a data.table. It loops through the outcome metrics and performs the statistical test for lift for each one using speedglm. It also loops through platforms and segmentations to generate results for the drill downs. Finally, it gathers all the results and writes them out to a file in GCS.


Figure 10. Gauss – The gauss module of Decima does statistical analysis.

Tufte

The tufte module serves the result visualizations. It is also written in R. It imports the results file produced by gauss from GCS. It creates the tables and plots for each metric in the test using ggplot2. It displays them in an interactive UI using shiny. The UI is hosted in GCE and can be accessed by anyone at zulily.


Figure 11. Tufte – The tufte module of Decima does data visualization.

Decima Meta

The fourth module of Decima is decima-meta. It doesn’t contain any software, just queries and configuration files. The queries are broken down into reusable pieces. For example, the exposure query and outcome metrics query can be mixed and matched. Each query piece has parameters for frequently changed values, such as dates or test ids. The configuration files are written in JSON and there is one per A/B test. They specify all the query pieces and parameters for codd, as well as the outcome metrics for gauss. The idea is: running an A/B test analysis should be as easy as adding a configuration file for it.
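
Purely as an illustration of that idea (the actual field names used by decima-meta are not shown here), a per-test configuration might look something like this:

import json

config = {
    "test_name": "demo_test",                       # hypothetical fields throughout
    "namespace": "site",
    "exposure_query": "exposure_clickstream",
    "outcome_queries": ["spend", "revisits"],
    "parameters": {"start_date": "2017-01-01", "test_buckets": [90, 99]},
    "metrics": ["spend_per_exposed", "chance_of_purchase", "revisit_rate"],
}
print(json.dumps(config, indent=2))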

About the Author

Julie Michelman is a Data Scientist at zulily. She designs and analyzes A/B tests, utilizing Decima, the in-house A/B test analysis tool she helped build. She also builds machine learning models that are used across the business, including marketing, merchandising, and the recommender system. Julie holds a Master’s in Statistics from the University of Washington.

Image Sources

Figure 1. https://www.freepik.com/free-icon/multiple-users-silhouette_736514.htm, https://en.wikipedia.org/wiki/Normal_distribution
Figure 5. http://bytesdaily.blogspot.com/2015/12/some-december-trivia.html
Figure 9. https://en.wikipedia.org/wiki/Edgar_F._Codd
Figure 10. https://en.wikipedia.org/wiki/Carl_Friedrich_Gauss
Figure 11. https://en.wikipedia.org/wiki/Edward_Tufte

Calculating Ad Performance Metrics in Real Time

Authors: Sergey Podlazov, Rahul Srivastava

zulily is a flash sales company.  We post a product on the site, and poof… it's gone in 72 hours.  Online ads for those products come and go just as fast, which doesn't leave us much time to manually evaluate the performance of the ads and take corrective actions if needed.  To optimize our ad spend, we need to know in real time how each ad is doing, and this is exactly what we engineered.

While we track multiple metrics to measure impact of an ad, I am going to focus on one that provides a good representation of the system architecture.  This is an engineering blog after all!

The metric in question is Cost per Total Activation, or CpTA in short.  The formula for the metric is this:  divide the total cost of the ad by the number of customer activations.  We call the numerator in this formula “spend” and refer to the denominator as an “activation”.  For example, if an ad costs zulily $100 between midnight and 15:45 PST on January 31 and results in 20 activations, the CpTA for this ad as of 15:45 PST is $100/20 = $5.

Here's how zulily collects this metric in real time.  For the sake of simplicity, I will skip archiving processes that are sprinkled on top of the architecture below.


The source of the spend for the metric is an advertiser API, e.g. Facebook.  We’ve implemented a Spend Producer (in reference to the Producer-Consumer model) that queries the API every 15 minutes for live ads and pushes the spend into a MongoDB.  Each spend record has a tracking code that uniquely identifies the ad.

The source for the activations is a Kafka stream of purchase orders that customers place with zulily.  We consume these orders and throw them into an AWS Kinesis stream.  This gives us the ability to process and archive the orders without causing an extra strain on Kafka.  It’s important to note that relevant orders also have the ad’s tracking code, just like the spend.  That’s the link that glues spend and activations together.

The Activation Evaluator application examines each purchase and determines if the purchase is an activation.  To do that, it looks up the previous purchase in a MongoDB collection for the customer Id on the purchase order.  If the most recent transaction is non-existent or older than X days, the purchase is an activation.  The Activation Evaluator updates the customer record with the date of the new purchase.  To make sure that we don’t drop any data if the Activation Evaluator runs into issues, we don’t move the checkpoint in the Kinesis stream until the write to Mongo is confirmed.
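
A simplified sketch of that check using pymongo is below; the collection and field names and the 90-day window are assumptions, and the Kinesis checkpointing itself is omitted:

from datetime import timedelta
from pymongo import MongoClient

ACTIVATION_WINDOW_DAYS = 90                       # stand-in for "X days"
customers = MongoClient()["ads"]["customer_last_purchase"]   # hypothetical names

def is_activation(customer_id, purchase_time):
    """Return True if this purchase counts as an activation, and record it."""
    record = customers.find_one({"_id": customer_id})
    cutoff = purchase_time - timedelta(days=ACTIVATION_WINDOW_DAYS)
    activated = record is None or record["last_purchase"] < cutoff
    # Update the customer's most recent purchase; only after this write is
    # confirmed would the Kinesis checkpoint be advanced.
    customers.update_one({"_id": customer_id},
                         {"$set": {"last_purchase": purchase_time}},
                         upsert=True)
    return activated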

The Activation Evaluator sends evaluated purchases into another Kinesis stream.  Chaining Kinesis streams is a pretty common pattern for AWS applications, as it allows for separation of concerns and makes the whole system more resilient to failures of individual components.

The Activation Calculator reads the evaluated purchases from the second Kinesis stream and captures them in Mongo.  We index the data by tracking code and timestamp, and voila, a simple count() will return the number of activations for a specified period.

The last step in the process is to take the Spend and divide it by the activations.  Done.
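
Putting the pieces together, a rough sketch of that final calculation (the collection and field names are assumptions):

def cpta(db, tracking_code, start, end):
    """Cost per Total Activation for one ad over a time window (illustrative)."""
    spend = sum(doc["amount"] for doc in
                db.spend.find({"tracking_code": tracking_code,
                               "ts": {"$gte": start, "$lt": end}}))
    activations = db.activations.count_documents(
        {"tracking_code": tracking_code, "ts": {"$gte": start, "$lt": end}})
    return spend / activations if activations else None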

With this architecture, zulily measures a key advertising performance metric every 15 minutes and uses it to pause poorly-performing ads.  The metric also serves as an input for various Machine Learning models, but more on those in a future blog post… Stay tuned!!


From Cart Pick to Put Walls

A critical part of any e-commerce company is getting product to its customers. While many discussions of customer experience focus on a company's website and apps or its customer service and support, we often forget about the company actually delivering its customers' products when it said it would. This part of the promise made (or implied) to customers is critical for building trust and providing a great end-to-end customer experience. Most large e-commerce companies operate — or pay someone else to operate — one or more "fulfillment centers", which is where products are stored and combined with other items that need to be sent to the customer. zulily's unique business model means we work with both big brands and boutique, smaller vendors with a variety of different capabilities, so our products are inspected for quality, frequently bagged to keep clothing from getting dirty, and often need barcoding (as many smaller vendors may not provide barcodes). The quality of zulily's fulfillment processes drives our ability to deliver on our promises to customers, and zulily's software drives those fulfillment processes.

All fulfillment center systems start with a few basic needs: receive products from vendors, store products so they can be retrieved later, and ship product to customers. "Shipping product out," also known as "outbound," is the most expensive operation inside the fulfillment center, so we have invested heavily in making it efficient. The problem seems simple at first glance: you gather product for customer shipments, put products in boxes, put labels on the boxes, and hand the boxes to UPS or USPS, etc. The trick is making this process as efficient as possible. When zulily first started, each associate would walk the length of the warehouse picking each item and sorting it into 1 of 20 shoebox-sized bins on their cart, with each bin representing a customer shipment. Once all of the shipments had been picked, the picker would deliver the completed cart to a packing station. The job of collecting products to be shipped out is known as "picking," and when our warehouse was fairly small, this strategy of one person picking the whole order worked fine. As the company grew, our warehouses did too – some of our buildings have a million square feet of storage spread over multiple floors. Pickers were now walking quite a long way to complete just 20 shipments. We could have just increased the size or quantity of the carts, but that is a solution that costs more as the company grows. In addition, safety concerns related to pulling more or larger carts and the complexities of taking one cart to multiple floors of a building make this idea impractical, to say the least.


A pick cart. Each of the 20 slots on the cart represents a single customer shipment. The picker, guided by an app on a mobile device, walks the storage area until they’ve picked all of the items for the 20 shipments. We call this process “pick to shipment” because no further sorting is necessary to make sure each shipment is fully assembled.

We needed a solution that would allow pickers to spend less time walking between bins and more time picking items from those bins. We have developed a solution such that the picking software tries to keep a given picker within a zone of 10-20 storage aisles and invested in a conveyor system to carry the picked items out of the picking locations. The picker focuses on picking everything that can be picked within their zone and there’s no need for a picker to leave a zone unless they are needed in another zone. The biggest difference from the old model is that the picker is no longer assembling complete shipments. If you ordered a pair of shoes and a t-shirt from zulily, it’s unlikely that those two items would be found in the same zone due to storage considerations. Instead of an individual picker picking for 20 orders, we now have one picker picking for many orders at the same time, but staying within a certain physical area of the building. This is considerably more efficient for the pickers, but it means that we now needed a solution to assemble these zone picks into customer shipments.


The picker picks for multiple shipments into a single container. Because the sorting into customer shipments happens later, this solution is called “pick to sort”.

In order to take the efficiently picked items and sort them into the right order to be sent to our customers, we have implemented a sorting solution built around a physical structure we call a "put wall". A put wall looks like a large shelf with no back, divided into sections (called "slots"), each measuring about one cubic foot. Working at these put walls is an employee (called a "putter") whose job is to take products from the pick totes and sort them into a slot in that put wall. Each slot in the wall is assigned to a shipment. Once all the products needed for a given shipment have been put into the slot, an indicator light on the other side of the wall lets a packer know that the shipment is ready to be placed into a box and shipped out to our customer. In larger warehouses, having just one put wall is not practical because putters would end up having to cover too much distance and all the efficiency gained in packing would be lost on the putting side, so defining an appropriate size for each put wall is critical. This creates an interesting technical challenge, as we have to make sure that the right products all end up in the put wall at the right time. Our picking system has to make sure that once we start picking a shipment to a wall, all the other products for that shipment also go to that wall as quickly as possible. This challenge is made more difficult by the physical capacity of the put walls. We need to limit how much is going to the wall to avoid a situation where there is no open slot for a new shipment. We also have to make sure that each of the walls has enough work so we don't have idle associates. When selecting shipments to be picked, we must include shipments that are due out today, but also include future work to make the operation efficient.  To do this, we have pickers rotate picking against different put walls to make sure that they get an even spread of work. A simple round-robin rotation would be naive, since the throughput of the put walls is determined by humans with a wide range of work rates. In order to solve this problem, we turn to control theory to help us select a put wall for a picker based on many of the above requirements. We also need to make sure that when the first product shows up for a shipment there is room in the wall for it.


As totes full of picked items are conveyed to the put wall, a putter scans each item and puts it into a slot representing a customer shipment. The putter is guided to the correct slot by a mobile device and by flashing lights on the put wall.

As we scaled up our operation, we initially saw that adding more pickers and put walls was not providing as much gain in throughput as we expected. In analyzing the data from the system, we determined that one of the problems was how we were selecting our put walls. Our initial implementation would select a wall for a shipment based on that wall having enough capacity and need. The problem with this approach is that we didn't consider the makeup of each of the shipments. If you imagine a shipment that is composed of multiple products spread throughout the warehouse, you have situations where a picker has to walk through their zone N times, where N is the number of put walls we are using at any given time. As we turn on more and more put walls, that picker has to walk through the zone that many more times. We realized that if we could create some affinity between zones and walls, we could limit the number of put walls a picker needs to pick for and make them more efficient. We did this by assigning each put wall a set of zones and trying to make the vast majority of shipments for that put wall come from those zones. While we sometimes need larger sets than normal to cover a given shipment, we can overall significantly improve pick performance and increase the overall throughput for putters and packers.
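
The production logic is more involved (and, as noted above, control-theory based), but the following deliberately simplified sketch illustrates the idea of blending remaining capacity, need for work, and zone affinity when choosing a wall; the scoring weights are arbitrary:

def pick_put_wall(shipment_zones, walls):
    """Choose a put wall for a shipment (illustrative scoring only).
    walls: list of dicts like
      {"id": "W1", "free_slots": 12, "queued_work": 40, "zones": {"A", "B"}}
    shipment_zones: set of pick zones the shipment's items live in."""
    best_wall, best_score = None, float("-inf")
    for wall in walls:
        if wall["free_slots"] == 0:
            continue                                  # no open slot for a new shipment
        affinity = len(shipment_zones & wall["zones"]) / len(shipment_zones)
        need = 1.0 / (1 + wall["queued_work"])        # keep putters everywhere busy
        score = 3.0 * affinity + 1.0 * need + 0.1 * wall["free_slots"]
        if score > best_score:
            best_wall, best_score = wall, score
    return best_wall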

And that’s really just the beginning of the story for a small part of our fulfillment center software suite. As the business grows, we continue to find new ways to further optimize these processes to make better use of our employees’ time and save literally millions of dollars while also increasing our total capacity using the same buildings and people! This is true of most of the software in the fulfillment space – improved algorithms are not just a fun and challenging part of the job, but also critical to the long-term success of our business.

zuFlow – Query Workflow and Scheduling for Google BigQuery

Authors: Matthew Kang, Shailu Mishra, Sudhir Hasbe

In 2014, we made a decision to build our core data platform on Google Cloud Platform, and one of the products critical to that decision was Google BigQuery. We knew the scale at which it enabled us to perform analysis would be critical for our business in the long run. Today we have more than 200 unique users performing analysis on a monthly basis.

Once we started using Google BigQuery at scale, we soon realized our analysts needed better tooling around it. The key requests we started getting were:

  1. Ability to schedule jobs: Analysts needed the ability to run queries at regular intervals to generate data and metrics.
  2. Define workflows of queries: Basically, analysts wanted to run multiple queries in a sequence and share data across them through temp tables.
  3. Simplified data sharing: Finally, it became clear teams needed to share the generated data with other systems. For example, download it to leverage in R programs or send it to another system through Kafka.

zuFlow Overview

zuFlow is zulily's query workflow and scheduling solution for Google BigQuery. There are a few key concepts:

  • Job: An executable entity that encompasses multiple queries with a schedule.
  • Query: SQL statement that can be executed on Google BigQuery
  • Keyword: Variable defined to be used in the queries
  • Looper: Ability to define loops like foreach statements.

High Level Design


zuFlow is a web application that enables users to set up jobs and run them either on demand or based on a schedule.

  • We use Django with NGINX for handling our web traffic.
  • We leverage Google Cloud SQL for storing the config DB and keeping track of runtime state.
  • We have integrated the system with an off-the-shelf open-source scheduler called SOS. We plan to migrate this to Airflow in the future.
  • Flowrunner is the brain of the system, written in Python. It reads configuration from the config DB, executes the queries, and stores the runtime details back in the DB. A few key capabilities it provides are:
    • Concurrency: We have to manage concurrency to make sure we are not overloading the system.
    • Retry: In a few scenarios, based on error codes, we retry the queries.
    • Cleanup: It is responsible for cleaning up after jobs are run, including historical data cleanup.

zuFlow Experience

Job Viewer: Once logged in, you can see your jobs or view all jobs in the system.


Creating a Job: You provide a name, a schedule to run on, and the email address of the owner.


Keywords/variables: You can create keywords which you can reuse in your queries. This enables analysts to define a parameter and use it in their queries instead of hardcoding values. We also have predefined system keywords for date/time handling and for making it easier for users to shard tables (a sketch of how such substitution might work follows the examples below). Examples:

  • DateTime: CURRENT_TIMESTAMP_PT, CURRENT_DATE_PT, CURRENT_MONTH_PT, CURRENT_TIME_PT, LAST_RUN_TIMESTAMP_PT, LAST_RUN_TIMESTAMP, LAST_RUN_DATE_PT
    For example, LAST_RUN_DATE_PT is the BQ-format Pacific date of the last run of this job (it will be CURRENT_DATE_PT on the first run).
  • Sharding: *_BQ
    Provides a formatted version of the date strings for table shard references (without dashes, i.e. YYYYmmdd).
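
As a sketch of how such keyword substitution could work (the placeholder syntax and date handling here are simplifications, not the actual zuFlow implementation):

from datetime import date

def expand_keywords(sql, user_keywords, last_run_date=None):
    """Substitute user-defined and system keywords into a query (illustrative)."""
    today = date.today()                              # Pacific-time handling omitted
    system = {
        "CURRENT_DATE_PT": today.isoformat(),
        "LAST_RUN_DATE_PT": (last_run_date or today).isoformat(),
        "CURRENT_DATE_BQ": today.strftime("%Y%m%d"),  # shard-friendly, no dashes
    }
    keywords = dict(system, **user_keywords)
    for name, value in keywords.items():
        sql = sql.replace("{" + name + "}", str(value))
    return sql

print(expand_keywords("SELECT * FROM sales_{CURRENT_DATE_BQ} WHERE region = '{REGION}'",
                      {"REGION": "US"}))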


Looping: Very soon after our first release we got requests to add loops. This enables users to define a variable and loop through its values.


Query Definition: Now you are ready to write a Google BigQuery query and define where the output will be stored. There are four options:

  1. BQ Table: You provide a BQ table and decide whether to replace it or append to it. You can also define the output table as a temp table, and the system will clean it up after the job completes.
  2. CSV: If you pick CSV, you need to provide a GCS location for the output.
  3. Cloud SQL (MySQL): You can also export to Cloud SQL.
  4. Kafka: You can provide a Kafka topic name to publish the results as messages.

You can define multiple queries and share data across them through temp tables in BQ.


Job Overview: This shows the full definition of the job.


We have thought about open-sourcing the solution. Please let us know if you are interested in this system.

Zome: Real-time Merchant Home Page with Spark Streaming & Kafka

Authors: Bapi Akula, Shailu Mishra, Sudhir Hasbe

zulily is a daily business: we launch our events every day at 6am PST, and most of our sales come in the early hours after the events launch. It is critical for our merchants to know what is happening and react to drive more sales. We have significantly improved merchants' ability to drive sales by providing them a new real-time home page, so every day when they come in they can take action based on the situation.

Historical View:

Historically we had a dashboard for our merchants which was not very useful. It showed them upcoming events and some other info, but when you come every day you want to know what is happening today not tomorrow.


New View

We replaced this non-actionable home page with a new real-time version which shows merchants real-time sales for their events, conversion rates, real-time inventory, top-selling styles, and styles projected to sell out. This enables merchants to talk to vendors to get more inventory or add different products.


Technical Design

To build a real-time home page for merchants we had to combine real-time clickstream data (unstructured) with real-time sales (structured) and historical event and product data. Bringing these very different types of data-sets into a single pipeline and in real-time merging/processing them was a challenge.

Real-time Clickstream & Orders

We have built a high-scale real-time collection service called ZIP. It peaks every day at around 18k to 20k transactions per second. Our clickstream and order data are collected through this service. One of the capabilities of ZIP is to publish this data in real time to a Kafka cluster. This enables other systems to access the data that is being collected in near real time.

We will describe other capabilities of this service in a future post.

Historical data:

Our data platform runs on Google Cloud Platform and includes Google Dataproc as our ETL processing platform, which after processing stores data in Google BigQuery for analytics and in Google Cloud Storage. All our historical data, which includes our products, events, prices, and orders, is stored in Google BigQuery and Google Cloud Storage.


Spark Streaming Processing

We used Spark Streaming to join the clickstream and order data collected in Kafka with historical data in GCS, using the GCS connector provided by Google. This allowed us to create derived datasets like real-time sales, conversion rates, and top sellers, which were stored in AWS Aurora. Aurora is an excellent database for high-scale queries; in a future post we will write up why we chose Aurora over other options.

Data Access through Zata

We then used our ZATA API to access this data from our merch tools to build a rich UI for our merchants.

Spark Streaming Details

Reading the data from Kafka (KafkaUtils.createDirectStream)

KafkaUtils is the object with the factory methods to create input DStreams and RDDs from records in Apache Kafka topics. createDirectStream skips receivers and ZooKeeper and uses the simple Kafka API to consume messages, which means it needs to track offsets internally.

At the beginning of each batch, the connector reads the partition offsets for each topic from Kafka and uses them to ingest data. To ensure exactly-once semantics, it tracks offset information in Spark Streaming checkpoints.

/code:

# Assumes: import json; from pyspark.streaming.kafka import KafkaUtils
def getDstreamFromKafka(ssc, topic, kafka_servers):
    # Direct stream: no receivers or ZooKeeper; offsets are tracked via checkpoints
    kafkaStream = KafkaUtils.createDirectStream(ssc, [topic], {"bootstrap.servers": kafka_servers})
    # Each record is a (key, value) pair; normalize the value and split it into fields
    parsed = kafkaStream.map(lambda v: json.dumps(v[1]).replace('",', '";').split(','))
    return parsed

 

Reading data from GCS (textFileStream)

This method monitors a Hadoop-compatible filesystem directory for new files and, when it detects a new file, reads it into Spark Streaming. In our case we use GCS, and the streaming job internally uses the GCS connector. We pass the GCS connector as a jar file when invoking the job.

/code:

ssc.textFileStream("gs://<gcs-bucket-path>")  # monitors the GCS directory for new files

Merge Values: combineByKey(createCombiner,mergeValue,mergeCombiners):

In Spark, groupByKey() doesn't do any local aggregation within each partition; this is where combineByKey() comes in handy. With combineByKey, values are first merged into one combined value per key within each partition, and then the per-partition values are merged into a single value. combineByKey is therefore an optimization over groupByKey, since we end up sending fewer key-value pairs across the network.

We used combineByKey to calculate aggregations like total sales, average price, and demand. Three lambda functions were passed as arguments to this method:

combineByKey(createCombiner,mergeValue,mergeCombiners)

createCombiner: The first required argument is a function used as the very first aggregation step for a key. It is invoked the first time a key is encountered within each partition.

mergeValue : This function tells what to do when a combiner is given a new value

mergeCombiners : This Function is called to combine values of a key across multiple partitions

/code:

creCmb = (lambda v:(v[0],float(v[1]),0.0 ,0.0,0.0,v[4],v[5],v[6],v[7],v[8],1) if v[3]==-1 else (v[0],float(v[1]),float(v[1])/float(v[2]) ,float(v[3]),((float(v[1])/float(v[2]))-float(v[3])),v[4],v[5],v[6],v[7],v[8],1))

mrgVal = (lambda x, v:(max(x[0],v[0]),float(x[1])+float(v[1]),(float(x[2]))+0.0,float(x[3])+0.0,float(x[4])+0.0,min(x[5],v[4]), min(x[6],v[5]),min(x[7],v[6]),min(x[8],v[7]), max(x[9],v[8]),int(x[10])+1) if v[3]==-1 else (max(x[0],v[0]),float(x[1])+ float(v[1]),(float(x[2]))+(float(v[1])/float(v[2])),float(x[3])+float(v[3]), float(x[4])+((float(v[1])/float(v[2]))-float(v[3])),min(x[5],v[4]), min(x[6],v[5]),min(x[7],v[6]),min(x[8],v[7]),max(x[9],v[8]), int(x[10])+1))

mrgCmb = (lambda x,y :(max(x[0],y[0]),x[1]+y[1],x[2]+y[2],x[3]+y[3],x[4]+y[4], min(x[5],y[5]),min(x[6],y[6]),min(x[7],y[7]),min(x[8],y[8]),max(x[9],y[9]), int(x[10])+int(y[10])))

combineByKey(creCmb, mrgVal, mrgCmb)

Stateful transformations (updateStateByKey())

We required a framework that supported building knowledge from both historical and real-time data, and Spark Streaming provided just that. Using stateful functions like updateStateByKey, which maintain a running sum of all sales, we were able to meet this requirement.

We used updateStateByKey(func) for the stateful transformation. For example, suppose you want to keep track of the number of times a customer visited a web page: if customer "123" visits twice in the first hour and once more in the next hour, the aggregated count at the end of the second hour should be 3 (current batch count plus history). This history state is kept in memory and handled by updateStateByKey, and Spark Streaming's checkpoint mechanism takes care of preserving it. As an additional recovery point, we stored the state in a database and recovered from the database in case checkpoint files were cleared during new code deployments or configuration changes.

/code:

soi_batch_agg.updateStateByKey(updateSales)

# Assumes datetime, pytz, and sys are imported, and that validate() and
# zome_aurora (the Aurora fallback store) are defined elsewhere.
def updateSales(newstate, oldstate):
    # newstate: values from the current batch; oldstate: saved state (or None)
    try:
        # If the product insert timestamp is older than two days, drop it from memory
        if (oldstate is not None) and validate(oldstate[0][-4], 'updateSalesFn1') and \
                (oldstate[0][-4] < ((datetime.datetime.now(tz=pytz.utc) - datetime.timedelta(days=2))
                                    .astimezone(pytz.timezone('US/Pacific')).strftime('%Y-%m-%d %H:%M:%S'))):
            oldstate = None

        # If the event end date is older than the current timestamp, drop it from memory
        if (oldstate is not None) and validate(oldstate[0][-3], 'updateSalesFn2') and \
                (oldstate[0][-3] < (datetime.datetime.now(tz=pytz.utc)
                                    .astimezone(pytz.timezone('US/Pacific')).strftime('%Y-%m-%d %H:%M:%S'))):
            oldstate = None

        if newstate:  # skip empty RDDs
            if oldstate is None:
                # Fall back to the copy of the state stored in Aurora
                oldstate = zome_aurora.aurora_get(str(newstate[0][-6]), str(newstate[0][-5]))
            else:
                print('Getting records from memory')

            if oldstate:
                # Merge the accumulated state with the current batch, field by field
                return [(max(oldstate[0][0], newstate[0][0]),
                         float(oldstate[0][1]) + newstate[0][1],
                         float(oldstate[0][2]) + newstate[0][2],
                         float(oldstate[0][3]) + newstate[0][3],
                         float(oldstate[0][4]) + newstate[0][4],
                         max(oldstate[0][5], newstate[0][5]),
                         min(oldstate[0][6], newstate[0][6]),
                         max(oldstate[0][7], newstate[0][7]),
                         min(oldstate[0][8], newstate[0][8]),
                         max(oldstate[0][9], newstate[0][9]),
                         int(oldstate[0][10]) + newstate[0][10])]
            else:
                return newstate
    except Exception:
        sys.exit(1)

Writing to Aurora:

We are not using the JDBC methods provided by Spark, as we had some performance issues with connection creation, record insertion, and commits.

We went with an approach of creating a connection for each partition, doing a bulk insert of all records within each partition, and inserting all partitions in parallel.

/code:

def sendPartition(iter):
    # Runs once per RDD partition, typically via rdd.foreachPartition(sendPartition)
    try:
        connection = mc.connect(...)       # connect to Aurora (connection details elided)
        cursor = connection.cursor()

        data = []
        for record in iter:                # collect all records in this partition
            data.append(record)

        query = "INSERT INTO ...."         # insert statement elided
        # cursor.execute(transaction_isolation_lock)

        # Retry the bulk insert until it succeeds
        successful = False
        while not successful:
            try:
                cursor.executemany(query, data)
                connection.commit()
                successful = True
            except Exception as e:
                pass                       # handle the exception, then retry
    finally:
        cursor.close()
        connection.close()

ZATA: How we used Kubernetes and Google Cloud to expose our Big Data platform as a set of RESTful web services

Authors: Shailu Mishra, Sudhir Hasbe

In our initial blog post about the zulily big data platform, we briefly talked about ZATA (zulily data access service). Today we want to take a deep dive into ZATA and explain our thought process and how we built it.

Goals

As a data platform team we had three goals:

  1. Rich data generated by our team shouldn’t be limited to analysts. It should be available for systems & applications via simple and consistent API.
  2. Have the flexibility to change our backend data storage solutions over time without impacting our clients
  3. Zero development time for incremental data driven APIs

ZATA was our solution for achieving our above goals. We abstracted our data platform using a REST-based service layer that our clients could use to fetch datasets. We were able to swap out storage layers without any change for our client systems.

Selecting Data Storage solution

There are three different attributes you have to figure out before you pick a storage technology:

  1. Size of data: Is it big data or relatively small data? In short, do you need something that will fit in MySQL, or do you need to look at solutions like Google BigQuery or AWS Aurora?
  2. Query latency: How fast do you need to respond to queries? Is it milliseconds, or are a few seconds OK, especially for large datasets?
  3. Data type: Is it relational data, key-value pairs, complex JSON documents, or a search pattern?

As an enterprise, we need all combinations of these. The following are choices our team has made over time for different attributes:

  1. Google BigQuery: Great for large datasets (terabytes), but latency is in seconds; supports columnar storage.
  2. AWS Aurora: Great for large datasets (100s of gigabytes) with very low latency for queries.
  3. Postgres-XL: Great for large datasets (100s of gigabytes to terabytes) with excellent performance for aggregation queries, but it is very difficult to manage and still early in its maturity cycle. We eventually moved our datasets to AWS Aurora.
  4. Google Cloud SQL, MySQL or SQL Server: For small datasets (GBs) with very low latency (milliseconds).
  5. MongoDB or Google Bigtable: Good for large-scale datasets with low-latency document lookups.
  6. Elasticsearch: We use Elasticsearch for search scenarios, both fuzzy and exact match.

Zata Architecture


Key runtime components for ZATA are

Mapping Layer

This looks at the incoming URLs and maps them to backend systems. For example: Request: http://xxxxx.zulily.com/dataset/product-offering?eventStartDate=[2013-11-15,2013-12-01]&outputFields=eventId,vendorId,productId,grossUnits maps to

  1. Google BigQuery (based on the config DB mapping for product-offering)
  2. The dataset used is product-offering, which is just a view in the Google BigQuery system
  3. The filter eventStartDate=[2013-11-15,2013-12-01] is transformed to WHERE eventStartDate BETWEEN '2013-11-15' AND '2013-12-01'
  4. The requested output fields are eventId, vendorId, productId, grossUnits
  5. The query for Google BigQuery is:

SELECT eventId, vendorId, productId, grossUnits FROM product-offering WHERE eventStartDate BETWEEN '2013-11-15' AND '2013-12-01'

The mapping layer decides which mappings to use and how to transform the HTTP request into something the backend will understand. This will be very different for MongoDB or Google Bigtable.
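
As a rough illustration of the SQL path only (the MongoDB and Bigtable mappings look quite different), a simplified translation from query-string parameters to a query might be:

def build_sql(dataset, params):
    """Translate ZATA-style request parameters into a SQL statement (illustrative)."""
    fields = params.pop("outputFields", "*")
    clauses = []
    for field, value in params.items():
        if value.startswith("[") and value.endswith("]"):        # range filter
            low, high = value[1:-1].split(",")
            clauses.append("%s BETWEEN '%s' AND '%s'" % (field, low, high))
        else:
            clauses.append("%s = '%s'" % (field, value))
    where = " WHERE " + " AND ".join(clauses) if clauses else ""
    return "SELECT %s FROM %s%s" % (fields, dataset, where)

print(build_sql("product-offering",
                {"outputFields": "eventId,vendorId,productId,grossUnits",
                 "eventStartDate": "[2013-11-15,2013-12-01]"}))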

Execution Layer

The execution layer is responsible for generating queries using the protocol that the storage engine will understand. It also executes the queries against the backend and fetches result sets in an efficient manner. Our current implementation supports various protocols, such as MongoDB, standard JDBC, and HTTP requests for Google BigQuery, Bigtable, and Elasticsearch.

Transform Layer

This layer is responsible for transforming data coming from any of the backend sources and normalizing it. This allows our clients to be agnostic of the storage mechanism in our backend systems. We went with JSON as the format given how prevalent it is among services and application developers.

For the previous example from the mapping layer, the response will be the following:

[
  {"eventId": "12345", "vendorId": "123", "productId": "3456", "grossUnits": "10"},
  {"eventId": "23456", "vendorId": "123", "productId": "2343", "grossUnits": "234"},
  {"eventId": "33445", "vendorId": "456", "productId": "8990", "grossUnits": "23"},
  {"eventId": "45566", "vendorId": "456", "productId": "2343", "grossUnits": "88"}
]

API auto discovery

Our third goal was to have zero development time for incremental data-driven APIs. We achieved this by creating an auto-discovery service. The job of this service is to regularly poll the backend storage services for changes and automatically add service definitions to the config DB. For example, in Google BigQuery or MySQL, once you add a view in a schema called "zata", we automatically add the API to the ZATA service. This way data engineers can keep adding services for the datasets they create without anyone writing new code.
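
A minimal sketch of the polling idea for the BigQuery case is below; the config-db calls are placeholders for whatever registration mechanism the config database exposes:

from google.cloud import bigquery

def discover_zata_views(project, config_db):
    """Poll the 'zata' dataset for views and register each one as an API (sketch)."""
    client = bigquery.Client(project=project)
    for table in client.list_tables("zata"):
        if table.table_type == "VIEW" and not config_db.has_dataset(table.table_id):
            # config_db stands in for ZATA's configuration database layer
            config_db.register_dataset(name=table.table_id,
                                       backend="bigquery",
                                       source="%s.zata.%s" % (project, table.table_id))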

API Schema Definition

The schema service enables users to discover all the APIs supported by ZATA and to view each dataset's schema to understand what requests they can send. Clients can get the list of available datasets:

Dataset Request: http://xxxxx.zulily.com/dataset

[
  { "datasetName": "product-offering-daily", ... },
  { "datasetName": "sales-hourly", ... },
  { "datasetName": "product-offering", ... }
]

Schema Request: Then they can drill down to the schema of a selected dataset; http://xxxxx.zulily.com/dataset/product-offering/schema/

[
  { "fieldName": "eventId", "fieldType": "INTEGER" },
  { "fieldName": "eventStartDate", "fieldType": "DATETIME" },
  { "fieldName": "eventEndDate", "fieldType": "DATETIME" },
  { "fieldName": "vendorId", "fieldType": "INTEGER" },
  { "fieldName": "productStyle", "fieldType": "VARCHAR" },
  { "fieldName": "grossUnits", "fieldType": "INTEGER" },
  { "fieldName": "netUnits", "fieldType": "INTEGER" },
  { "fieldName": "grossSales", "fieldType": "NUMERIC" },
  { "fieldName": "netSales", "fieldType": "NUMERIC" }
]

The client is not aware of the location and has no knowledge of the storage system, which makes the whole data story more agile. If a dataset is moved from one location to another, or its schema is altered, all downstream systems will be fine, since the access points and the contracts are managed by ZATA.

Storage Service Isolation

As we rolled out ZATA over time, we realized the need for storage service isolation. Having a single service support multiple backend storage solutions with different latency requirements didn’t work very well. The slowest backend tends to slow things down for everyone else.

This forced us to rethink our ZATA deployment strategy. Around the same time, we were experimenting with Docker and using Kubernetes as an orchestration mechanism.

We ended up creating separate Docker containers and a Kubernetes service for each of the backend storage solutions. So we now have a zata-bigquery service which handles all BigQuery-specific calls. Similarly, we have zata-mongo, zata-jdbc, and zata-es services. Each of these Kubernetes services can be scaled individually based on anticipated load.

In addition to the individual Kubernetes services, we also created a zata-router service, which is essentially nginx hosted in Docker. The zata-router service accepts incoming HTTP requests for ZATA and, based on the nginx config, routes traffic to the various Kubernetes services available in the cluster. The nginx config in the zata-router service is dynamically refreshed by the polling service so that new APIs become discoverable.


ZATA has enabled us to make our data more accessible across the organization while enabling us to move fast and change storage layer as we scaled up.

Scaling zulily’s Infrastructure in a Pinch, with Salt

At zulily, we strive to delight our customers with the best possible experience, every day.  Our daily customer experience involves offering thousands of new products each morning, all of which comes together thanks to our technology, and impeccable coordination across the organization. As our product offerings dramatically change on a daily basis, quickly scaling our infrastructure to meet variable demand is of critical importance.  In this article, we will provide an overview of zulily’s SaltStack implementation and its role in our infrastructure management, exploring patterns and practices which enhance our automation capabilities.

 

Let’s start with a bit of context, and not the jinja kind

Our technology team embraces a DevOps approach to solving technical challenges, and many of our engineers are “full stack”. We have several product teams developing and supporting both external and internal services, with a variety of application stacks.  All product teams have developers of course, and a few have dedicated DevOps engineers.  We also have a small, dedicated infrastructure team.

 

zulily has seen phenomenal growth since its inception, and what was initially a tech team of one quickly became a tech team of a few, rapidly evolving into a tech team of many product teams and engineers, which is where we find ourselves today.  With these changes and growth over time, it became apparent that our infrastructure team was perhaps not the ideal team for managing all components and configurations across the entire technology organization.

 

To elaborate further on this point, our product teams have overlapping stacks but with variations, and many teams have vastly different components comprising their stacks.  Product teams know their application stacks best, so instead of attempting to have a small team of infrastructure engineers managing all configs and components, we needed to empower product teams to be able to take ownership, by providing them with self-service options.

 

Enter SaltStack to address our organizational growth. We have found Salt to be very approachable, with its simple-to-grasp state and pillar tree layouts, use of YAML, and customization possibilities with Python.  Salt is a key component in our technology stack, enabling our product teams to take control of their system configurations and keeping us moving forward quickly toward our goals.

 

saltenv == tenant (mostly), and baseless?

Like many initiatives and projects at zulily, we’ve taken a unique approach to our use of salt environments. It has worked out exceptionally well for our tech organization and we are excited to share our approach to multi-tenancy with salt.

 

Each product team has its own salt and pillar trees; salt environments essentially map to tenants. For example, we have environments with names such as "site"; we do not use salt environment names such as "dev" and "prod".

 

But what about “real” environments?  We are able to manage those too, thanks to our strict and metadata-rich host-naming convention, paired with salt’s state and pillar tree layouts and top.sls and targeting capabilities.  Our hostnames have the following format:

 

<product_team>-<function>-<node_number>.<location_code>.<environment>.zulily.com

 

Also related to our host names, each minion has custom grains set for all of these fields, and these grains are quite useful in many of our states!

 

We have found that the majority of states are the same across (real) environments, and environment specifics can instead be managed through pillar targeting.  By keeping all of a team’s states and pillar data within just two git repositories, we have found we are overall more DRY than we would have been with separate git repositories (per real environment).

 

Additionally, salt states may be extended and overridden, which may be useful for different (real) environments when necessary.  So instead of having a flat state tree, we have sub-directories such as ‘core’, ‘dev’ and ‘prod’.  Our approach is to place just about everything under core, but use environment sub directories when we must have environment-specific states, or when we simply wish to extend or override states residing in core. If parent states in core must be modified, it is important to consider the ramifications for any environment-specific children.  We generally don’t do a lot of extending and overriding at zulily, and instead focus on placing environment specifics within targeted pillar data, as previously mentioned.

 

We have the same layout in our pillar trees for consistency. Note that pillar keys must be unique and have no hierarchy when retrieved; however, hierarchy is important for pillar top.sls targeting!

 

Reviewing the following state tree example illustrates our layout approach for a “provision environment”:

 

├── core
│   └── aliases
│       ├── files
│       │   └── aliases
│       ├── init.sls
│       └── map.jinja
├── dev
├── prod
└── top.sls

 

But wait, if a highstate is run, what happens, and couldn't this be dangerous?  Running a highstate does have the potential to be dangerous: if a product team accidentally targets *their* very specific MySQL states to '*', for example, the result could be a serious outage on a separate team's database server.  To mitigate the risk of an incident like this, pushes to all of our state and pillar repositories are subject to inspection by a git push constraint that deserializes the top.sls yaml and evaluates all targets.  The targeting allowed in our top.sls files is very restrictive, with only a subset of target types allowed, and non-relevant environment references are disallowed.  Also worth noting is that only very specific, authorized team members have write access to our salt and pillar product team repositories; a member of the site team may not write to the infrastructure team's salt and pillar repositories.
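
As a rough sketch of the kind of check such a push constraint might perform (the specific rules and function names here are assumptions, not our actual constraint):

import sys
import yaml

def validate_top_sls(path, team_env):
    """Reject a top.sls that references other environments or uses overly broad targets.
    The real constraint also restricts which target types are allowed."""
    with open(path) as f:
        top = yaml.safe_load(f) or {}
    for env, targets in top.items():
        if env != team_env:
            return "environment '%s' is not allowed in the %s repository" % (env, team_env)
        for target in targets:
            if str(target).strip() == "*":
                return "wildcard target '*' is not allowed"
    return None

if __name__ == "__main__":
    error = validate_top_sls("top.sls", team_env=sys.argv[1])
    if error:
        sys.exit("push rejected: " + error)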

 

Also worth mentioning, one additional layer of risk mitigation we have in place is that all of our users always append "saltenv=<product_team>" to their salt-calls.

We do have additional environments which are not tied to any specific product team, known as base, provision and periodic. The base environment is empty! The latter two are critical to our operations; we'll explain them next.

Less salt (highstate runs)

In our experience at zulily, we've learned that the vast majority of our salt states only need to run once, or rather infrequently. So our standard practice for product teams is to run highstates only once per week, or on an as-needed basis, which we do very cautiously. It goes against the traditional wisdom of converging at least hourly, but in the end, we have had consistent environments and greater stability with this approach. It is nearly inevitable that even the most senior automation engineer will make a bad push to master at some point, and a timed hourly run could pick that up, with potentially disastrous consequences. Configuration management is a powerful thing, and we have found our approach to highstating to be the appropriate balance for zulily.

Now, getting to zulily’s two important non-product team “environments”…


The first is known as "provision". States in the provision environment provide the most basic packages and configurations with reasonable defaults, which work for most product teams, most of the time. What is very particular about the provision environment is that a "provision highstate" is only run once! That's correct: we almost never re-run any of these states once an instance goes into production. There really isn't a need, and more importantly, re-running them may conflict with subsequent customizations by product teams; we would rather avoid unnecessary configuration breakage.

To limit ourselves to a single provision highstate, our provision top.sls targeting requires that a grain known as "in_provisioning" be set to True. When an instance has been provisioned, we remove the grain; a provision highstate will never run again, as long as the grain remains absent. Very seldom, we have had to roll out updates to a few individual states within provision, which we accomplish very cautiously with specific state.sls jobs.
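
In top.sls terms, that gate looks roughly like the following (the state names are invented for illustration):

# provision top.sls -- hypothetical sketch; provision states only apply
# while the in_provisioning grain is present and True
provision:
  'in_provisioning:True':
    - match: grain
    - core.aliases
    - core.packages     # invented name: baseline packages
    - core.ntp          # invented name: reasonable default configs

Once the grain is removed at the end of provisioning, nothing in this environment matches the minion any longer.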


We have recently open-sourced a sampling of many of our basic states used at provision time; please have a look at our github project known as alkali.

The second non-product team "environment" is known as periodic. While our standard is to run a full product team environment highstate once per week, some changes need to get out in near real time. For zulily, these types of changes are limited to states addressing resources such as posix users and groups, sudoers, iptables rules, and ssh key management. Periodic highstates are currently cron'd every few minutes, with saltenv=periodic of course. We are, however, moving to triggered periodic highstates, as cron'd periodic highstate runs may block other jobs.
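
A hypothetical periodic top.sls is correspondingly small; the sls names below are illustrative, mapping to the resource types just mentioned:

# periodic top.sls -- hypothetical sketch of the fast-moving states,
# run frequently on every minion with saltenv=periodic
periodic:
  '*':
    - users        # posix users and groups
    - sudoers      # sudoers rules
    - iptables     # iptables rules
    - ssh_keys     # ssh key management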


State development workflow

We have done a significant amount of state development at zulily, and for the most part, this has occurred within Vagrant environments. Vagrant has worked very well for us, but more recently we are beginning to leverage docker containers for this purpose. For more information on how we are doing this, please check out a project we just released, known as buoyant.

Given our salt development environment, whether Vagrant or docker, we typically iterate on states working out of our home directories (synced folders or docker volumes), preferably in a branch. Once state and pillar files are ready, we merge into master and configure very restrictive and precise targeting at first, or simply remove or disable existing targeting. This gives us full control over our rollout process across (real) environments, which limits the risk of a service disruption: we know exactly which hosts are executing which states and when.

Pushes to master branches for all salt and pillar git repositories are integrated within just a few minutes by our current automation, and are then ready for targeted execution across the relevant minions.

zulily's salt masters are controlled by a centralized infrastructure team, and product teams are restricted from running "salt" commands; they do not have access to our masters. They do, however, have all the control, and only the control, they need! Product teams use simple, custom scripts that leverage fabric to execute remote commands on their minions, most notably salt-call (with saltenv specified, of course!).

Other salt-related open source projects zulily has released

Outside of the aforementioned alkali and buoyant projects, we have recently released four community formulas:


All of these projects are in their early stages, a bit heavy on the jinja in some cases, and very Ubuntu-specific for the most part at this time. They have, however, shown good promise for us at zulily, and we didn't want to wait any longer to share them with the community. Our hope is that they will already be useful to some, and worthy of iterating on going forward.

Coloring outside of the lines

One of zulily's core values is to "color outside of the lines," and our use of SaltStack is no exception. Many of the patterns we use are uncommon, and our approach to environments in particular may not be the first idea that comes to mind for the typical salt user. Salt, with its inherent simplicity and flexibility, has enabled us to decentralize our configuration management, providing multi-tenancy and product team isolation. With self-service capabilities in place, our product teams are empowered to move at a quick cadence, keeping pace with what we call "zulily time" around the office. We've had great success with SaltStack at zulily, and we are pleased to share some of our projects and patterns with the community.

Happy salting!

Helping Moms make purchase decisions

zulily’s unique needs

"Something special every day" is our motto at zulily. Thousands of new products go live every morning. Most of these products are available for 72 hours, and a good number of them sell out before the sales event ends! Many of our engaged customers visit us every single day to discover what's new. We want to make sure our customers don't miss out on any items or events that may be special to them, while also giving them more confidence in their purchase decisions. A traditional eCommerce ratings-and-reviews model could help, but it is not the best fit for zulily's unique business model, which offers customers a daily assortment of new products for a limited time. We needed a different approach.

Our solution

Our specific business requirements demanded a more real-time and community-oriented approach. We came up with a set of signals that highlighted social proof and scarcity. Signals like "Only X left" and "Almost Gone" were designed to encourage users to act on a product they are interested in before it is gone. Signals like "Popular", "X people viewing" and "Y just sold" were intended to give users more confidence in their purchase decisions. We were able to bring these signals to life quickly, thanks to our real-time big data platform. These signals were shown on our product pages and in the shopping cart.

Product page

Shopping cart

Results

We tested the feature on our web and m-web experiences. The results turned out to be better than our most optimistic expectations! It was interesting to note that the feature was almost twice as effective on mobile devices as on desktop. In hindsight this made a lot of sense: our customers have many short sessions on their mobile devices during the day, and this information helped them make timely decisions. The social and scarcity signals turned out to be a perfect complement to zulily's unique business model.

The Thrill-Ride Ahead for zulily Engineers

zulily is an e-commerce company that is changing the retail landscape through our 'browse and discover' business model. Long term, there are tremendous international expansion opportunities as we change the way people shop across the planet. At zulily we're building a future where the simultaneous release of 9,000 product styles across 100+ events can occur seamlessly in multiple languages across multiple platforms. From an engineering perspective, as we expand globally, the size and scope of our technical challenge is nothing short of localization's Mount Everest.

Navigating steep localization challenges is not new to companies expanding globally, yet zulily faces a unique thrill-ride ahead. For me personally, this is incredibly exciting. Prior to coming to zulily, one role I held at my former company was global readiness: ensuring the products and services the company delivered to customers were culturally, politically and geographically appropriate. I was in the unique role of mitigating the company's risk of negative press, boycotts, protests, lawsuits or being banned by governments. The content my team reviewed was never considered life-threatening until the 2005 incident in which the cultural editor of Jyllands-Posten in Denmark commissioned twelve cartoonists to draw cartoons of the Islamic prophet Muhammad. Those cartoons were published, and their publication led to the loss of life and property. Suddenly my team took notice. Having content thoughtfully considered and ready for global markets took on a grave seriousness, moving from a 'nice-to-have' to a 'must-have' risk management function. While my role was to ensure the political correctness (neutrality) of content across 500+ product groups, the group I led rarely dealt with the size and scope of technical localization challenges that zulily is facing as we expand globally.

As companies expand globally, it's important for employees to shift their mindset from a U.S.-centric perspective to a global view. This paradigm leap in how the employees of an organization consider themselves is significant. When people increasingly think globally about their role and the impact of their decisions on a global audience, specifically in the area of technology and how we make our platforms, tools and systems 'world-ready', opportunities for growth and development naturally occur while cultural content risks are reduced. Today all of the content on our eight country-specific sites is in English, yet we are now thinking about how to tackle bigger challenges in the future, such as supporting multiple languages. The technical implementation for international expansion has enabled our developers and product managers to gain a new appreciation for the importance of global readiness and the challenge of 'going international'.

zulily offers over 9,000 product styles through over 100 merchandising events on a typical day. As we expand into new markets around the globe, we face an extraordinary challenge from both an engineering and an operations perspective. Imagine: every day at 6 a.m. PT we're publishing the content equivalent of a daily edition of The New York Times (about half a million new words per day, or over 100 million new words per year). Another way to conceptualize the technical problem: each day we offer roughly the same number of SKUs as you would find in a typical Costco store. Launching a new Costco store every day is difficult enough in one language, yet as we scale our offerings globally, the complexity of simultaneously producing this extremely high volume of content in multiple languages grows exponentially. Arguably, no e-commerce company in the world is publishing the volume of text that zulily produces on a daily basis. Further, the technical challenge is amplified because the massive volume of content must be optimized to work across multiple platforms, e.g., iPhone, iPad, Android and web-based devices. In fact, over 56% of our orders are placed on mobile devices. From a user-experience perspective, across platforms and languages, text expansion and contraction become a significant issue. European languages such as French, German and Italian may require up to 30% more space than English, while double-byte character languages such as Chinese and Japanese will require less space.

The content on our sites is produced in-house each day by our own talented copy writers and editors. Not unlike publishing a newspaper, the deadline for a 6 a.m. launch of fresh, new product styles is typically the night before. Last-minute edits can happen as late as midnight! While this makes for an exciting and dynamic environment, it requires some of the brightest engineering and operational minds on the planet to bring it all together with the quality and performance we expect.

From a technical perspective, our recent global expansion to Mexico, Hong Kong and Singapore faced typical localization hurdles. We needed to implement standard solutions: accommodating additional address line fields in the shipping address, ensuring proper currency symbols are displayed, and adding coding rules that allow us to process orders without postal codes, which are not required in Hong Kong.

Address Field Requirement Example:

Blog_Table

Bold = unique as compared to U.S. requirement

As we continue to move into new markets we’ll be driven to apply new solutions to traditional localization problems simply based upon the sheer volume of content and arduous daily production requirements. These two forces alone combine to drive creativity, invention and technological breakthroughs that will accelerate our growth and expansion. Our team at zulily is now exploring various strategies in engineering and operations to tackle the local/regional cultural differences across markets while also bringing zulily to our customers in their own language. We continue to hire the brightest minds to help us invent solutions for a new way of shopping. At zulily, we tell our customers ‘something fresh every day’. Our engineers enable that reality by creating something fresh every day. Our ambition to grow globally will give everyone at zulily that opportunity.

Follow us on Twitter: @zulilytech | @jmstutz

zulily’s Kubernetes launch presentation at OSCON

In July, Steve Reed from zulily presented at O'Reilly's Open Source Convention (OSCON). He spoke about zulily's pre-launch experience with Kubernetes. It was an honor for zulily to be asked to speak as part of the Kubernetes customer showcase, given the success we have had with Kubernetes.

Kubernetes launch announcement: http://googlecloudplatform.blogspot.com/2015/07/Kubernetes-V1-Released.html