Technical History of zulily

When I started at zulily in late 2010, the technology team was small but growing. As you can imagine by looking back at our growth over the past few years, zulily was not just your typical startup. In 2010 the business was starting to ramp up and we were figuring things out as we went.

Early technology and trade-offs

Before I joined, zulily had chosen a full-stack implementation of a commonly used open-source e-commerce platform and many of our processes were run out of spreadsheets or online documents. Also, the philosophy of the company regarding software development was already in place (more on this later).

As with any startup, you need to make the technology choices that will enable growth, but not over-design to the point that technology becomes a bottleneck. From the beginning, the technology decisions had been made to help the business grow very quickly. However, by the time I joined, the team was starting to outgrow some of these early decisions and had to think of new ways of doing things.

Side note: my favorite story about this time period was how we found our first CDN. It was quite literally our CEO typing CDN into Google and hitting “I’m Feeling Lucky”. We stayed with that lucky company for two years. Another was that we hit a limit of 50 simultaneous users in Google Docs and had to quickly write a system for merchants.

One of the early choices we were faced with was how to get off of a single solution for catalog management, customer service and the website. We knew this was going to be a painful process; blocking our path was a fairly critical upgrade to the software we relied on. We needed to do the upgrade because later versions of the platform had a more performant schema and we knew we would be running on some version of that schema for a while. Needless to say, when we finally confronted it, things didn’t go so well. We learned a lot of lessons from the upgrade, but the overall strategy was already set in motion: get off a single platform, move extremely fast, and iterate, iterate, iterate.

Over the next few weeks (and by few I mean four, to be exact), we completely rewrote the webstore, moved away from MySQL as a primary store, introduced MongoDB for serving the catalog and Redis for high-performance caching, selected a new PHP front-end MVC framework, and wrote a custom image-serving platform that is one of the coolest things I’ve ever seen in production. We did all this while under significant pressure from the business to keep rolling out new features to customers at the same time.

Right on the heels of the upgrade came a new iPhone app, a new fulfillment system, a catalog management system, a payments processing system and a customer service system. We did all of this within months, and all with a team well short of 20 developers.

Courage, mistakes, and moving in “zulily time”

One of the things that allowed us to move extremely fast was an executive understanding that the only way to really promote growth was to experiment, take risks and, most importantly, be comfortable with the outcomes. Some things will succeed and some things will fail, but if the bar is high for the developers we bring on, and we give them the freedom to do their jobs, the results can be truly amazing. This philosophy was already in place when I joined and exists to this day.

Operationally, what this means is that we’ve broken down barriers that exist in most companies and empowered software engineers, product managers and others to get things done. At zulily, engineers are required to work directly with businesses to develop an idea, write the required code, QA it, deploy it and own the results. We don’t have formal QA, and the main reason this works is that we’ve created a meaningful way to get customer feedback to developers. We have also invested in the tools and data to help developers see how things are used. In addition, like some of the newer social media companies, we introduced an element of social pressure to ensure quality.

In short: we trust engineers to do the right thing and we know that at this speed there will be mistakes.

One of my favorite examples of this was my first day (yes, I said day; we call this zulily time). Our website was starting to fall over more frequently because of the amount of traffic we were handling, something that was common with our growth and spiky sales patterns. My task was to write a caching layer for the website that would give us some headroom…and deploy it that day. These were the very early days and there were minimal guardrails in place, so when it went out, the new code had some issues: it crashed the site. Luckily it was easy to tell what the problem was and I was able to fix the issues quickly. I had the pig on my desk for the day, but we never slowed down. On to the next task!

This was in stark contrast to one of the places I had previously worked, where we had meticulously prepared releases. Something similar happened with one of our releases there and the entire team had to sit through an hour-long post-mortem meeting with managers and execs. All that did was make the developers afraid to take any risks and displace their sense of ownership. (Later on we did the math and the meeting was about 5x more expensive than the outage.)

This point is fully understood by everyone working at zulily: moving at this speed, people will make mistakes, and they will learn from them too. Forward progress usually outweighs most concerns. Let’s give people the tools and means to succeed and to correct things fast when mistakes are made.

Some of my favorite mistakes over the past few years:

  • Setting all the prices on the site to $1. (This was very early on; there are multiple safeguards now.)

  • A bug in some indexing code that periodically marked the entire catalog as sold-out.

  • All of the products on the site 301’d to a baby-monitor product. (This was my bad, but man did that monitor sell out fast.)

  • A fairly awesome cat picture internet meme that was accidentally published as one of our event images, when testing new image-resize code.

The last thing I will say about how we do things is that this system is not for everyone. I’ve worked with some very smart people in my tenure here, for whom this environment was not suited. You have to really be unafraid of your end-user’s feedback and be able to adjust accordingly.

Where do we go from here?

In 2012 we started to transform many of our systems from stand-alone applications to more of a service-based architecture. We introduced a few new languages (Java being the main one), and started to decouple entire systems from the ground up. While this process continues to evolve, it has yielded results from the start. We have been able to kick-start new efforts and teams in both our fulfillment and member-engagement spaces, and have enabled our existing teams to move in parallel much faster than anyone thought possible.

We’re now starting to have loosely coupled systems, usually communicating via queued messages brokered by RabbitMQ, or by RESTful service calls. We are using some of the newest and best open-source software for search and discovery, and we have a pretty large and growing dataset with which we are starting to do some innovative things in the area of ML and data science. We’re continuing to push the boundaries of mobile commerce, fulfillment, vendor management and merchandising, AND we’re just getting started. Come see for yourself.

Who knows where the next several years will take us? If they’re anything like the past four, watch out! As some of our friends up the street might say, it still feels very much like “Day One”.

Matt Francis
Director, Customer Experience