Sampling keys in a Redis cluster

We love Redis here at zulily. We store hundreds of millions of keys across many Redis instances, and we built our own internal distributed cache on top of Redis which powers the shopping experience for zulily customers.

One challenge when running a large, distributed cache using Redis (or many other key/value stores for that matter) is the opaque nature of the key spaces. It can be difficult to determine the overall composition of your Redis dataset, since most Redis commands operate on a single key. This is especially true when multiple codebases or teams use the same Redis instance(s), or when sharding your dataset over a large number of Redis instances.

Today, we’re open sourcing a Go package that we wrote to help with that task: reckon.

reckon enables us to periodically sample random keys from Redis instances across our fleet, aggregate statistics about the data contained in them — and then produce basic reports and metrics.

While there are some existing solutions for sampling a Redis key space, the reckon package has a few advantages:

Programmatic access to sampling results

Results from reckon are returned in data structures, not just printed to stdout or a file. This is what allows a user of reckon to sample data across a cluster of redis instances and merge the results to get an overall picture of the keyspaces. We include some example code to do just that.

Arbitrary aggregation based on key and redis data type

reckon also allows you to define arbitrary buckets based on the name of the sampled key and/or the Redis data type (hash, set, list, etc.). During sampling, reckon compiles statistics about the various redis data types, and aggregates those statistics according to the buckets you defined.

Any type that implements the Aggregator interface can instruct reckon about how to group the Redis keys that it samples. This is best illustrated with some simple examples:

To aggregate only Redis sets whose keys start with the letter a:


func setsThatStartWithA(key string, valueType reckon.ValueType) []string {
  if strings.HasPrefix(key, "a") && valueType == reckon.TypeSet {
    return []string{"setsThatStartWithA"}
  }
  return []string{}
}

To aggregate sampled keys of any Redis data type that are longer than 80 characters:


func longKeys(key string, valueType reckon.ValueType) []string {
  if len(key) > 80 {
    return []string{"long-keys"}
  }
  return []string{}
}

HTML and plain-text reports

When you’re done sampling, aggregating and/or combining the results produced by reckon you can easily produce a report of the findings in either plain-text or static HTML. An example HTML report is shown below:

reckon-random-sets

a sample report showing key/value size distributions

The report shows the number of keys sampled, along with some example keys and elements of those keys (the number of example keys/elements is configurable). Additionally, a distribution of the sizes of both the keys and elements is shown — in both standard and “power-of-two” form. The power-of-two form shows a more concise view of the distribution, using a concept borrowed from the original Redis sampler: each row shows a number p, along with the number of keys/elements that are <= p and > p/2

For instance, using the example report shown above, you can see that:

  • 68% of the keys sampled had key lengths between 8 and 16 characters
  • 89.69% of the sets sampled had between 16 and 32 elements
  • the mean number of elements in the sampled sets is 19.7

We have more features and refinements in the works for reckon, but in the meantime, check out the repo on github and let us know what you think. The codebase includes several example binaries to get you started that demonstrate the various usages of the package.

Pull requests are always welcome — and remember: Always be samplin’.

The way we Go(lang)

Here at zulily, Go is increasingly becoming the language of choice for many new projects, from tiny command-line apps to high-volume, distributed services. We love the language and the tooling, and some of us are more than happy to talk your ear off about it. Setting aside the merits and faults of the language design for a moment (over which much digital ink has already been spilled), it’s undeniable that Go provides several capabilities that make a developer’s life much easier when it comes to building and deploying software: static binaries and (extremely) fast compilation.

What makes a good build?

In general, the ideal software build should be:

  • fast
  • predictable
  • repeatable

Being fast allows developers to quickly iterate through the develop/build/test cycle, and predictable/repeatable builds allow for confidence when shipping new code to production, rolling back to a prior version or attempting to reproduce bugs.

Fast builds are provided by the Go compiler, which was designed such that:

It is possible to compile a large Go program in a few seconds on a single computer.

(There’s much more to be said on that topic in this interesting talk.)

We accomplish predictable and repeatable builds using a somewhat unconventional build tool: a Docker container.

Docker container as “build server”

Many developers use a remote build server or CI server in order to achieve predictable, repeatable builds. This makes intuitive sense, as the configuration and software on a build server can be carefully managed and controlled. Developer workstation setups become irrelevant since all builds happen on a remote machine. However, if you’ve spent any time around Docker containers, you know that a container can easily provide the same thing: a hermetically sealed, controlled environment in which to build your software, regardless of the software and configuration that exist outside the container.

By building our Go binaries using a Docker container, we reap the same benefits of a remote build server, and retain the speed and short dev/build/test cycle that makes working with Go so productive.

Our build container:

  • uses a known, pinned version of Go (v1.4.2 at the time of writing)
  • compiles binaries as true static binaries, with no cgo or dynamically-linked networking packages
  • uses vendored dependencies provided by godep
  • versions the binary with the latest git SHA in the source repo

This means that our builds stay consistent regardless of which version of Go is installed on a developer’s workstation or which Go packages happen to be on their $GOPATH! It doesn’t matter if the developer has godep or golint installed, whether they’re running an old version of Go, the latest stable version of Go or even a bleeding-edge build from source!

Git SHA as version number

godep is becoming a de facto standard for managing dependencies in Go projects, and vendoring (aka copying code into your project’s source tree) is the suggested way to produce repeatable Go builds. Godep vendors dependent code and keeps track of the git SHA for each dependency. We liked this approach, and decided to use git SHAs as versions for our binaries.

We accomplish this by “stamping” each of our binaries with the latest git SHA during the build process, using the ldflags option of the Go linker. For example:

ldflags "-X main.BuildSHA ${GIT_SHA}"

This little gem sets the value of the BuildSHA variable in the main package to be the value of the GIT_SHA environment variable (which we set to the latest git SHA in the current repo). This means that the following Go code, when built using the above technique, will print the latest git SHA in its source repo:

package main

import "fmt"

var BuildSHA string // set by the compiler at build time!

func main() {
  fmt.Printf("I'm running version: %s\n", BuildSHA)
}

Enter: boilerplate

Today, we’re open sourcing a simple project that we use for “bootstrapping” a new Go project that accomplishes all of the above. Enter: boilerplate

Boilerplate can be used to quickly set up a new Go project that includes:

  • a Docker container for performing Go builds as described above
  • a Makefile for building/testing/linting/etc. (because make is all you need)
  • a simple Dockerfile that uses the compiled binary as the container’s entrypoint
  • basic .gitignore and .dockerignore files

It even stubs out a Go source file for your binary’s main package.

You can find boilerplate on github. The project’s README includes some quick examples, as well as more details about the generated project.

Now, go forth and build! (pun intended)