About this guide

This guide covers:

  • What secondary indexes (2i) are
  • Riak's approach to secondary indexes
  • Indexing stored values with Welle
  • Secondary index queries

This work is licensed under a Creative Commons Attribution 3.0 Unported License (including images & stylesheets). The source is available on GitHub.

What version of Welle does this guide cover?

This guide covers Welle 3.0, including development releases.

What version of Riak does this guide cover?

Secondary indexes in Riak are supported starting with Riak 1.0, and only by buckets that use the LevelDB storage backend.

Introduction

According to Wikipedia, a database index is a data structure that improves the speed of data retrieval operations. Secondary indexes are indexes on keys other than the primary key (the one you specify when storing a value in Riak). For example, if an application has accounts that look like this:

{:first-name "John"
 :last-name "Doe"
 :username "johndoe"
 :email "jdoe@example.com"
 :password-hash "…"
 :password-salt "…"}

then it may be desirable to find an account by its email so that it can be authenticated. One approach would be to use the email as the primary key, but that is not always possible or practical. Secondary indexes are what make it possible to efficiently retrieve an account with a particular email.
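
To make the idea concrete, here is a minimal in-memory sketch of what a secondary index on email amounts to: a mapping from a field value to the primary keys of the records that carry it. The names accounts, build-index and by-email are hypothetical and exist only for this illustration; this is not how Riak stores its indexes.

```clojure
;; In-memory illustration only; this is not how Riak stores indexes.
(def accounts
  {"johndoe" {:username "johndoe" :email "jdoe@example.com"}
   "janeroe" {:username "janeroe" :email "jroe@example.com"}})

(defn build-index
  "Builds a map from the value of `field` to the set of primary keys
   of the records that carry that value."
  [records field]
  (reduce-kv (fn [idx pk record]
               (update idx (get record field) (fnil conj #{}) pk))
             {}
             records))

;; Secondary index on :email
(def by-email (build-index accounts :email))

(get by-email "jdoe@example.com")
;; => #{"johndoe"}
```

Looking up an email becomes a single map lookup instead of a scan over every account.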

Secondary Indexes in Riak

Many databases (for example, PostgreSQL or Oracle) require indexes to be created before you can execute queries over them. In addition, they typically create index entries for every inserted row (this behavior can be made conditional in PostgreSQL). This works well when the schema is fixed and known upfront, but that is not the case with Riak. Riak treats stored values as "bags of bytes" and they can have any shape.

Because of that, secondary indexes work differently with Riak. With Riak, index entries (field/value pairs) are specified when a value is stored. Then, every time the value is updated, index entries may change and reindexing will happen. In other words, secondary indexes are flexible, just like the schema.
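
The store-time behavior described above can be modeled with a small in-memory sketch. The functions store and index-lookup are hypothetical stand-ins, not Welle or Riak functions; the point is only that index entries travel with each write, so storing a value again replaces its old entries.

```clojure
;; A toy model: the "database" is a plain map, and every stored value
;; carries the index entries supplied at write time.
(defn store
  "Returns `db` with `value` stored under key `k`. Any index entries from a
   previous write under `k` are replaced by `indexes`."
  [db k value indexes]
  (assoc db k {:value value :indexes indexes}))

(defn index-lookup
  "Returns the set of keys whose index entries for `field` contain `v`."
  [db field v]
  (set (for [[k {:keys [indexes]}] db
             :when (contains? (get indexes field #{}) v)]
         k)))

(def db1 (store {} "novemberain" {:name "Michael"}
                {:email #{"michael@example.com"}}))

;; Updating the value with different index entries "reindexes" it.
(def db2 (store db1 "novemberain" {:name "Michael"}
                {:email #{"mk@example.com"}}))

(index-lookup db1 :email "michael@example.com") ;; => #{"novemberain"}
(index-lookup db2 :email "michael@example.com") ;; => #{}
(index-lookup db2 :email "mk@example.com")      ;; => #{"novemberain"}
```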

Riak supports two kinds of 2i queries: value (also known as "equality") and range. The email example falls into the former category, while timestamp-based queries like "all events between 3 days ago and today" fall into the latter.
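
The difference between the two query kinds can be illustrated with a sorted in-memory index. This is a sketch under stated assumptions: age-index, value-query and range-query are hypothetical names, the data is made up, and treating both range bounds as inclusive is an assumption of this example.

```clojure
;; A sorted map from an indexed value (age) to the keys that carry it.
(def age-index
  (sorted-map 19 #{"carol"}
              23 #{"alice"}
              27 #{"novemberain"}
              30 #{"bob"}
              41 #{"dave"}))

(defn value-query
  "Equality query: keys whose indexed value is exactly `v`."
  [index v]
  (get index v #{}))

(defn range-query
  "Range query: keys whose indexed value lies between `lo` and `hi`,
   both bounds included (an assumption of this sketch)."
  [index lo hi]
  (into #{} (mapcat val) (subseq index >= lo <= hi)))

(value-query age-index 27)    ;; => #{"novemberain"}
(range-query age-index 23 30) ;; => #{"alice" "novemberain" "bob"}
```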

Indexing Stored Values

To specify index entries on a stored value, use the :indexes option of the clojurewerkz.welle.kv/store function:

(ns welle.docs.examples
  (:require [clojurewerkz.welle.core    :as wc]
            [clojurewerkz.welle.buckets :as wb]
            [clojurewerkz.welle.kv      :as kv])
  (:import com.basho.riak.client.http.util.Constants))

(let [conn   (wc/connect)
      bucket "accounts"
      key    "novemberain"
      email  "michael@example.com"
      val    {:name "Michael" :age 27 :username key :email email}]
  (wb/create conn bucket)
  (kv/store  conn bucket key val {:content-type Constants/CTYPE_JSON_UTF8 :indexes {:email #{email}}}))

The :indexes value must be a map whose keys are the names of indexed fields (keywords or strings) and whose values are sets of values. It is common to have just one value in a set, as in the example above, but there may also be more than one:

(ns welle.docs.examples
  (:require [clojurewerkz.welle.core    :as wc]
            [clojurewerkz.welle.buckets :as wb]
            [clojurewerkz.welle.kv      :as kv]))

(let [conn      (wc/connect)
      bucket    "accounts"
      key       "novemberain"
      languages #{"clojure" "java" "ruby" "scala" "erlang"}
      val       {:name "Michael" :age 27 :username key :created-at (java.util.Date.) :hacks languages}]
  (wb/create conn bucket)
  (kv/store  conn bucket key val {:content-type "application/clojure" :indexes {:language languages}}))
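
Because every value in the :indexes map has to be a set, application code that builds the map from mixed input may want to normalize it first. The helper below is a hypothetical convenience, not part of Welle; it wraps any non-set value in a one-element set and leaves sets untouched.

```clojure
(defn normalize-indexes
  "Returns `indexes` with every value coerced to a set. Field names
   (keywords or strings) are left as they are."
  [indexes]
  (into {}
        (map (fn [[field v]]
               [field (if (set? v) v #{v})]))
        indexes))

(normalize-indexes {:email     "michael@example.com"
                    "language" #{"clojure" "java"}})
;; => {:email #{"michael@example.com"}, "language" #{"clojure" "java"}}
```

The result has the shape the :indexes option expects and could be passed along with the other store options.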

If you need to index date/time values, use timestamps (longs):

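;; `now` and `to-timestamp` are assumed to be in scope here (for example,
;; helpers built on clj-time) and to yield a long timestamp
(kv/store conn bucket key val {:content-type Constants/CTYPE_JSON_UTF8 :indexes {:email #{email} :created-at #{(to-timestamp (now))}}})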

For anything that involves working with date, time or timestamp values, it is highly recommended that you use clj-time and its convenient coercion functions.
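
As a dependency-free illustration of long timestamps, the sketch below uses only JDK classes rather than clj-time. The days-ago helper is hypothetical; it produces epoch milliseconds that can serve both as an index value and as the boundaries of a range such as "between 3 days ago and today".

```clojure
(import 'java.util.concurrent.TimeUnit)

(defn days-ago
  "Epoch milliseconds (a long) for `n` days before `now-ms`."
  [now-ms n]
  (- now-ms (.toMillis TimeUnit/DAYS n)))

(let [now-ms (System/currentTimeMillis)]
  {;; index entries for a value created right now
   :indexes {:created-at #{now-ms}}
   ;; boundaries for "between 3 days ago and now"
   :range   [(days-ago now-ms 3) now-ms]})
```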

Secondary Index Queries

As mentioned before, there are two types of 2i queries in Riak: value (equality) and range queries. Both are performed using the same function, clojurewerkz.welle.kv/index-query, which takes a connection, a bucket name, a field to query against and a field value. Here is a value query example:

(ns welle.docs.examples
  (:require [clojurewerkz.welle.core    :as wc]
            [clojurewerkz.welle.buckets :as wb]
            [clojurewerkz.welle.kv      :as kv])
  (:import com.basho.riak.client.http.util.Constants))

(let [conn   (wc/connect)
      bucket "accounts"
      key    "novemberain"
      email  "michael@example.com"
      val    {:name "Michael" :age 27 :username key :email email}]
  (wb/create conn bucket)
  ;; store an object
  (kv/store conn bucket key val {:content-type Constants/CTYPE_JSON_UTF8 :indexes {:email #{email}}})
  ;; 2i query
  (kv/index-query conn bucket :email email))

To perform a range query, pass a vector of two range boundaries as the field value:

;; 2i range query
(kv/index-query conn "accounts" :age [23 30])

clojurewerkz.welle.kv/index-query returns a collection of keys as its result. You then use those keys to fetch associated objects.
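
The fetch step can be sketched as a small helper that takes the fetching function as an argument, so it runs here against an in-memory stand-in. fetch-all and fake-store are hypothetical names; the Welle call in the trailing comment assumes kv/fetch accepts a connection, a bucket name and a key, which you should check against the Welle version you use.

```clojure
(defn fetch-all
  "Calls `fetch-fn` on every key in `ks` and returns a map of key to
   whatever `fetch-fn` returned for it."
  [fetch-fn ks]
  (into {} (map (juxt identity fetch-fn)) ks))

;; Stand-in for a real store: a map is a function of its keys.
(def fake-store
  {"novemberain" {:name "Michael" :age 27}})

(fetch-all fake-store ["novemberain"])
;; => {"novemberain" {:name "Michael", :age 27}}

;; With Welle this might look like the following (assumed signature):
;; (fetch-all #(kv/fetch conn "accounts" %)
;;            (kv/index-query conn "accounts" :email email))
```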

Wrapping Up

Secondary indexes in Riak are à la carte, just like the schema of stored values. They greatly enhance Riak's querying capabilities. Using them with Welle is straightforward, just as it should be.

The documentation is organized as a number of guides, covering all kinds of topics.

Tell Us What You Think!

Please take a moment to tell us what you think about this guide on Twitter or the Welle mailing list.

Let us know what was unclear or what has not been covered. Maybe you do not like the guide's style or grammar, or have discovered spelling mistakes. Reader feedback is key to making the documentation better.
