Associative arrays in Ruby...what?

Have you ever had a bunch of data in an array, but needed to do a key/value lookup like you would with a hash? Fortunately, Ruby provides a mechanism for treating arrays as key-value structures. Let's check it out!

Introducing Array#assoc and Array#rassoc

Imagine that you've been given a magical stock-picking machine. Every few minutes it spits out a recommendation to buy or sell a stock. You've managed to hook it up to your computer, and you're receiving a stream of data that looks like this:

picks = [
  ["AAPL", "buy"],
  ["GOOG", "sell"],
  ["MSFT", "sell"]
]

To find the most recent guidance for Google, you could make use of the Array#assoc method. Here's what that looks like:

# Returns the first row of data where row[0] == "GOOG"
picks.assoc("GOOG") # => ["GOOG", "sell"]

To find the most recent "sell" recommendation, you could use the Array#rassoc method.

# Returns the first row of data where row[1] == "sell"
picks.rassoc("sell") # => ["GOOG", "sell"]

If no match is found, the methods return nil:

picks.assoc("CSCO") # => nil
picks.rassoc("hold") # => nil
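
Because a miss comes back as nil rather than as an empty row, it's worth guarding against it before reaching into the result. Here's a minimal sketch of one way to do that. The `action_for` helper and its "no guidance" default are made up for illustration and aren't part of Ruby:

```ruby
picks = [
  ["AAPL", "buy"],
  ["GOOG", "sell"],
  ["MSFT", "sell"]
]

# Hypothetical helper: returns the action for a ticker, or a default
# when assoc finds no matching row and returns nil.
def action_for(picks, ticker, default = "no guidance")
  # Destructuring nil assigns nil to both variables, so this line is
  # safe even when there is no match.
  _ticker, action = picks.assoc(ticker)
  action || default
end

goog_action = action_for(picks, "GOOG") # => "sell"
csco_action = action_for(picks, "CSCO") # => "no guidance"
```

The destructuring assignment is what keeps this short: there's no need to check for nil before pulling out the second column.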

Historical data

Hashes can't have more than one value for a single key. But arrays can have as many duplicates as you like. The assoc and rassoc methods do the sensible thing in this case and return the first matching row they find. This lets us do some pretty interesting things.

Our imaginary stock-picking machine provides a stream of data. Eventually, it's going to change its mind about a particular company and tell me to buy what it previously told me to sell. In that case, our data looks like this:

picks = [
  ["GOOG", "buy"],
  ["AAPL", "sell"],
  ["AAPL", "buy"],
  ["GOOG", "sell"],
  ["MSFT", "sell"]
]

If I were putting all of this data into a hash, updating the recommendation for a particular stock would cause me to lose any previous recommendations for that stock. Not so with the array. I can keep prepending recommendations to the array, knowing that Array#assoc will always give me the most recent recommendation.

# Returns the first row of data where row[0] == "GOOG"
picks.assoc("GOOG") # => ["GOOG", "buy"]

So we get the key-value goodness of a hash, along with a free audit trail.
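
To make that concrete, here's a small sketch of the prepend-then-look-up pattern, using the same data as above. The new MSFT "buy" row is invented for the example. `Array#unshift` adds the row to the front, and `Array#select` pulls out the whole history when you want more than the latest entry:

```ruby
picks = [
  ["GOOG", "buy"],
  ["AAPL", "sell"],
  ["AAPL", "buy"],
  ["GOOG", "sell"],
  ["MSFT", "sell"]
]

# A new recommendation arrives: prepend it so it becomes the first match.
picks.unshift(["MSFT", "buy"])

latest = picks.assoc("MSFT") # => ["MSFT", "buy"]

# The older rows are still in the array. select returns every match,
# newest first, because that's the order the rows are stored in.
msft_history = picks.select { |ticker, _action| ticker == "MSFT" }
# => [["MSFT", "buy"], ["MSFT", "sell"]]
```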

More than two columns

Another neat thing about assoc is that you're not limited to just two columns per array. You can have as many columns as you like. Suppose you added a timestamp to each buy/sell recommendation.

picks = [
  ["AAPL", "buy", "2015-08-17 12:11:55 -0700"],
  ["GOOG", "sell", "2015-08-17 12:10:00 -0700"],
  ["MSFT", "sell", "2015-08-17 12:09:00 -0700"]
]

Now when we use assoc or rassoc, we'll get the timestamp as well:

# The entire row is returned
picks.assoc("GOOG") # => ["GOOG", "sell", "2015-08-17 12:10:00 -0700"]

I hope you can see how useful this could be when dealing with data from CSV and other file formats that can have lots of columns.
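
One thing to keep in mind with wider rows: assoc always compares against the first column and rassoc against the second. As far as I know there's no built-in counterpart for the third column and beyond, so the sketch below falls back to `Array#find` for that case. The variable names are just for illustration:

```ruby
picks = [
  ["AAPL", "buy", "2015-08-17 12:11:55 -0700"],
  ["GOOG", "sell", "2015-08-17 12:10:00 -0700"],
  ["MSFT", "sell", "2015-08-17 12:09:00 -0700"]
]

# rassoc still compares against row[1] only, but returns the full row.
first_sell = picks.rassoc("sell")
# => ["GOOG", "sell", "2015-08-17 12:10:00 -0700"]

# To match on any other column, use find with an explicit index.
by_timestamp = picks.find { |row| row[2] == "2015-08-17 12:09:00 -0700" }
# => ["MSFT", "sell", "2015-08-17 12:09:00 -0700"]

# Destructuring gives each column a readable name.
ticker, action, timestamp = picks.assoc("AAPL")
```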

Speed

Ruby's hashes will outperform Array#assoc in most benchmarks, and the gap widens as the dataset gets bigger. After all, hash lookups are O(1) on average, while array searches are O(n). However, in many cases the difference won't be large enough for you to worry about. It depends on the details.

Just for fun, I wrote a simple benchmark comparing hash lookup vs assoc for a 10 row dataset and for a 100,000 row dataset. As expected, the hash and array were in the same ballpark with the small dataset (the array was roughly 3.6x slower). With the large dataset, the hash dominated the array.

...though to be fair, I'm searching for the last element in the array, which is the worst case scenario for array searches.

require 'benchmark/ips'
require 'securerandom'

Benchmark.ips do |x|
  x.time = 5
  x.warmup = 2

  short_array = (0..10).map { |i| [SecureRandom.hex(), i] }
  short_hash = Hash[short_array]
  short_key = short_array.last.first

  long_array = (0..100_000).map { |i| [SecureRandom.hex(), i] }
  long_hash = Hash[long_array]
  long_key = long_array.last.first

  x.report("short_array") { short_array.assoc(short_key) }
  x.report("short_hash") { short_hash[short_key] }
  x.report("long_array") { long_array.assoc(long_key) }
  x.report("long_hash") { long_hash[long_key] }

  x.compare!
end


# Calculating -------------------------------------
#          short_array    91.882k i/100ms
#           short_hash   149.430k i/100ms
#           long_array    19.000  i/100ms
#            long_hash   152.086k i/100ms
# -------------------------------------------------
#          short_array      1.828M (± 3.4%) i/s -      9.188M
#           short_hash      6.500M (± 4.8%) i/s -     32.426M
#           long_array    205.416  (± 3.9%) i/s -      1.026k
#            long_hash      6.974M (± 4.2%) i/s -     34.828M

# Comparison:
#            long_hash:  6974073.6 i/s
#           short_hash:  6500207.2 i/s - 1.07x slower
#          short_array:  1827628.6 i/s - 3.82x slower
#           long_array:      205.4 i/s - 33950.98x slower
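
If you end up doing many lookups against a large array, one option is to build a hash index once and query that instead. The sketch below is just one way to do it, and the `index` name is made up for the example. Note that a plain `picks.to_h` would keep the last pair for a duplicated key, which is the opposite of what assoc returns, so the index is built by hand and keeps the first row it sees for each key:

```ruby
picks = [
  ["GOOG", "buy"],
  ["AAPL", "sell"],
  ["AAPL", "buy"],
  ["GOOG", "sell"],
  ["MSFT", "sell"]
]

# Build the index once. Only the first row per key is stored, which
# matches what assoc would return for that key.
index = picks.each_with_object({}) do |row, hash|
  hash[row.first] = row unless hash.key?(row.first)
end

index["GOOG"] # => ["GOOG", "buy"]
index["AAPL"] # => ["AAPL", "sell"]
index["CSCO"] # => nil
```

The trade-off is that the index is a snapshot: rows prepended to the array afterwards won't show up until it's rebuilt.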

    Starr Horne

    Starr Horne is a Rubyist and Chief JavaScripter at Honeybadger.io. When she's not neck-deep in other people's bugs, she enjoys making furniture with traditional hand-tools, reading history and brewing beer in her garage in Seattle.