How Ruby Interprets and Runs Your Programs

In this post we'll follow the journey of a simple program as it's lexed, parsed and compiled into bytecode. We'll use the tools that Ruby gives us to spy on the interpreter every step of the way.

The more you know about your tools, the better decisions you will make as a developer. It's often useful — especially when debugging performance issues — to understand what Ruby is actually doing when it runs your program.

Don't worry — even if you're not an expert this post should be pretty easy to follow. It's more of a guided tour than a technical manual.

Meet our sample program

As an example, I'm going to use a single if/else statement. To save space, I'll write this using the ternary operator. But don't be fooled, it's just an if/else.

x > 100 ? 'foo' : 'bar'

As you'll see, even a simple program like this gets translated into quite a lot of data as it is processed.

Note: All of the examples in this post were written in Ruby (MRI) 2.2. If you're using other implementations of Ruby, they probably won't work.

Tokenizing

Before the Ruby interpreter can run your program it has to convert it from a somewhat free-form programming language into more structured data.

The first step might be to break the program into chunks. These chunks are called tokens.

# This is a string
"x > 1"

# These are tokens
["x", ">", "1"]

The Ruby standard library provides a module called Ripper that lets us process Ruby code in much the same way as the Ruby interpreter.

In the example below we are using the tokenize method on our Ruby code. As you can see, it returns an array of tokens.

require 'ripper'
Ripper.tokenize("x > 1 ? 'foo' : 'bar'")
# => ["x", " ", ">", " ", "1", " ", "?", " ", "'", "foo", "'", " ", ":", " ", "'", "bar", "'"]

The tokenizer is pretty stupid. You can feed it completely invalid Ruby and it will still tokenize it.

# bad code
Ripper.tokenize("1var @= \/foobar`")
# => ["1", "var"]
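One way to convince yourself that tokenizing only chops the string up, without interpreting it, is to glue the tokens back together. The snippet below is a small sketch rather than anything the interpreter does itself; the variable names (source, tokens, round_trip, meaningful) are just illustrative.

```ruby
require 'ripper'

source = "x > 100 ? 'foo' : 'bar'"
tokens = Ripper.tokenize(source)

# Whitespace is kept as its own token, so joining the tokens
# reproduces the original source exactly.
round_trip = tokens.join

# Dropping the whitespace-only tokens leaves the meaningful chunks.
meaningful = tokens.reject { |token| token.strip.empty? }

puts round_trip == source
p meaningful
```

Nothing is lost and nothing is added at this stage: the tokens are just the original string, cut into pieces.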

Lexing

Lexing is one step beyond tokenization. The string is still broken into tokens, but additional data is added to the tokens.

In the example below we are using Ripper to lex our small program. As you can see, it's now tagging each token as being an identifier :on_ident, an operator :on_op, an integer :on_int, etc.

require 'ripper'
require 'pp'

pp Ripper.lex("x > 100 ? 'foo' : 'bar'")

# [[[1, 0], :on_ident, "x"],
#  [[1, 1], :on_sp, " "],
#  [[1, 2], :on_op, ">"],
#  [[1, 3], :on_sp, " "],
#  [[1, 4], :on_int, "100"],
#  [[1, 7], :on_sp, " "],
#  [[1, 8], :on_op, "?"],
#  [[1, 9], :on_sp, " "],
#  [[1, 10], :on_tstring_beg, "'"],
#  [[1, 11], :on_tstring_content, "foo"],
#  [[1, 14], :on_tstring_end, "'"],
#  [[1, 15], :on_sp, " "],
#  [[1, 16], :on_op, ":"],
#  [[1, 17], :on_sp, " "],
#  [[1, 18], :on_tstring_beg, "'"],
#  [[1, 19], :on_tstring_content, "bar"],
#  [[1, 22], :on_tstring_end, "'"]]

There is still no real syntax checking going on at this point. The lexer will happily process invalid code.
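Because each lexed token carries its position and type, it's easy to slice the output up. Here's a rough sketch that throws away the whitespace and keeps only the token types. The names tagged and types are made up for this example, and the trailing splat is there on the assumption that newer versions of Ruby add an extra lexer-state element to each entry.

```ruby
require 'ripper'

source = "x > 100 ? 'foo' : 'bar'"

# Each entry starts with [[line, column], type, token]. Newer Rubies append
# a lexer-state element, so the splat ignores anything after the token.
tagged = Ripper.lex(source).map do |(_line, column), type, token, *|
  [column, type, token]
end

# Skip whitespace tokens and keep only the token types.
types = tagged.reject { |_, type, _| type == :on_sp }.map { |_, type, _| type }

p types
```

The column numbers in tagged are character offsets into the line, which is how tools built on Ripper can point back at the exact spot in your source.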

Parsing

Now that Ruby has broken up the code into more manageable chunks, it's time for parsing to begin.

During the parsing stage, Ruby transforms the text into something called an abstract syntax tree, or AST. The abstract syntax tree is a representation of your program in memory.

You might say that programming languages in general are just more user-friendly ways of describing abstract syntax trees.

require 'ripper'
require 'pp'

pp Ripper.sexp("x > 100 ? 'foo' : 'bar'")

# [:program,
#  [[:ifop,
#    [:binary, [:vcall, [:@ident, "x", [1, 0]]], :>, [:@int, "100", [1, 4]]],
#    [:string_literal, [:string_content, [:@tstring_content, "foo", [1, 11]]]],
#    [:string_literal, [:string_content, [:@tstring_content, "bar", [1, 19]]]]]]]

It might not be easy to read this output, but if you stare at it for long enough you can kind of see how it maps to the original program.

# Define a program
[:program,
 # Do an "if" operation
 [[:ifop,
   # Check the conditional (x > 100)
   [:binary, [:vcall, [:@ident, "x", [1, 0]]], :>, [:@int, "100", [1, 4]]],
   # If true, return "foo"
   [:string_literal, [:string_content, [:@tstring_content, "foo", [1, 11]]]],
   # If false, return "bar"
   [:string_literal, [:string_content, [:@tstring_content, "bar", [1, 19]]]]]]]
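Parsing is also the first stage where broken code gets caught. As a quick sketch, the snippet below parses our program and a deliberately incomplete version of it. As far as I know, Ripper.sexp returns nil when it hits a syntax error, but treat that as an assumption and check it on your own Ruby. The variable names are only for illustration.

```ruby
require 'ripper'

# A valid program parses into a nested array (an s-expression).
valid = Ripper.sexp("x > 100 ? 'foo' : 'bar'")

# The same program with the "else" branch chopped off is a syntax error,
# which Ripper.sexp is expected to report by returning nil.
invalid = Ripper.sexp("x > 100 ? 'foo' :")

# The tree is [:program, [statement, ...]]; our only statement is the ternary.
_program_tag, statements = valid
node_type = statements.first.first

p node_type
p invalid
```

Compare that with the tokenizer and lexer, which will chew through the broken version without complaint.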

At this point, the Ruby interpreter knows exactly what you want it to do. It could run your program right now. And before Ruby 1.9, it would have. But now, there's one more step.

Compiling to bytecode

Instead of traversing the abstract syntax tree directly, nowadays Ruby compiles the abstract syntax tree into lower-level bytecode.

This bytecode is then run by the Ruby virtual machine.

We can take a peek into the inner workings of the virtual machine via the RubyVM::InstructionSequence class. In the example below, we compile our sample program and then disassemble it to make it human-readable.

puts RubyVM::InstructionSequence.compile("x > 100 ? 'foo' : 'bar'").disassemble
# == disasm: <RubyVM::InstructionSequence:<compiled>@<compiled>>==========
# 0000 trace            1                                               (   1)
# 0002 putself
# 0003 opt_send_without_block <callinfo!mid:x, argc:0, FCALL|VCALL|ARGS_SIMPLE>
# 0005 putobject        100
# 0007 opt_gt           <callinfo!mid:>, argc:1, ARGS_SIMPLE>
# 0009 branchunless     15
# 0011 putstring        "foo"
# 0013 leave
# 0014 pop
# 0015 putstring        "bar"
# 0017 leave

Whoa! This suddenly looks a lot more like assembly language than Ruby. Let's step through it and see if we can make sense of it.

# Call the method `x` on self and save the result on the stack
0002 putself
0003 opt_send_without_block <callinfo!mid:x, argc:0, FCALL|VCALL|ARGS_SIMPLE>

# Put the number 100 on the stack
0005 putobject        100

# Do the comparison (x > 100)
0007 opt_gt           <callinfo!mid:>, argc:1, ARGS_SIMPLE>

# If the comparison was false, jump to instruction 0015
0009 branchunless     15

# If the comparison was true, return "foo"
0011 putstring        "foo"
0013 leave
0014 pop

# Here's instruction 0015. We jumped here if the comparison was false. Return "bar"
0015 putstring        "bar"
0017 leave

The Ruby virtual machine (YARV) then steps through these instructions and executes them. That's it!
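You can also ask the virtual machine to run a compiled instruction sequence yourself. The sketch below is MRI-only and makes one change to our sample program: it assigns x as a local variable first, so there's something to compare. That means the bytecode won't match the listing above exactly, since x is no longer a method call.

```ruby
# Compile a version of the sample program where x is a local variable.
iseq = RubyVM::InstructionSequence.compile("x = 150; x > 100 ? 'foo' : 'bar'")

# The human-readable listing, as a String.
listing = iseq.disasm

# Hand the compiled instructions to the VM and get the program's value back.
result = iseq.eval

puts listing
p result
```

Here the comparison is true, so the program evaluates to "foo".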

Conclusion

This ends our very simplified, cartoony tour of the Ruby interpreter. With the tools I've shown you here, it's possible to take a lot of the guesswork out of how Ruby is interpreting your programs. I mean, it doesn't get more concrete than an AST. And next time you're stumped by some weird performance issue, try looking at the bytecode. It probably won't solve your problem, but it might take your mind off of it. :)

    Starr Horne

    Starr Horne is a Rubyist and Chief JavaScripter at Honeybadger.io. When she's not neck-deep in other people's bugs, she enjoys making furniture with traditional hand-tools, reading history and brewing beer in her garage in Seattle.
