Better backtraces in Ruby using TracePoint - ivo's awfully random tech blog

In a previous blog post — Ruby Experiment: Include class names in backtraces — I shared the beginning of my experiments into improving Ruby backtraces.

That got me thinking of a few things:

What information would I want to show in backtraces?
What format would I display that information in?
Is it possible (and reasonable?) to extract the needed information during backtrace collection?

To allow me to iterate faster, I’ve implemented my next prototype for better backtraces using Ruby’s TracePoint API.

This means that the current prototype can be run on regular unmodified Ruby, although having it loaded will have a performance impact (so I definitely don’t recommend it).

The result of my experiments is the following:

tracepoint-better-backtrace.rb:159:in BetterBacktrace.better_backtrace_locations(thread)
tracepoint-better-backtrace.rb:2:in ClassA#hello()
tracepoint-better-backtrace.rb:11:in ModuleB::ClassB#hello()
tracepoint-better-backtrace.rb:18:in ModuleC.hello()
tracepoint-better-backtrace.rb:24:in ClassWithStaticMethod.hello()
tracepoint-better-backtrace.rb:30:in ModuleD#hello()
tracepoint-better-backtrace.rb:39:in Object$singleton#<main>{block}()
tracepoint-better-backtrace.rb:41:in Object$singleton#<main>{block}()
tracepoint-better-backtrace.rb:47:in ClassD$singleton#hello()
tracepoint-better-backtrace.rb:52:in ClassE#hello()
tracepoint-better-backtrace.rb:73:in Method#call(*)
tracepoint-better-backtrace.rb:72:in ClassG$refinement@ContainsRefinement::RefinesClassG#hello()
tracepoint-better-backtrace.rb:82:in ModuleE.hello()
tracepoint-better-backtrace.rb:88:in ClassH#method_missing(name, *_)
tracepoint-better-backtrace.rb:97:in ClassF#hello{block}()
tracepoint-better-backtrace.rb:97:in Integer#times()
tracepoint-better-backtrace.rb:96:in ClassF#hello(arg1, arg2, test1, test2)
tracepoint-better-backtrace.rb:105:in #<Class:#<Object:0x00005636e9ea3898>>.hello()
tracepoint-better-backtrace.rb:110:in #<Class:0x00005636e9ea35c8>#hello()
tracepoint-better-backtrace.rb:116:in #<Module:0x00005636e9ea32d0>.hello()
tracepoint-better-backtrace.rb:121:in Object#method_with_complex_parameters(a, b, *c, (), f:, g:, **h, &i)
tracepoint-better-backtrace.rb:125:in Object#top_level_hello()
tracepoint-better-backtrace.rb:229:in Kernel#eval(*)
tracepoint-better-backtrace.rb:228:in Object$singleton#<main>{block}()
tracepoint-better-backtrace.rb:228:in Integer#times()
tracepoint-better-backtrace.rb:227:in Object$singleton#<main>{block}()
tracepoint-better-backtrace.rb:227:in Integer#times()

The changes are:

Show class/module names
Show number and names of arguments next to method names
Distinguish between instance methods (Foo#bar) and methods defined on class/method objects (Foo.bar). The # vs . syntax follows from the usual conventions in Ruby documentation.
Display methods defined on specific objects with the $singleton suffix
Display refinements as the $refinement suffix, followed by the name of the module that defined the refinement
Show blocks as {block} (instead of block in …)
The eval frame (shown below as (eval):1:in `block (2 levels) in <main>') was omitted — this seems to be a gap in the TracePoint API

It took me quite a while (and a few discussions with friends and colleagues — thanks João and Marek for nice points) but I’m reasonably happy with the current way of encoding all this information in a backtrace.

All of these data points are stored separately in the "backtrace information" so alternative representations (or just not showing this much information) is doable by just changing the printing bits.

My test file is a quite artificial example: the usual Ruby conventions would be to have classes defined in a separate file. But, for reference, here’s how a regular Ruby backtrace looks for the same file:

tracepoint-better-backtrace.rb:5:in `backtrace_locations'
tracepoint-better-backtrace.rb:5:in `hello'
tracepoint-better-backtrace.rb:12:in `hello'
tracepoint-better-backtrace.rb:19:in `hello'
tracepoint-better-backtrace.rb:25:in `hello'
tracepoint-better-backtrace.rb:31:in `hello'
tracepoint-better-backtrace.rb:39:in `block in <main>'
tracepoint-better-backtrace.rb:41:in `block in <main>'
tracepoint-better-backtrace.rb:48:in `hello'
tracepoint-better-backtrace.rb:53:in `hello'
tracepoint-better-backtrace.rb:73:in `call'
tracepoint-better-backtrace.rb:73:in `hello'
tracepoint-better-backtrace.rb:83:in `hello'
tracepoint-better-backtrace.rb:91:in `method_missing'
tracepoint-better-backtrace.rb:98:in `block in hello'
tracepoint-better-backtrace.rb:97:in `times'
tracepoint-better-backtrace.rb:97:in `hello'
tracepoint-better-backtrace.rb:106:in `hello'
tracepoint-better-backtrace.rb:111:in `hello'
tracepoint-better-backtrace.rb:117:in `hello'
tracepoint-better-backtrace.rb:122:in `method_with_complex_parameters'
tracepoint-better-backtrace.rb:126:in `top_level_hello'
(eval):1:in `block (2 levels) in <main>'
tracepoint-better-backtrace.rb:229:in `eval'
tracepoint-better-backtrace.rb:229:in `block (2 levels) in <main>'
tracepoint-better-backtrace.rb:228:in `times'
tracepoint-better-backtrace.rb:228:in `block in <main>'
tracepoint-better-backtrace.rb:227:in `times'
tracepoint-better-backtrace.rb:227:in `<main>'

Below, you’ll find tracepoint-better-backtrace.rb file, which includes the implementation, as well as the most extensive example for defining classes and methods in different ways in Ruby that I could think of :)

The implementation uses a "shadow stack": it keeps a separate stack as a thread-local variable that simulates the real Ruby stack by pushing and popping information on the current active methods as they are entered and they finish. To generate the backtrace, we just need to access the current state of that variable.

The key parts of the implementation are which information is gathered on method entry (inside the TracePoint definition block), as well as how those data points are interpreted to generate the pretty stack (inside to_s for the Locations struct).

What’s next? I’ve been experimenting with implementing these improved backtraces in the rbspy Ruby profiler.

Because the rbspy profiler works outside of the Ruby VM, it can gather the extra needed information without impacting the performance of the Ruby VM, so it seems like a great place to start with a real-world implementation of these better backtraces.