I’ve written another weird Ruby gem: direct-bind.

It solves a very specific problem I keep running into. It’s oddly specific, so I may actually be alone in this ;)

As I work on Ruby observability tools such as the profiler in the datadog gem, the "show your Ruby threads in a timeline" gvl-tracing gem, or the "build richer stack traces" backtracie gem, I often ask myself: "ok, so how do I get X piece of information from Ruby? Is there an API for it?"

This thought often leads me to use tools such as readelf to check what functions Ruby exposes as public ("GLOBAL"):

$ readelf -sW libruby.so.3.4.4 | grep GLOBAL | grep "rb_thread_"
   341: 00000000002e5220    82 FUNC    GLOBAL DEFAULT   12 rb_thread_alone
   400: 00000000002e7540   475 FUNC    GLOBAL DEFAULT   12 rb_thread_sleep_deadly
   471: 00000000002e7ff0    11 FUNC    GLOBAL DEFAULT   12 rb_thread_io_blocking_region
   515: 00000000002ea870    56 FUNC    GLOBAL DEFAULT   12 rb_thread_stop
   559: 00000000002e1f00   140 FUNC    GLOBAL DEFAULT   12 rb_thread_lock_native_thread
   582: 00000000002e1cf0   101 FUNC    GLOBAL DEFAULT   12 rb_thread_prevent_fork
   650: 00000000002e5050   139 FUNC    GLOBAL DEFAULT   12 rb_thread_local_aref
   767: 00000000002ea830    25 FUNC    GLOBAL DEFAULT   12 rb_thread_run
   777: 00000000002ea7d0    84 FUNC    GLOBAL DEFAULT   12 rb_thread_schedule

...etc

and to spend a bunch of time reading the Ruby VM sources to find which functions exist and how I can access them.

But sometimes the function I need is right… there… but Ruby does not expose it as a public function ("LOCAL"):

$ readelf -sW libruby.so.3.4.4 |  grep "rb_thread_alive"
  5585: 00000000002db070    63 FUNC    LOCAL  DEFAULT   12 rb_thread_alive_p

Many of these functions are bound to Ruby methods (this one is from Ruby’s thread.c):

/*
 *  call-seq:
 *     thr.alive?   -> true or false
 *
 *  Returns +true+ if +thr+ is running or sleeping.
 *
 *     thr = Thread.new { }
 *     thr.join                #=> #<Thread:0x401b3fb0 dead>
 *     Thread.current.alive?   #=> true
 *     thr.alive?              #=> false
 *
 *  See also #stop? and #status.
 */

static VALUE
rb_thread_alive_p(VALUE thread)
{
    return RBOOL(!thread_finished(rb_thread_ptr(thread)));
}


void
Init_Thread(void)
{
    // ...
    rb_define_method(rb_cThread, "alive?", rb_thread_alive_p, 0);
    // ...
}

The only way of calling these functions is to ask Ruby to call the Thread#alive? method on a given thread object, as you would do in regular Ruby code.
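To make that concrete, here is a small Ruby sketch of what the "regular" route looks like, plus what public reflection can tell us about the method. The helper name describe_method is made up for this example. It relies on Method#source_location returning nil for methods that have no Ruby source, which is the case for functions registered from C with rb_define_method.

```ruby
# Summarise a method using only public reflection APIs.
# (describe_method is a hypothetical helper, not part of any gem.)
def describe_method(klass, name)
  m = klass.instance_method(name)
  {
    owner: m.owner,
    arity: m.arity,
    # source_location is nil when there is no Ruby source for the method,
    # which is what we expect for a function defined in C.
    implemented_in_c: m.source_location.nil?,
  }
end

puts describe_method(Thread, :alive?).inspect

# The "regular" way of reaching rb_thread_alive_p: a normal method call.
finished = Thread.new {}
finished.join
puts Thread.current.alive?
puts finished.alive?
```

The arity reported here is the same 0 that was passed to rb_define_method in Init_Thread. Reflection tells us the C function is there, but it gives us no way to get at the function pointer itself.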

But… the kinds of tools I work on often need to call into the Ruby VM at "inconvenient" times: during a garbage collection cycle, or when a thread does not hold the Global VM Lock. In those situations, it’s not possible to ask Ruby to call methods.

For instance, recently I was working on a feature for the gvl-tracing gem where I needed to know if the current thread was still alive, or if it was in the process of terminating. And well, if you try to call Thread#alive? in the middle of a thread terminating, you’ll trigger a crash, because Ruby has already cleaned up some of the thread state it needs to be able to call Ruby methods.

So back I went into the Ruby VM source code, and found another solution which almost, almost worked: checking if the current fiber was alive. There’s a public API for that! But… Ruby allocates fiber objects lazily, so if the fiber object didn’t exist yet then… hey, here I am crashing the VM, because the fiber object can’t be allocated while a thread is terminating.

And all the while, rb_thread_alive_p was exactly what I wanted! So I decided to experiment with something that’s been gnawing at me for a long time: is it possible, and how hard is it, to get back from Ruby the pointer to rb_thread_alive_p that rb_define_method received?

The short answer to that question is that it is possible, and not that hard. The long answer is direct-bind ;)

The direct-bind gem takes heavy inspiration from some experiments we’ve done at Datadog, which I recently discovered is also basically what the debug gem does. There are two key insights that make this work. First, the data we want lives on the Ruby object heap, so it’s possible to use the "iterate every object" API rb_objspace_each_objects to locate the objects we want. Second, the layout of the data structure itself is quite simple, flat, and has not changed (for the purposes of direct-bind) in a long time.
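The "iterate every object" idea has a Ruby-level cousin, ObjectSpace.each_object, which makes the search pattern easy to try out. The sketch below is only an analogy and not how direct-bind is implemented: the Ruby-level API does not yield VM-internal objects, which is presumably why the C-level rb_objspace_each_objects is needed. The helper name find_on_heap and the thread name are made up for the example.

```ruby
# Walk every live object of a given class and keep the ones the block accepts.
# (find_on_heap is a hypothetical helper used only for this illustration.)
def find_on_heap(klass)
  found = []
  ObjectSpace.each_object(klass) { |obj| found << obj if yield(obj) }
  found
end

# Create something recognisable to look for.
worker = Thread.new { sleep }
worker.name = "heap-walk-demo"

# Locate it again purely by scanning the heap and filtering.
matches = find_on_heap(Thread) { |t| t.name == "heap-walk-demo" }
puts matches.first.equal?(worker)

worker.kill
worker.join
```

The shape is the same in both cases: there is no index to look things up in, so you scan everything and filter down to the one object you care about.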

With some C coding, here’s how the resulting API looks:

VALUE (*is_thread_alive)(VALUE thread);

is_thread_alive = direct_bind_get_cfunc_with_arity(rb_cThread, rb_intern("alive?"), 0, true).func;

is_thread_alive(rb_thread_current());

And that’s it! Suddenly I was able to solve my problem of calling into rb_thread_alive_p, even though it’s not a public Ruby API.

Because the specific part of the structure where we’re getting this info from hasn’t changed in a long time, direct-bind works for Ruby 2.5 and above, including current Ruby 3.5 master. (It would probably work on older Rubies too, but… I leave that as an experiment for someone else.)

So… that’s it! direct-bind is out there in the world. I’m not sure if anyone other than me is ever going to want to do something like this, but yeah, it was a fun hack!