i wrote another weird ruby gem: direct-bind
I’ve written another weird Ruby gem: direct-bind.
It solves a very specific problem I keep running into. It’s oddly specific, so I may actually be alone in this ;)
As I find myself working on Ruby observability tools such as the profiler in the datadog gem, the "show your Ruby threads in a timeline" gvl-tracing gem, or the "build richer stack traces" backtracie gem, I often mentally ask "ok, so how do I get X piece of information from Ruby? Is there an API for it?"
This thought often leads me to use tools such as readelf to check what functions Ruby exposes as public ("GLOBAL"):
$ readelf -sW libruby.so.3.4.4 | grep GLOBAL | grep "rb_thread_"
   341: 00000000002e5220    82 FUNC    GLOBAL DEFAULT   12 rb_thread_alone
   400: 00000000002e7540   475 FUNC    GLOBAL DEFAULT   12 rb_thread_sleep_deadly
   471: 00000000002e7ff0    11 FUNC    GLOBAL DEFAULT   12 rb_thread_io_blocking_region
   515: 00000000002ea870    56 FUNC    GLOBAL DEFAULT   12 rb_thread_stop
   559: 00000000002e1f00   140 FUNC    GLOBAL DEFAULT   12 rb_thread_lock_native_thread
   582: 00000000002e1cf0   101 FUNC    GLOBAL DEFAULT   12 rb_thread_prevent_fork
   650: 00000000002e5050   139 FUNC    GLOBAL DEFAULT   12 rb_thread_local_aref
   767: 00000000002ea830    25 FUNC    GLOBAL DEFAULT   12 rb_thread_run
   777: 00000000002ea7d0    84 FUNC    GLOBAL DEFAULT   12 rb_thread_schedule
   ...etc
and to spend a bunch of time reading the Ruby VM sources to find which functions exist and how I can access them.
But sometimes the function I need is right… there… but Ruby does not expose it as a public function ("LOCAL"):
$ readelf -sW libruby.so.3.4.4 | grep "rb_thread_alive"
  5585: 00000000002db070    63 FUNC    LOCAL  DEFAULT   12 rb_thread_alive_p
Often, many of these functions are bound to Ruby methods (from Ruby’s thread.c):
/*
 *  call-seq:
 *     thr.alive?   -> true or false
 *
 *  Returns +true+ if +thr+ is running or sleeping.
 *
 *     thr = Thread.new { }
 *     thr.join                #=> #<Thread:0x401b3fb0 dead>
 *     Thread.current.alive?   #=> true
 *     thr.alive?              #=> false
 *
 *  See also #stop? and #status.
 */
static VALUE
rb_thread_alive_p(VALUE thread)
{
    return RBOOL(!thread_finished(rb_thread_ptr(thread)));
}

void
Init_Thread(void)
{
    // ...
    rb_define_method(rb_cThread, "alive?", rb_thread_alive_p, 0);
    // ...
}
The only way of calling these functions is to ask Ruby to call the Thread#alive? method on a given thread object, as you would do in regular Ruby code.
But… the kind of tools I work on often need to call into the Ruby VM at "inconvenient" times: during a garbage collection cycle; or when a thread does not have the Global VM Lock. In those situations, it’s not possible to ask Ruby to call methods.
For instance, recently I was working on a feature for the gvl-tracing gem where I needed to know if the current thread was still alive, or if it was in the process of terminating. And well, if you try to call Thread#alive? in the middle of a thread terminating, you’ll trigger a crash because Ruby has already cleaned up some of the thread’s state that makes it able to call Ruby methods.
So back I went into the Ruby VM source code, and found another solution which almost, almost worked: checking if the current fiber was alive. There’s a public API for that! But… Ruby allocates fiber objects lazily, so if the fiber object didn’t exist then… hey, here I am crashing the VM because the fiber object can’t be allocated if a thread is terminating.
And all the while, rb_thread_alive_p was exactly what I wanted! So I decided to experiment with something that’s been gnawing at me for a long time: is it possible, and how hard is it, to get back from Ruby the pointer to rb_thread_alive_p that rb_define_method received?
The long answer to that question is that it is possible, and not that hard. The short answer is direct-bind ;)
The direct-bind gem takes heavy inspiration from some experiments we’ve done at Datadog, and I also recently discovered that this is basically what the debug gem does. There are two key insights to doing what we need: the data we want to get access to lives on the Ruby object heap, so it’s possible to use the "iterate every object" API rb_objspace_each_objects to locate the objects we want; and the layout of the data structure itself is quite simple, flat, and has not changed (for the purposes of direct-bind) in a long time.
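As a rough Ruby-level analogy for the "iterate every object" idea (not how direct-bind itself is implemented), the sketch below walks the heap with ObjectSpace.each_object and picks out one object by a known property. Note the assumption baked into the analogy: ObjectSpace.each_object only yields regular Ruby objects and hides VM-internal ones, which is why the real lookup has to happen in C with rb_objspace_each_objects.

```ruby
# Walk every Class object on the heap and keep the one named "Thread".
# This mirrors the heap-walking idea only; internal VM objects such as
# method entries are not visible through this Ruby-level API.
found = nil
ObjectSpace.each_object(Class) do |klass|
  found = klass if klass.name == "Thread"
end

p found.equal?(Thread)  # true
```

The C version follows the same shape: visit each heap slot, check whether it is the kind of object you are after, and stop caring about everything else.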
With some C coding, here’s how the resulting API looks:
VALUE (*is_thread_alive)(VALUE thread);
is_thread_alive = direct_bind_get_cfunc_with_arity(rb_cThread, rb_intern("alive?"), 0, true).func;
is_thread_alive(rb_thread_current());
And that’s it! Suddenly I was able to solve my problem of calling into rb_thread_alive_p, even though it’s not a public Ruby API.
Because the specific part of the structure where we’re getting this info from hasn’t changed in a long time, direct-bind works for Ruby 2.5 and above, including current Ruby 3.5 master. (It probably would work on older Rubies but… I leave that as an experiment for someone else.)
So… that’s it! direct-bind is out there in the world. I’m not sure if anyone other than me is ever going to want to do something like this, but yeah, it was a fun hack!