building an always-on (ruby) production profiler

ivo anjo • march 19th, 2025 • 🕘 2 minute read

For the past few years, I’ve been working at Datadog on a new open-source Ruby profiler. This profiler is shipped as part of the datadog Ruby gem.

Why spend all this time and effort on building a new profiler? The key detail is that we want (and need) something that is built to be always-on in production. That translates to having really low overhead in several dimensions, including low cpu usage, low memory usage and low impact on the application’s latency, while also being something that can run for a long time unattended without impacting the application.

Doing this involves a different set of trade-offs than most profilers make. For instance, the datadog profiler is able to profile cpu, wall-time, allocations, heap, the Global VM Lock, and Garbage Collection all at the same time, while staying within the same low overhead. It also has workarounds for a number of bugs in Ruby and 3rd-party gems, so as to make sure it never impacts any application it is added to.

I’ve had to spend quite a lot of time writing C and Rust to make the Datadog Ruby profiler happen. It even spawned a few offshoots I’ve talked about on this blog in the past, such as the gvl-tracing gem to investigate activity around Ruby’s Global VM Lock and the backtracie gem to read stack traces with more detail and lower overhead.

Last year, at the RubyKaigi 2024 conference I had the chance to talk about how Datadog’s Ruby profiler works. It took me almost a year to write this blog post, but here’s the video and slide deck for the talk.


Ruby performance is seeing a big renaissance: YJIT is getting better and better, JRuby and TruffleRuby continue to provide really strong alternatives, there are a number of new profilers for Ruby beyond the Datadog profiler (Vernier, pf2), and there’s ongoing work to upgrade Ruby’s garbage collector.

And… I am totally here for it!
