Ruby’s unexpected I/O vs CPU unfairness - ivo's awfully random tech blog

Recently, while benchmarking a test application, I observed that if I mixed threads doing input/output (such as reading files or network/database requests) and threads doing CPU work, the I/O work would be slowed down massively. It took me a while to realize this was caused by how (and when) Ruby switches between threads.

For instance consider querying a web server (here I’m using google as an example) as fast as possible:

require 'net/http'
require 'benchmark/ips'

def perform_request
  Net::HTTP.start('www.google.com', open_timeout: 0.5, read_timeout: 0.5, write_timeout: 0.5) do |http|
    http.get('/')
  end
end

Benchmark.ips do |x|
  x.config(time: 1, warmup: 0)
  x.report("request") { perform_request }
end

When I run this example on my current wifi, I get:

$ bundle exec ruby io-cpu-unfairness-1.rb
Calculating -------------------------------------
             request     16.571  (±12.1%) i/s -     17.000  in   1.039825s

I get an OK-ish 60ms per request, which means I do a little over 16 such requests in a second.

Now, what happens if I add a background thread that is crunching some data while I’m trying to reach out to google?

require 'net/http'
require 'benchmark/ips'

Thread.new do
  Thread.current.name = 'cpu-heavy'
  foo = 0
  while true
    foo += 1
    foo = 0 if foo > 1_000_000_000_000
  end
end

def perform_request
  Net::HTTP.start('www.google.com', open_timeout: 0.5, read_timeout: 0.5, write_timeout: 0.5) do |http|
    http.get('/')
  end
end

Benchmark.ips do |x|
  x.config(time: 1, warmup: 0)
  x.report("request") { perform_request }
end

$ bundle exec ruby io-cpu-unfairness-2.rb
Calculating -------------------------------------
             request      1.664  (± 0.0%) i/s -      2.000  in   1.201622s

Suddenly, performance takes a big dive and we don’t even get through more than a single request per second.

What’s going on here?

Using the gvl-tracing gem (on Ruby 3.2), we can actually visualize what’s happening.

Compare the thread timelines for both examples:

Figure 1: Timeline for fast example.

Figure 2: Timeline for slow example (with a background thread consuming CPU).

This is in fact caused by Ruby’s Global VM Lock (GVL), and the way Ruby decides when to switch between several threads. Whenever a thread is going to block waiting for some reply, and there are other threads wanting to run, Ruby will switch to them. But if a thread never blocks, then Ruby will give it around 100ms to run before forcing a switch.

This explains the difference between figures 1 and 2: In figure 1, the "network requests" thread spends most of its time waiting on network requests, and if we zoom in (figure 3, below), we see that when a reply is available, the thread immediately wakes up and issues the next network request.

Figure 3: Zooming in on the fast example.

In the figure 2, the "network requests" thread spends most of its time in the ready state — that is, it’s ready to continue working, but it must wait for the cpu-heavy thread to finish its own 100ms.

This behavior ends up being quite unfair for a thread doing I/O work. That thread may only work for a few milliseconds before blocking, but once it blocks it must wait for all other threads to have their own go, which at 100ms per thread can take a long time.

The takeaway here is, be careful when mixing input/output (such as reading files or network/database requests) and CPU work: suddenly your I/O will be slower and it may be hard to understand why. This can happen if you’re using a multi-threaded webserver, such as puma, or a multi-threaded job queue such as sidekiq.

Edit: If you know that some block of Ruby code will be CPU intensive, then as @maciejmensfeld suggested, you can use Thread#pass to tell the Ruby scheduler to give other threads a chance before the 100ms period is up.