Looking into array memory usage in Ruby
What’s the memory impact of keeping a number of objects in a Ruby array? To answer this question, I decided to look into how exactly arrays are internally represented by Ruby.
Note that because we’ll be looking at Ruby internals, these results can and do change between Ruby versions; nevertheless, the techniques used to measure overheads should be reusable for newer Ruby versions. Note also that the results I present here only apply to the official Ruby "MRI" implementation, not to JRuby or TruffleRuby.
So, let’s dive in!
What’s in a Ruby object?
Ruby objects are represented on your application’s heap memory using a struct RVALUE. Note that this struct wraps what is called a union: it can adopt a number of different layouts for keeping data inside of it, which the multiple Ruby built-in data structures take advantage of. Note that this optimization is only available for parts of the VM implementation written in C, so unfortunately, regular Ruby objects cannot take advantage of these layouts to be more efficient.
On modern 64-bit machines this struct RVALUE usually occupies 40 bytes. We can confirm this using the MRI-specific API ObjectSpace.memsize_of(obj):
require 'objspace'
puts RUBY_DESCRIPTION
puts "Size of new object: #{ObjectSpace.memsize_of(Object.new)}"
## Output:
# ruby 3.0.0p0 (2020-12-25 revision 95aff21468) [x86_64-linux]
# Size of new object: 40
As expected, a new object takes up 40 bytes. As you may have noticed, I’ll be using Ruby 3.0.0 for the examples.
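As a cross-check, some MRI versions also report their slot size through GC::INTERNAL_CONSTANTS. The sketch below is a hedged illustration rather than a documented contract: the helper name reported_slot_size is mine, and the key names it looks up (:RVALUE_SIZE, :BASE_SLOT_SIZE) are assumptions that vary between Ruby versions, so it returns nil when neither is present.

```ruby
require 'objspace'

# Hypothetical helper: look up the slot size the running VM reports, if any.
# The key names are assumptions; they differ between MRI versions and are
# not available at all on other Ruby implementations.
def reported_slot_size
  return nil unless defined?(GC::INTERNAL_CONSTANTS)

  constants = GC::INTERNAL_CONSTANTS
  constants[:RVALUE_SIZE] || constants[:BASE_SLOT_SIZE]
end

slot_size = reported_slot_size
object_size = ObjectSpace.memsize_of(Object.new)

puts "Reported slot size: #{slot_size.inspect}"
puts "Size of new object: #{object_size}"
```

If both numbers agree, that supports the idea that a plain object occupies exactly one slot on that Ruby build.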
Let’s talk Arrays
A Ruby array can actually take two different shapes — let’s call them embedded and extended.
Arrays in Ruby are represented internally using a struct RArray; this is one of the layouts that can be adopted by a struct RVALUE.
The embedded shape
Because arrays do not need to store a lot of extra metadata (they are quite simple objects), there’s leftover space in the 40 bytes of a struct RVALUE to store a few array items in the struct itself. This is controlled by the RARRAY_EMBED_LEN_MAX constant, which, for the Ruby releases I checked, is set to 3.
So whenever a new array is created with 0 to 3 elements, the embedded shape which fits within 40 bytes is used:
require 'objspace'
puts "Size of empty array: #{ObjectSpace.memsize_of([])}"
puts "Size of array of size 1: #{ObjectSpace.memsize_of(['a'])}"
puts "Size of array of size 2: #{ObjectSpace.memsize_of(['a', 'b'])}"
puts "Size of array of size 3: #{ObjectSpace.memsize_of(['a', 'b', 'c'])}"
## Output:
# Size of empty array: 40
# Size of array of size 1: 40
# Size of array of size 2: 40
# Size of array of size 3: 40
What happens when we add a fourth element?
The extended shape
From the fourth element on, the extended shape for the Array is used instead: a separate C array is allocated in memory to keep pointers to each array element. A Ruby array using this shape will now occupy 40 bytes (for the struct RVALUE) + number of elements * machine word size (usually 8 bytes, i.e. 64 bits).
So for an array with four elements we’ll have: 40 + 4 * 8 = 72 bytes being used, and so on for the next elements:
require 'objspace'
puts "Size of array of size 4: #{ObjectSpace.memsize_of(['a', 'b', 'c', 'd'])}"
puts "Size of array of size 5: #{ObjectSpace.memsize_of(['a', 'b', 'c', 'd', 'e'])}"
## Output:
# Size of array of size 4: 72
# Size of array of size 5: 80
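The arithmetic above can be written down as a small function and compared against what memsize_of reports. This is only a sketch: predicted_array_size is a hypothetical helper, and its defaults (40-byte header, 8-byte word, 3 embedded slots) are the figures observed in this post for Ruby 3.0.0 on x86_64, not values read from the running VM. The arrays are built with Array.new rather than literals, which I would expect to be sized the same way, but the measured column is there precisely so you don't have to take that on trust.

```ruby
require 'objspace'

# Hypothetical helper mirroring the formula in the text.
# Defaults are assumptions taken from the Ruby 3.0.0 / 64-bit numbers above.
def predicted_array_size(length, header_bytes: 40, word_bytes: 8, embed_max: 3)
  # Embedded shape: everything fits inside the struct itself.
  return header_bytes if length <= embed_max

  # Extended shape: the struct plus one pointer per element.
  header_bytes + length * word_bytes
end

(0..6).each do |length|
  measured = ObjectSpace.memsize_of(Array.new(length) { |i| i })
  puts "length #{length}: predicted #{predicted_array_size(length)}, measured #{measured}"
end
```

On other Ruby versions the two columns may well disagree, which is a quick way to spot that the internal layout has changed.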
Up until now I’ve been showing how much space is used when an array is pre-created with some number of elements. But does all of this still apply when we modify an existing array?
And what does Ruby really do when an array needs to grow or shrink?
Growing an Array
What happens when an array needs to grow to accommodate new elements? As expected, an embedded array will keep on using that shape while it can:
require 'objspace'
example_array = []
puts "Size of example array: #{ObjectSpace.memsize_of(example_array)}"
example_array << 'a'
puts "Size of example array: #{ObjectSpace.memsize_of(example_array)}"
example_array << 'b'
puts "Size of example array: #{ObjectSpace.memsize_of(example_array)}"
example_array << 'c'
puts "Size of example array: #{ObjectSpace.memsize_of(example_array)}"
## Output:
# Size of example array: 40
# Size of example array: 40
# Size of example array: 40
# Size of example array: 40
But you may be surprised to see what happens after we add the next element:
require 'objspace'
example_array = ['a', 'b', 'c']
puts "Size of example array: #{ObjectSpace.memsize_of(example_array)}"
example_array << 'd'
puts "Size of example array: #{ObjectSpace.memsize_of(example_array)}"
## Output:
# Size of example array: 40
# Size of example array: 200
Why 200 bytes and not the 72 bytes we saw above?
This is the Ruby VM making a trade-off between space (memory) and time (CPU processing): expanding the underlying C array entails creating a new one with a bigger size, copying the existing elements onto this new array, and then finally adding the new element.
Because this is an expensive operation, Ruby tries to avoid doing it on every insertion, and thus instead adds some slack: in this case it jumps from 3 to (200 - 40) / 8 = 20 elements. Thus the trade-off: to avoid doing this expansion on every insertion (reduce CPU processing) we use up more space (memory) than may end up being actually needed.
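To see where these jumps happen on a given Ruby, one option is to append elements one at a time and record every point at which the reported size changes. The helper name size_changes below is hypothetical, and the sketch only reports what memsize_of returns; it makes no assumption about the growth policy itself.

```ruby
require 'objspace'

# Hypothetical helper: append elements one by one and record each
# [length, size] pair at which the size reported by memsize_of changes.
def size_changes(max_length)
  array = []
  changes = []
  last_size = nil

  (0..max_length).each do |length|
    # At this point the array holds exactly `length` elements.
    size = ObjectSpace.memsize_of(array)
    if size != last_size
      changes << [length, size]
      last_size = size
    end
    array << length
  end

  changes
end

growth = size_changes(100)
growth.each { |length, size| puts "length #{length}: #{size} bytes" }
```

With the 40-byte header and 8-byte word seen earlier, (size - 40) / 8 for each printed line gives the capacity of the underlying C array at that point.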
What if we know in advance how many more elements we want to add?
Unfortunately, an array is only sized-to-fit at creation, so you cannot ask Ruby to resize it to a specific size afterwards — it will always reserve extra space to avoid repeated resizes, and will self-manage the size of the underlying array.
Shrinking an Array
When elements are removed from an array, does it ever shrink? The answer is yes, but again Ruby will try to avoid shrinking/growing the array too often, so it will always leave some slack:
require 'objspace'
example_array = ['a', 'b', 'c']
example_array << 'd'
puts "Size of example array: #{ObjectSpace.memsize_of(example_array)}"
example_array.pop
puts "Size of example array: #{ObjectSpace.memsize_of(example_array)}"
## Output:
# Size of example array: 200
# Size of example array: 104
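The same measuring approach works in the other direction. The sketch below fills an array by appending, then pops elements one at a time and records each change in the reported size. As before, size_changes_while_popping is a hypothetical helper name, and the output depends entirely on the Ruby version you run it on.

```ruby
require 'objspace'

# Hypothetical helper: fill an array by appending, then pop elements one by
# one and record each [length, size] pair at which the reported size changes.
def size_changes_while_popping(start_length)
  array = []
  start_length.times { |i| array << i }

  changes = []
  last_size = nil
  loop do
    size = ObjectSpace.memsize_of(array)
    if size != last_size
      changes << [array.length, size]
      last_size = size
    end
    break if array.empty?

    array.pop
  end

  changes
end

shrinkage = size_changes_while_popping(100)
shrinkage.each { |length, size| puts "length #{length}: #{size} bytes" }
```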
To force an array to shrink to size, you can use Array#compact. This returns a new sized-to-fit array (keep in mind that it also removes any nil elements):
require 'objspace'
example_array = ['a', 'b', 'c']
example_array << 'd'
puts "Size of example array: #{ObjectSpace.memsize_of(example_array)}"
compact_array = example_array.compact
puts "Size of compact array: #{ObjectSpace.memsize_of(compact_array)}"
## Output:
# Size of example array: 200
# Size of compact array: 72
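Since the purpose of Array#compact is to drop nil elements, it is worth checking what it does to an array that legitimately contains them. The sketch below compares it with one possible alternative, rebuilding the array element by element with Array.new. That alternative is my own assumption, not something the Ruby documentation promises will be sized-to-fit, so the sizes are printed for you to verify rather than stated.

```ruby
require 'objspace'

source = ['a', nil, 'b', nil]
source << 'c' # grow the array after creation, so it may carry some slack

# Array#compact returns a new array without the nil elements.
compacted = source.compact

# Hypothetical alternative: copy element by element, keeping the nils.
rebuilt = Array.new(source.length) { |i| source[i] }

puts "source:  #{source.inspect} (#{ObjectSpace.memsize_of(source)} bytes)"
puts "compact: #{compacted.inspect} (#{ObjectSpace.memsize_of(compacted)} bytes)"
puts "rebuilt: #{rebuilt.inspect} (#{ObjectSpace.memsize_of(rebuilt)} bytes)"
```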
That’s it for our tour of Ruby arrays!
Updated 2022-09: Updated post to call shapes embedded/extended rather than inline/regular as the former are commonly used in the Ruby community.