【Translation】Unveiling Ruby ♦️ (1/3): It's All About Threads

Is Ruby a True Parallel Language?!#

Original published on October 20, 2024
Author: Wilfried
Translated on June 15, 2025
Translator: Hyacine🦄 with agents : )
Original link: https://blog.papey.fr/post/07-demystifying-ruby-01/

Ruby is a dynamic, interpreted open-source programming language known for its simplicity, efficiency, and "human-readable" syntax. Ruby is often used for web development, especially in conjunction with the Ruby on Rails framework. It supports object-oriented, functional, and imperative programming paradigms.

The most well-known and widely used Ruby virtual machine is Matz's Ruby Interpreter (MRI, also known as CRuby), developed by Ruby's creator Yukihiro Matsumoto (also known as Matz). Other Ruby implementations, such as JRuby and TruffleRuby, are outside the scope of this blog post.

MRI implements a Global Interpreter Lock (GIL), a mechanism that ensures only one thread executes Ruby code at a time, effectively preventing true parallelism. Put another way: Ruby is multithreaded, but its parallelism is limited to 1 (or possibly more 👀).

Many popular Gems, such as Puma, Sidekiq, Rails, and Sentry, are multithreaded.

Process 💎, Ractor 🦖, Threads 🧵, and Fibers 🌽#

Here we outline all the complex layers of handling concurrency and parallelism in Ruby (yes, they are not the same thing). Let's dive into each one.

[Diagram: a Fiber nested inside a Thread, nested inside a Ractor, nested inside a Process]

You're in a simulation of a simulation... in another huge simulation! *laughs harder*

By default, all these nested structures exist in the simplest Ruby program you can think of.

Don't just take my word for it; here's proof:

#!/usr/bin/env ruby

# Print current process ID
puts "Current Process ID: #{Process.pid}"

# Print current Ractor
puts "Current Ractor: #{Ractor.current}"

# Print current thread
puts "Current Thread: #{Thread.current}"

# Print current Fiber
puts "Current Fiber: #{Fiber.current}"
Current Process ID: 6608
Current Ractor: #<Ractor:#1 running>
Current Thread: #<Thread:0x00000001010db270 run>
Current Fiber: #<Fiber:0x00000001012f3ee0 (resumed)>

Every piece of Ruby code runs in a Fiber, which runs in a Thread, which runs in a Ractor, which runs in a Process.

Process 💎#

This is probably the easiest to understand. Your computer is running many processes in parallel, for example: the window manager and web browser you are using are two processes running in parallel.

So to run Ruby processes in parallel, you can simply open two terminal windows and run a program in each (or call fork from within a program).

In this case, scheduling is coordinated by the operating system, and the memory between Process A and Process B is isolated (for instance, you wouldn't want Word to access your browser's memory, right?).

If you want to pass data from Process A to Process B, you need inter-process communication mechanisms, such as pipes, queues, sockets, signals, or simpler things like a shared file where one reads and the other writes (be careful of race conditions then!).
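As a minimal sketch of one such mechanism, here is a pipe between a parent and a forked child (POSIX only; the process names and the toy sum are mine, not from the original post):

```ruby
# The child computes a sum and writes the result back to the parent
# through a pipe; the two processes share no memory.
reader, writer = IO.pipe

pid = fork do
  reader.close               # the child only writes
  writer.puts((1..100).sum)  # send the result through the pipe
  writer.close
end

writer.close                 # the parent only reads
result = reader.read.to_i    # blocks until the child closes its end
reader.close
Process.wait(pid)            # reap the child process

puts "Sum computed in child ##{pid}: #{result}"
```

Each end of the pipe is closed in the process that doesn't use it, so the parent's read returns as soon as the child exits.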

Ractor 🦖#

Ractor is a new experimental feature designed to achieve parallel execution within Ruby programs. Managed by the VM (rather than the operating system), Ractor uses native threads under the hood to run in parallel. Each Ractor behaves like an independent virtual machine (VM) within the same Ruby process, with its own isolated memory. "Ractor" stands for "Ruby Actors," and like in the Actor model, Ractors communicate by passing messages without needing shared memory, avoiding the need for Mutexes. Each Ractor has its own GIL, allowing them to run independently without interference from other Ractors.

In summary, Ractor provides a truly parallel model where memory isolation prevents race conditions, and message passing provides a structured and safe way for Ractor interactions, achieving efficient parallel execution in Ruby.

Let's give it a try! [1]

require 'time'

# The `sleep` used here is not actually a CPU-intensive task, but simplifies the example
def cpu_bound_task()
  sleep(2)
end

# Divide the large range into smaller chunks
ranges = [
  (1..25_000),
  (25_001..50_000),
  (50_001..75_000),
  (75_001..100_000)
]

# Start timing
start_time = Time.now

# Create Ractors to calculate the sum with delays in parallel
ractors = ranges.map do |range|
  Ractor.new(range) do |r|
    cpu_bound_task()
    r.sum
  end
end

# Collect results from all Ractors
sum = ractors.sum(&:take)

# End timing
end_time = Time.now

# Calculate and display total execution time
execution_time = end_time - start_time
puts "Total sum: #{sum}"
puts "Parallel Execution time: #{execution_time} seconds"

# Start timing
start_time = Time.now

sum = ranges.sum do |range|
  cpu_bound_task()
  range.sum
end

# End timing
end_time = Time.now

# Calculate and display total execution time
execution_time = end_time - start_time

puts "Total sum: #{sum}"
puts "Sequential Execution time: #{execution_time} seconds"
warning: Ractor is experimental, and the behavior may change in future versions of Ruby! Also there are many implementation issues.
Total sum: 5000050000
Parallel Execution time: 2.005622 seconds
Total sum: 5000050000
Sequential Execution time: 8.016461 seconds

This is proof of Ractor running in parallel.

As I mentioned earlier, they are quite experimental and are not used in many Gems or code you might see.

Their real purpose is to distribute CPU-intensive tasks across all your CPU cores.
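The big example above shows parallelism but not the message passing mentioned earlier. Here is a small sketch of it (Ruby 3.x; Ractor is experimental and this API may change), with a worker Ractor that replies to each message it receives:

```ruby
# A worker Ractor that receives words and replies with their lengths.
worker = Ractor.new do
  # Ractor.receive blocks until a message arrives
  while (word = Ractor.receive) != :done
    Ractor.yield(word.length)
  end
end

lengths = %w[ruby ractor].map do |word|
  worker.send(word)  # the message is copied into the worker's isolated memory
  worker.take        # block until the worker yields a reply
end
worker.send(:done)   # tell the worker to stop

puts lengths.inspect # => [4, 6]
```

Because unshareable objects are copied on send, the two Ractors never touch the same memory, which is exactly what makes Mutexes unnecessary here.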

Thread 🧵#

The key difference between operating system threads and Ruby threads lies in how they handle concurrency and resource management. Operating system threads are managed by the OS, allowing them to run in parallel across multiple CPU cores, making them more resource-intensive but capable of true parallelism. In contrast, Ruby threads—especially in MRI Ruby—are managed by the interpreter and are limited by the Global Interpreter Lock (GIL), meaning only one thread can execute Ruby code at a time, limiting them to concurrency rather than true parallelism. This makes Ruby threads lightweight (also known as "green threads"), but they cannot fully utilize multi-core systems (in contrast to Ractor, which allows multiple "Ruby VMs" to run in the same process).

Let's look at this code snippet using threads:

require 'time'

def slow(name, duration)
  puts "#{name} start - #{Time.now.strftime('%H:%M:%S')}"
  sleep(duration)
  puts "#{name} end - #{Time.now.strftime('%H:%M:%S')}"
end

puts 'no threads'
start_time = Time.now
slow('1', 3) 
slow('2', 3) 
puts "total : #{Time.now - start_time}s\n\n"

puts 'threads'
start_time = Time.now
thread1 = Thread.new { slow('1', 3) }
thread2 = Thread.new { slow('2', 3) }
thread1.join
thread2.join
puts "total : #{Time.now - start_time}s\n\n"
no threads
1 start - 08:23:20
1 end - 08:23:23
2 start - 08:23:23
2 end - 08:23:26
total : 6.006063s

threads
1 start - 08:23:26
2 start - 08:23:26
1 end - 08:23:29
2 end - 08:23:29
total : 3.006418s

The Ruby interpreter controls when threads switch, usually after a set number of instructions or when a thread performs a blocking operation (like file I/O or network access). This makes Ruby effective for I/O-bound tasks, even though CPU-bound tasks are still limited by the GIL.
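To see the flip side of the threaded example above, here is a sketch with a CPU-bound task instead of sleep (the workload and iteration counts are arbitrary; exact timings vary by machine). Under the GIL, the threaded version gains essentially nothing:

```ruby
require 'benchmark'

# Pure computation: the GIL is never released, so threads cannot
# run this in parallel on MRI.
def burn_cpu
  100_000.times { |i| i * i }
end

sequential = Benchmark.realtime do
  4.times { burn_cpu }
end

threaded = Benchmark.realtime do
  4.times.map { Thread.new { burn_cpu } }.each(&:join)
end

puts format('sequential: %.3fs, threaded: %.3fs', sequential, threaded)
```

Unlike the sleep example, both timings come out roughly equal: the four threads take turns holding the GIL instead of running at once.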

There are some tricks you can use, like the priority attribute to indicate to the interpreter that you want it to prioritize running threads with higher priority, but there is no guarantee that the Ruby VM will comply. If you want to be more aggressive, Thread.pass is available. In practice, using these low-level directives in your code is considered a bad idea.

But why is the GIL needed in the first place? Because MRI's internal structure is not thread-safe! This is specific to MRI; other Ruby implementations like JRuby do not have these limitations.

Finally, don't forget that threads share memory, which opens the door to race conditions. Here's a deliberately contrived example to illustrate this. It relies on the fact that class variables share the same memory space across threads. (Using class variables for anything other than constants is considered bad practice.)

# frozen_string_literal: true

class Counter
  # Shared class variable
  @@count = 0

  def self.increment
    1000.times do
      current_value = @@count
      sleep(0.0001)  # Small delay to allow context switching
      @@count = current_value + 1  # Increment count
    end
  end

  def self.count
    @@count
  end
end

# Create an array to hold threads
threads = []

# Create 10 threads, all incrementing the @@count variable
10.times do
  threads << Thread.new do
    Counter.increment
  end
end

# Wait for all threads to finish
threads.each(&:join)

# Display the final value of @@count
puts "Final count: #{Counter.count}"

# Check if the final count matches the expected value
if Counter.count == 10_000
  puts "Final count is correct: #{Counter.count}"
else
  puts "Race condition detected: expected 10000, got #{Counter.count}"
end
Final count: 1000
Race condition detected: expected 10000, got 1000

Here, the sleep forces a context switch to another thread because it is a blocking operation that releases the GIL. Each thread reads @@count, sleeps, then writes back its stale value, overwriting any increments other threads made in the meantime. That is how almost all the updates get lost.
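The classic fix is to make the read-modify-write atomic with a Mutex. Here is a sketch of the same counter with the critical section locked (I dropped the sleep, which only existed to provoke the race):

```ruby
# frozen_string_literal: true

# Same counter as above, but the read-modify-write on the shared
# class variable is wrapped in a Mutex, so it is atomic with
# respect to the other threads.
class SafeCounter
  @@count = 0
  @@lock = Mutex.new

  def self.increment
    1000.times do
      @@lock.synchronize do
        @@count += 1  # only one thread at a time can be in here
      end
    end
  end

  def self.count
    @@count
  end
end

threads = 10.times.map { Thread.new { SafeCounter.increment } }
threads.each(&:join)

puts "Final count: #{SafeCounter.count}" # => Final count: 10000
```

The lock serializes access to the counter, so every one of the 10,000 increments survives.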

In your everyday code, you shouldn't use threads, but it's good to know they exist in the background of most Gems we use daily!

Fiber 🌽#

Here we arrive at the final nested level! Fiber is a very lightweight cooperative concurrency mechanism. Unlike threads, Fibers are not preemptively scheduled; instead, they explicitly yield and resume control. Fiber.new takes a block to execute inside the Fiber; from there, Fiber.yield and Fiber#resume control the back-and-forth switching between Fibers. As we saw earlier, Fibers run within the same Ruby thread (so they share the same memory space). Like every other concept covered in this blog post, you should view Fibers as a very low-level interface, and I would avoid building a lot of code on top of them. For me, the one truly effective use case is generators: with Fibers, creating a lazy generator is relatively easy, as the code below shows.

def fibernnacci
  Fiber.new do
    a, b = 0, 1
    loop do
      Fiber.yield a
      a, b = b, a + b
    end
  end
end

fib = fibernnacci
5.times do
  puts Time.now.to_s
  puts fib.resume
end
2024-10-19 15:58:54 +0200
0
2024-10-19 15:58:54 +0200
1
2024-10-19 15:58:54 +0200
1
2024-10-19 15:58:54 +0200
2
2024-10-19 15:58:54 +0200
3

As you can see in this output, the code lazily generates values only when needed. This allows for interesting patterns and properties in your toolbox.

Again, due to it being a low-level API, using Fibers in your code might not be the best idea. The most well-known Gem that heavily uses Fibers is the Async Gem (used by Falcon).
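Incidentally, you often don't need to reach for Fibers directly to get this pattern: Ruby's built-in Enumerator offers the same lazy-generator shape at a higher level (and in MRI, Enumerator#next is itself implemented with a Fiber). A sketch of the Fibonacci generator rewritten that way:

```ruby
# The same infinite Fibonacci generator, expressed as an Enumerator.
fib = Enumerator.new do |yielder|
  a, b = 0, 1
  loop do
    yielder << a       # hand the next value to the consumer
    a, b = b, a + b
  end
end

puts fib.take(5).inspect # => [0, 1, 1, 2, 3]
```

Enumerable#take stops the iteration after five values, so the infinite loop is harmless.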

Summary#

Ruby offers several concurrency models, each with unique characteristics suited for different tasks.

Processes provide complete memory isolation and can run in parallel on CPU cores, making them ideal for resource-intensive tasks that require complete separation.
Ractor, introduced in Ruby 3, also provides parallelism with memory isolation within the same process, achieving safer parallel execution by passing messages between Ractors.
Threads are lighter than processes, share memory within the same process, can run concurrently, but require careful synchronization to avoid race conditions.
Fibers are the lightest concurrency mechanism, providing cooperative multitasking by manually yielding control. They share the same memory and are best suited for building generators or coroutines rather than parallel execution.

With this knowledge, you now have arguments to participate in the endless debate of Puma (thread-first approach) vs. Unicorn (process-first approach). Just remember, discussing this topic is like trying to explain the difference between Vi and Emacs! Figuring out which one is the winner is left as an exercise for the reader! [2]


Footnotes:

  1. Must-read about .map(&) syntax ↩︎

  2. Spoiler: It depends on the situation ↩︎
