Ruby Filesystem Calls

2017-03-30



If I run bin/rake environment on Shopify/shopify right now, ruby executes 140737 filesystem-related syscalls. Many of these are clearly necessary, but many aren’t. Let’s go through and identify some patterns that could maybe be optimized out.

I’ve generated the list of syscalls using this dtrace script and this shell script (and invoked like dtrace-fs -o out -c 'ruby bin/rake environment').

Requiring files

First off, let’s look at the work of actively loading a ruby source file once bootsnap is loaded1. Currently, this reads like:

open    /foo.rb
fstat64 /foo.rb
close   /foo.rb
open    /foo.rb
fstat64 /foo.rb
fgetxattr   /foo.rb
fgetxattr   /foo.rb
close   /foo.rb

With the work done here, this will look like:

open    /foo.rb
fstat64 /foo.rb
fgetxattr   /foo.rb
fgetxattr   /foo.rb
close   /foo.rb

Since I haven’t found a faster way of performing this work, I’ll replace each occurrence of this sequence of eight syscalls in the log with %LOAD /path/to/file.rb.

HIT = %w(open fstat64 close open fstat64 fgetxattr fgetxattr close)
ls = STDIN.readlines.map { |l| l.chomp.split(/\t/) }
skip = 0
ls.each_cons(HIT.size) do |cs|
  if skip > 0
    skip -= 1
    next
  end
  if cs.map(&:first) == HIT && cs.map(&:last).uniq.size == 1
    skip = (HIT.size - 1)
    puts "%LOAD\t#{cs[0].last}"
  else
    puts cs[0].join("\t")
  end
end
ls[-(HIT.size - 1)..-1].each { |l| puts l.join("\t") }

This brings us to 5901 instances of %LOAD and 93529 other syscalls, for a total of 99430 lines.

Loading YAML

Since bootsnap also caches YAML loads, but YAML loading doesn’t go through ruby’s file loading mechanism, the pattern looks different:

open    /foo.yml
fstat64 /foo.yml
fgetxattr   /foo.yml
fgetxattr   /foo.yml
close   /foo.yml

Again, this isn’t going to get a lot faster and it’s a known quantity, so let’s replace it too:

HIT = %w(open fstat64 close open fstat64 fgetxattr fgetxattr close)
ls = STDIN.readlines.map { |l| l.chomp.split(/\t/) }
skip = 0
ls.each_cons(HIT.size) do |cs|
  if skip > 0
    skip -= 1
    next
  end
  if cs.map(&:first) == HIT && cs.map(&:last).uniq.size == 1
    skip = (HIT.size - 1)
    puts "%YAML\t#{cs[0].last}"
  else
    puts cs[0].join("\t")
  end
end
ls[-(HIT.size - 1)..-1].each { |l| puts l.join("\t") }

This results in 753 instances of %YAML and a total of 96418 lines.

rb_realpath_internal

Ruby has a nasty habit of cascading a series of lstat64 calls down the directory hierarchy each time a file is loaded. We can see this here:

close   /opt/rubies/2.3.3/lib/ruby/site_ruby/2.3.0/rubygems/exceptions.rb
lstat64 /opt
lstat64 /opt/rubies
lstat64 /opt/rubies/2.3.3
lstat64 /opt/rubies/2.3.3/lib
lstat64 /opt/rubies/2.3.3/lib/ruby
lstat64 /opt/rubies/2.3.3/lib/ruby/site_ruby
lstat64 /opt/rubies/2.3.3/lib/ruby/site_ruby/2.3.0
lstat64 /opt/rubies/2.3.3/lib/ruby/site_ruby/2.3.0/rubygems
lstat64 /opt/rubies/2.3.3/lib/ruby/site_ruby/2.3.0/rubygems/exceptions.rb

Ruby is descending the hierarchy one call at a time, testing if each file is a symlink, right after reading the file contents into a buffer. Conveniently, this path is not taken when RubyVM::InstructionSequence.load_iseq returns a successful result, which it does in essentially all cases after bootsnap is initialized.

Regardless, bootsnap isn’t initialized before everything else, so we still see a number of these. Let’s replace them with %RRPI for rb_realpath_internal:

ls = STDIN.readlines.map { |l| l.chomp.split(/\t/) }
prev = nil

one_path_component = ->(p) { !!(p =~ %r{^/[^/]+$}) }

flush = ->() {
  if prev
    puts "%RRPI\t#{prev}"
    prev = nil
  end
}

not_lstat = ->(l) {
  flush.()
  puts l.join("\t")
}

lstat = ->(l) {
  if prev
    if l[1].start_with?(prev) && one_path_component.(l[1][prev.size..-1])
      prev = l[1]
    else # lstat, but not part of this block.
      flush.()
      lstat.(l)
    end
  else
    if one_path_component.(l[1])
      prev = l[1]
    else
      not_lstat.(l)
    end
  end
}

ls.each { |l| l[0] == 'lstat64' ? lstat.(l) : not_lstat.(l) }

This gives us:

One particularly hilarious outcome:

%RRPI   /path/to/bundler/gems/some-gem/.git/objects/bf
%RRPI   /path/to/bundler/gems/some-gem/.git/objects/c1
%RRPI   /path/to/bundler/gems/some-gem/.git/objects/c5
...(108 times)...

$LOAD_PATH scans

Another common source of large sequences of syscalls is scanning the $LOAD_PATH (or ActiveSupport::Dependencies.autoload_paths). This generates a sequence like:

open    /opt/rubies/2.3.3/lib/ruby/site_ruby/2.3.0/stringio.rb
open    /opt/rubies/2.3.3/lib/ruby/site_ruby/2.3.0/x86_64-darwin16/stringio.rb
open    /opt/rubies/2.3.3/lib/ruby/site_ruby/stringio.rb
open    /opt/rubies/2.3.3/lib/ruby/vendor_ruby/2.3.0/stringio.rb
open    /opt/rubies/2.3.3/lib/ruby/vendor_ruby/2.3.0/x86_64-darwin16/stringio.rb

Though we optimize most of these out using bootscale/bootsnap, there are still some paths that lead to a path scan, and several that occur before bootsnap is initialized.

The identifiable pattern here is multiple calls to open where the final component of the path stays the same but the prefix varies. Let’s replace these with %SCAN:

ls = STDIN.readlines.map { |l| l.chomp.split(/\t/) }
prev = nil
count = 0

push = ->(n) {
  prev = n
  count += 1
}

flush = ->() {
  if prev
    if count == 1
      puts "open\t#{prev}"
    else
      puts "%SCAN\t#{File.basename(prev)}"
    end
    prev = nil
    count = 0
  end
}

not_open = ->(l) {
  flush.()
  puts l.join("\t")
}

open = ->(l) {
  if prev
    if l[1].end_with?(File.basename(prev)) && l[1] != prev
      push.(l[1])
    else
      flush.()
      open.(l)
    end
  else
    push.(l[1])
  end
}

ls.each { |l| l[0] == 'open' ? open.(l) : not_open.(l) }

This gives us:

Death by a million stats

It’s fairly apparent looking at the log now that we spend a lot of time stat-ing files. Of 66171 unexplained syscalls:

This totals 58632 of 66171 – 89% of all unexplained syscalls are stat operations.

Here’s a breakdown of the remaining syscalls:

$ grep -v 'stat64' outp4 \
  | grep -v -E '^%' \
  | awk -F "\t" '{ a[$1]++ } END { for (c in a) print a[c] "\t" c; }' \
  | sort -nr
2719    read
2387    close
2137    open
141     pread
124     write
14      fgetxattr
11      socket
5       mkdir
1       shm_open

Of these:

stat64 path scans

I see a pattern where we’re doing this across the ActiveSupport::Dependencies.autoload_paths:

stat64  (some gem)/app/assets/teaspoon/bundle/bundle_helper.rb
stat64  (some gem)/app/controllers/teaspoon/bundle/bundle_helper.rb
stat64  (some gem)app/assets/teaspoon/bundle/bundle_helper.rb

Let’s replace these scans with %ASDS for “ActiveSupport Dependencies Scan”:

ls = STDIN.readlines.map { |l| l.chomp.split(/\t/) }
prev = nil
count = 0

push = ->(n) {
  prev = n
  count += 1
}

flush = ->() {
  if prev
    if count == 1
      puts "stat64\t#{prev}"
    else
      puts "%ASDS\t#{File.basename(prev)}\t#{count}"
    end
    prev = nil
    count = 0
  end
}

not_stat64 = ->(l) {
  flush.()
  puts l.join("\t")
}

stat64 = ->(l) {
  if prev
    if File.basename(l[1]) == File.basename(prev) && l[1] != prev
      push.(l[1])
    else
      flush.()
      stat64.(l)
    end
  else
    push.(l[1])
  end
}

ls.each { |l| l[0] == 'stat64' ? stat64.(l) : not_stat64.(l) }

This gives us:

It seems puzzling that, on a 500+ item path, the average number of calls is only 10. Apparently my assumption that this is all ActiveSupport::Dependencies.autoload_paths was not correct. Let’s dig in to this more. A histogram of the number of items scanned for each occurrence:

grep '%ASDS' out \
  | awk '{ a[$3] += 1 } END { for (c in a) print c, "\t", a[c] }' \
  | sort -n

Gnuplot Produced by GNUPLOT 5.0 patchlevel 6 1 10 100 1000 1 10 100 1000 scans (log) syscalls per scan (log) gnuplot_plot_1

So really, the number of full scans of ActiveSupport::Dependencies.autoload_paths is quite small (fewer than ten). The majority of these seem to involve fewer than ten syscalls.

There are clearly two problems here:

  1. Path scans;
  2. Individual files being stat’ed twice for no apparent reason.

stat64+lstat64

An extremely common pattern is to stat64 and then immediately lstat64 a file. This feels wasteful2. Let’s label these occurrences as %STLS:

ls = STDIN.readlines.map { |l| l.chomp.split(/\t/) }
stat = nil

flush = ->() {
  if stat
    puts "stat64\t#{stat}"
    stat = nil
  end
}

ls.each do |l|
  if l[0] == 'stat64'
    flush.()
    stat = l[1]
    next
  end

  if l[0] == 'lstat64' && l[1] == stat
    puts "%STLS\t#{l[1]}"
    stat = nil
    next
  end

  flush.()
  puts l.join("\t")
end

This gives us:

recursively stat’ing gems

Of the remaining stat64 calls, a majority appear to iterating through the contents of a gem. The pattern appears like this:

stat64  CHANGELOG
stat64  CONTRIBUTORS
stat64  MIT-LICENSE
stat64  README.md
stat64  README.md
stat64  lib/active_merchant
stat64  lib/active_merchant.rb
stat64  lib/active_merchant/billing
stat64  lib/active_merchant/billing.rb

This is a little harder to detect since ruby chdirs into the gem before scanning and I didn’t bother to capture the directory in my output. Still, we can probably do a reasonable heuristic job.

We’re going to reorder our replacements though, because this will capture some uses of %ASDS and %STLS:

ls = STDIN.readlines.map { |l| l.chomp.split(/\t/) }

in_stat_chunk = false
in_stat_chunk_which_is_a_gem = false
past_toplevel_within_gem = false
buf_lines = []

flush = ->() {
  if buf_lines.size > 8
    puts "%SGEM\t(some-gem)\t#{buf_lines.size}"
  else
    buf_lines.each do |l|
      puts l.join("\t")
    end
  end
  in_stat_chunk = false
  in_stat_chunk_which_is_a_gem = false
  past_toplevel_within_gem = false
  buf_lines = []
}

push = ->(l) {
  buf_lines << l
}

EXCLUDES = %w(config csv fonts images javascsripts stylesheets svg)

handle_stat = ->(l) {
  at_toplevel = l[1].count('/').zero?

  if !in_stat_chunk
    in_stat_chunk = true
    in_stat_chunk_which_is_a_gem = at_toplevel && !EXCLUDES.include?(l[1])
  end

  if !at_toplevel
    past_toplevel_within_gem = true
  end

  # Looks like a new gem?
  if past_toplevel_within_gem && at_toplevel
    flush.()
    return handle_stat.(l)
  end

  if in_stat_chunk_which_is_a_gem
    push.(l)
  else
    puts l.join("\t")
  end
}

ls.each do |l|
  unless 'stat64' == l[0] || 'lstat64' == l[0]
    flush.()
    puts l.join("\t")
    next
  end

  handle_stat.(l)
end

Having not reapplied %ASDS or %STLS yet, we get:

Replaying %ASDS and %STLS, we get:

Generating the histogram of %ASDS stat count again, we get:

Gnuplot Produced by GNUPLOT 5.0 patchlevel 6 1 10 100 1000 1 10 100 1000 scans (log) syscalls per scan (log) gnuplot_plot_1

The main difference here is that the number of 2s dropped quite a lot. I guess most of those were occurring in those gem scans.

Miscellaneous Observations

Excluding the regular ruby require path, which is hard to measure just with grep and wc, these account for 7850, or 44% of remaining calls, bringing the number of mystery-calls down to about 10000, out of a total 140737.


  1. bootsnap is a combination of aot_compile_cache and other work based on bootscale that isn’t public yet but likely will be soon.

  2. At the very least, we should be able to use lstat64 first and use S_ISLNK to run the stat64 if and only if the file is actually a symlink.