2017-03-30
If I run bin/rake environment
on Shopify/shopify right now, ruby executes 140737 filesystem-related
syscalls. Many of these are clearly necessary, but many aren’t. Let’s go
through and identify some patterns that could maybe be optimized
out.
I’ve generated the list of syscalls using this
dtrace script and this
shell script (and invoked like dtrace-fs -o out -c 'ruby bin/rake environment').
First off, let’s look at the work of actively loading a ruby source
file once bootsnap is loaded1.
Currently, this reads like:
open    /foo.rb
fstat64 /foo.rb
close   /foo.rb
open    /foo.rb
fstat64 /foo.rb
fgetxattr   /foo.rb
fgetxattr   /foo.rb
close   /foo.rbWith the work done here, this will look like:
open    /foo.rb
fstat64 /foo.rb
fgetxattr   /foo.rb
fgetxattr   /foo.rb
close   /foo.rbSince I haven’t found a faster way of performing this work, I’ll
replace each occurrence of this sequence of eight syscalls in the log
with %LOAD /path/to/file.rb.
HIT = %w(open fstat64 close open fstat64 fgetxattr fgetxattr close)
ls = STDIN.readlines.map { |l| l.chomp.split(/\t/) }
skip = 0
ls.each_cons(HIT.size) do |cs|
  if skip > 0
    skip -= 1
    next
  end
  if cs.map(&:first) == HIT && cs.map(&:last).uniq.size == 1
    skip = (HIT.size - 1)
    puts "%LOAD\t#{cs[0].last}"
  else
    puts cs[0].join("\t")
  end
end
ls[-(HIT.size - 1)..-1].each { |l| puts l.join("\t") }This brings us to 5901 instances of %LOAD and 93529
other syscalls, for a total of 99430 lines.
Since bootsnap also caches YAML loads, but YAML loading
doesn’t go through ruby’s file loading mechanism, the pattern looks
different:
open    /foo.yml
fstat64 /foo.yml
fgetxattr   /foo.yml
fgetxattr   /foo.yml
close   /foo.ymlAgain, this isn’t going to get a lot faster and it’s a known quantity, so let’s replace it too:
HIT = %w(open fstat64 close open fstat64 fgetxattr fgetxattr close)
ls = STDIN.readlines.map { |l| l.chomp.split(/\t/) }
skip = 0
ls.each_cons(HIT.size) do |cs|
  if skip > 0
    skip -= 1
    next
  end
  if cs.map(&:first) == HIT && cs.map(&:last).uniq.size == 1
    skip = (HIT.size - 1)
    puts "%YAML\t#{cs[0].last}"
  else
    puts cs[0].join("\t")
  end
end
ls[-(HIT.size - 1)..-1].each { |l| puts l.join("\t") }This results in 753 instances of %YAML and a total of
96418 lines.
rb_realpath_internalRuby has a nasty habit of cascading a series of lstat64
calls down the directory hierarchy each time a file is loaded. We can
see this here:
close   /opt/rubies/2.3.3/lib/ruby/site_ruby/2.3.0/rubygems/exceptions.rb
lstat64 /opt
lstat64 /opt/rubies
lstat64 /opt/rubies/2.3.3
lstat64 /opt/rubies/2.3.3/lib
lstat64 /opt/rubies/2.3.3/lib/ruby
lstat64 /opt/rubies/2.3.3/lib/ruby/site_ruby
lstat64 /opt/rubies/2.3.3/lib/ruby/site_ruby/2.3.0
lstat64 /opt/rubies/2.3.3/lib/ruby/site_ruby/2.3.0/rubygems
lstat64 /opt/rubies/2.3.3/lib/ruby/site_ruby/2.3.0/rubygems/exceptions.rbRuby is descending the hierarchy one call at a time, testing if each
file is a symlink, right after reading the file contents into a buffer.
Conveniently, this path is not taken when RubyVM::InstructionSequence.load_iseq
returns a successful result, which it does in essentially all cases
after bootsnap is initialized.
Regardless, bootsnap isn’t initialized before everything
else, so we still see a number of these. Let’s replace them with
%RRPI for rb_realpath_internal:
ls = STDIN.readlines.map { |l| l.chomp.split(/\t/) }
prev = nil
one_path_component = ->(p) { !!(p =~ %r{^/[^/]+$}) }
flush = ->() {
  if prev
    puts "%RRPI\t#{prev}"
    prev = nil
  end
}
not_lstat = ->(l) {
  flush.()
  puts l.join("\t")
}
lstat = ->(l) {
  if prev
    if l[1].start_with?(prev) && one_path_component.(l[1][prev.size..-1])
      prev = l[1]
    else # lstat, but not part of this block.
      flush.()
      lstat.(l)
    end
  else
    if one_path_component.(l[1])
      prev = l[1]
    else
      not_lstat.(l)
    end
  end
}
ls.each { |l| l[0] == 'lstat64' ? lstat.(l) : not_lstat.(l) }This gives us:
%RRPI;lstat64 calls.
Therefore, rb_realpath_internal generates:
lstat64 calls;lstat64 calls;%X markers;One particularly hilarious outcome:
%RRPI   /path/to/bundler/gems/some-gem/.git/objects/bf
%RRPI   /path/to/bundler/gems/some-gem/.git/objects/c1
%RRPI   /path/to/bundler/gems/some-gem/.git/objects/c5
...(108 times)...$LOAD_PATH scansAnother common source of large sequences of syscalls is scanning the
$LOAD_PATH (or
ActiveSupport::Dependencies.autoload_paths). This generates
a sequence like:
open    /opt/rubies/2.3.3/lib/ruby/site_ruby/2.3.0/stringio.rb
open    /opt/rubies/2.3.3/lib/ruby/site_ruby/2.3.0/x86_64-darwin16/stringio.rb
open    /opt/rubies/2.3.3/lib/ruby/site_ruby/stringio.rb
open    /opt/rubies/2.3.3/lib/ruby/vendor_ruby/2.3.0/stringio.rb
open    /opt/rubies/2.3.3/lib/ruby/vendor_ruby/2.3.0/x86_64-darwin16/stringio.rbThough we optimize most of these out using bootscale/bootsnap, there are still some paths that lead to a path scan, and several that occur before bootsnap is initialized.
The identifiable pattern here is multiple calls to open
where the final component of the path stays the same but the prefix
varies. Let’s replace these with %SCAN:
ls = STDIN.readlines.map { |l| l.chomp.split(/\t/) }
prev = nil
count = 0
push = ->(n) {
  prev = n
  count += 1
}
flush = ->() {
  if prev
    if count == 1
      puts "open\t#{prev}"
    else
      puts "%SCAN\t#{File.basename(prev)}"
    end
    prev = nil
    count = 0
  end
}
not_open = ->(l) {
  flush.()
  puts l.join("\t")
}
open = ->(l) {
  if prev
    if l[1].end_with?(File.basename(prev)) && l[1] != prev
      push.(l[1])
    else
      flush.()
      open.(l)
    end
  else
    push.(l[1])
  end
}
ls.each { |l| l[0] == 'open' ? open.(l) : not_open.(l) }This gives us:
%SCAN;open calls;
therefore, $LOAD_PATH scans still account for:
open syscalls;open calls per $LOAD_PATH
scan;%X markers;statsIt’s fairly apparent looking at the log now that we spend a lot of
time stat-ing files. Of 66171 unexplained syscalls:
stat64;lstat64;fstat64.This totals 58632 of 66171 – 89% of all unexplained syscalls are
stat operations.
Here’s a breakdown of the remaining syscalls:
$ grep -v 'stat64' outp4 \
  | grep -v -E '^%' \
  | awk -F "\t" '{ a[$1]++ } END { for (c in a) print a[c] "\t" c; }' \
  | sort -nr
2719    read
2387    close
2137    open
141     pread
124     write
14      fgetxattr
11      socket
5       mkdir
1       shm_openOf these:
read, open, and
close will be from loading ruby files before initializing
bootsnap;pread will be from loading native
extensions;stat64 path scansI see a pattern where we’re doing this across the ActiveSupport::Dependencies.autoload_paths:
stat64  (some gem)/app/assets/teaspoon/bundle/bundle_helper.rb
stat64  (some gem)/app/controllers/teaspoon/bundle/bundle_helper.rb
stat64  (some gem)app/assets/teaspoon/bundle/bundle_helper.rbLet’s replace these scans with %ASDS for “ActiveSupport
Dependencies Scan”:
ls = STDIN.readlines.map { |l| l.chomp.split(/\t/) }
prev = nil
count = 0
push = ->(n) {
  prev = n
  count += 1
}
flush = ->() {
  if prev
    if count == 1
      puts "stat64\t#{prev}"
    else
      puts "%ASDS\t#{File.basename(prev)}\t#{count}"
    end
    prev = nil
    count = 0
  end
}
not_stat64 = ->(l) {
  flush.()
  puts l.join("\t")
}
stat64 = ->(l) {
  if prev
    if File.basename(l[1]) == File.basename(prev) && l[1] != prev
      push.(l[1])
    else
      flush.()
      stat64.(l)
    end
  else
    push.(l[1])
  end
}
ls.each { |l| l[0] == 'stat64' ? stat64.(l) : not_stat64.(l) }This gives us:
%ASDS;stat64
calls; therefore, these scans account for:
stat64 syscalls;stat64 calls per scan (why so
low, on a 500+ item path?);stat64 calls.%X markers;It seems puzzling that, on a 500+ item path, the average number of
calls is only 10. Apparently my assumption that this is all ActiveSupport::Dependencies.autoload_paths
was not correct. Let’s dig in to this more. A histogram of the number of
items scanned for each occurrence:
grep '%ASDS' out \
  | awk '{ a[$3] += 1 } END { for (c in a) print c, "\t", a[c] }' \
  | sort -nSo really, the number of full scans of ActiveSupport::Dependencies.autoload_paths
is quite small (fewer than ten). The majority of these seem to involve
fewer than ten syscalls.
There are clearly two problems here:
stat’ed twice for no apparent
reason.stat64+lstat64An extremely common pattern is to stat64 and then
immediately lstat64 a file. This feels wasteful2. Let’s label these occurrences as
%STLS:
ls = STDIN.readlines.map { |l| l.chomp.split(/\t/) }
stat = nil
flush = ->() {
  if stat
    puts "stat64\t#{stat}"
    stat = nil
  end
}
ls.each do |l|
  if l[0] == 'stat64'
    flush.()
    stat = l[1]
    next
  end
  if l[0] == 'lstat64' && l[1] == stat
    puts "%STLS\t#{l[1]}"
    stat = nil
    next
  end
  flush.()
  puts l.join("\t")
endThis gives us:
%STLS;stat64
calls; therefore, these scans account for:
lstat64 calls.lstat64 calls if
fixed;*stat64 calls if
fixed.%X markers;stat’ing
gemsOf the remaining stat64 calls, a majority appear to
iterating through the contents of a gem. The pattern appears like
this:
stat64  CHANGELOG
stat64  CONTRIBUTORS
stat64  MIT-LICENSE
stat64  README.md
stat64  README.md
stat64  lib/active_merchant
stat64  lib/active_merchant.rb
stat64  lib/active_merchant/billing
stat64  lib/active_merchant/billing.rbThis is a little harder to detect since ruby chdirs into
the gem before scanning and I didn’t bother to capture the directory in
my output. Still, we can probably do a reasonable heuristic job.
We’re going to reorder our replacements though, because this will
capture some uses of %ASDS and %STLS:
%LOAD%YAML%RRPI%SCAN%SGEM (new)%ASDS%STLSls = STDIN.readlines.map { |l| l.chomp.split(/\t/) }
in_stat_chunk = false
in_stat_chunk_which_is_a_gem = false
past_toplevel_within_gem = false
buf_lines = []
flush = ->() {
  if buf_lines.size > 8
    puts "%SGEM\t(some-gem)\t#{buf_lines.size}"
  else
    buf_lines.each do |l|
      puts l.join("\t")
    end
  end
  in_stat_chunk = false
  in_stat_chunk_which_is_a_gem = false
  past_toplevel_within_gem = false
  buf_lines = []
}
push = ->(l) {
  buf_lines << l
}
EXCLUDES = %w(config csv fonts images javascsripts stylesheets svg)
handle_stat = ->(l) {
  at_toplevel = l[1].count('/').zero?
  if !in_stat_chunk
    in_stat_chunk = true
    in_stat_chunk_which_is_a_gem = at_toplevel && !EXCLUDES.include?(l[1])
  end
  if !at_toplevel
    past_toplevel_within_gem = true
  end
  # Looks like a new gem?
  if past_toplevel_within_gem && at_toplevel
    flush.()
    return handle_stat.(l)
  end
  if in_stat_chunk_which_is_a_gem
    push.(l)
  else
    puts l.join("\t")
  end
}
ls.each do |l|
  unless 'stat64' == l[0] || 'lstat64' == l[0]
    flush.()
    puts l.join("\t")
    next
  end
  handle_stat.(l)
endHaving not reapplied %ASDS or %STLS yet, we
get:
%SGEM;*stat64 calls;%X markers;Replaying %ASDS and %STLS, we get:
%ASDS;%STLS (interesting that this was
primarily used in gem scans);%X markers;Generating the histogram of %ASDS stat count again, we
get:
The main difference here is that the number of 2s dropped quite a lot. I guess most of those were occurring in those gem scans.
lstat64’ing gemspecs, then 1500
open+read+close-ing them. This
accounts for 11% of unexplained calls.did_you_mean (2%);lstat64 and then
open+read+close all the gemspecs
again after loading bundler (another 11%, totalling 22%);open+fstat64+close+open+fstat64+fstat64+read
+close calls from before initializing bootsnap;%SCAN seems to have maybe eaten the syscall immediately
following it?; 3400 stat64 operations are spent on
regenerating bootsnap’s load path cache for volatile
entries (19%).Excluding the regular ruby require path, which is hard to measure
just with grep and wc, these account for 7850,
or 44% of remaining calls, bringing the number of mystery-calls down to
about 10000, out of a total 140737.
bootsnap is a combination of aot_compile_cache and other work based on bootscale that isn’t public yet but likely will be soon.↩︎
At the very least, we should be able to use
lstat64 first and use S_ISLNK to run the
stat64 if and only if the file is actually a symlink.↩︎