2017-03-30
If I run bin/rake environment
on Shopify/shopify right now, ruby executes 140737 filesystem-related
syscalls. Many of these are clearly necessary, but many aren’t. Let’s go
through and identify some patterns that could maybe be optimized
out.
I’ve generated the list of syscalls using this
dtrace script and this
shell script (and invoked like dtrace-fs -o out -c 'ruby bin/rake environment').
First off, let’s look at the work of actively loading a ruby source
file once bootsnap is loaded1.
Currently, this reads like:
open /foo.rb
fstat64 /foo.rb
close /foo.rb
open /foo.rb
fstat64 /foo.rb
fgetxattr /foo.rb
fgetxattr /foo.rb
close /foo.rb
With the work done here, this will look like:
open /foo.rb
fstat64 /foo.rb
fgetxattr /foo.rb
fgetxattr /foo.rb
close /foo.rb
Since I haven’t found a faster way of performing this work, I’ll
replace each occurrence of this sequence of eight syscalls in the log
with %LOAD /path/to/file.rb.
HIT = %w(open fstat64 close open fstat64 fgetxattr fgetxattr close)
ls = STDIN.readlines.map { |l| l.chomp.split(/\t/) }
skip = 0
ls.each_cons(HIT.size) do |cs|
if skip > 0
skip -= 1
next
end
if cs.map(&:first) == HIT && cs.map(&:last).uniq.size == 1
skip = (HIT.size - 1)
puts "%LOAD\t#{cs[0].last}"
else
puts cs[0].join("\t")
end
end
ls[-(HIT.size - 1)..-1].each { |l| puts l.join("\t") }This brings us to 5901 instances of %LOAD and 93529
other syscalls, for a total of 99430 lines.
Since bootsnap also caches YAML loads, but YAML loading
doesn’t go through ruby’s file loading mechanism, the pattern looks
different:
open /foo.yml
fstat64 /foo.yml
fgetxattr /foo.yml
fgetxattr /foo.yml
close /foo.yml
Again, this isn’t going to get a lot faster and it’s a known quantity, so let’s replace it too:
HIT = %w(open fstat64 close open fstat64 fgetxattr fgetxattr close)
ls = STDIN.readlines.map { |l| l.chomp.split(/\t/) }
skip = 0
ls.each_cons(HIT.size) do |cs|
if skip > 0
skip -= 1
next
end
if cs.map(&:first) == HIT && cs.map(&:last).uniq.size == 1
skip = (HIT.size - 1)
puts "%YAML\t#{cs[0].last}"
else
puts cs[0].join("\t")
end
end
ls[-(HIT.size - 1)..-1].each { |l| puts l.join("\t") }This results in 753 instances of %YAML and a total of
96418 lines.
rb_realpath_internalRuby has a nasty habit of cascading a series of lstat64
calls down the directory hierarchy each time a file is loaded. We can
see this here:
close /opt/rubies/2.3.3/lib/ruby/site_ruby/2.3.0/rubygems/exceptions.rb
lstat64 /opt
lstat64 /opt/rubies
lstat64 /opt/rubies/2.3.3
lstat64 /opt/rubies/2.3.3/lib
lstat64 /opt/rubies/2.3.3/lib/ruby
lstat64 /opt/rubies/2.3.3/lib/ruby/site_ruby
lstat64 /opt/rubies/2.3.3/lib/ruby/site_ruby/2.3.0
lstat64 /opt/rubies/2.3.3/lib/ruby/site_ruby/2.3.0/rubygems
lstat64 /opt/rubies/2.3.3/lib/ruby/site_ruby/2.3.0/rubygems/exceptions.rb
Ruby is descending the hierarchy one call at a time, testing if each
file is a symlink, right after reading the file contents into a buffer.
Conveniently, this path is not taken when RubyVM::InstructionSequence.load_iseq
returns a successful result, which it does in essentially all cases
after bootsnap is initialized.
Regardless, bootsnap isn’t initialized before everything
else, so we still see a number of these. Let’s replace them with
%RRPI for rb_realpath_internal:
ls = STDIN.readlines.map { |l| l.chomp.split(/\t/) }
prev = nil
one_path_component = ->(p) { !!(p =~ %r{^/[^/]+$}) }
flush = ->() {
if prev
puts "%RRPI\t#{prev}"
prev = nil
end
}
not_lstat = ->(l) {
flush.()
puts l.join("\t")
}
lstat = ->(l) {
if prev
if l[1].start_with?(prev) && one_path_component.(l[1][prev.size..-1])
prev = l[1]
else # lstat, but not part of this block.
flush.()
lstat.(l)
end
else
if one_path_component.(l[1])
prev = l[1]
else
not_lstat.(l)
end
end
}
ls.each { |l| l[0] == 'lstat64' ? lstat.(l) : not_lstat.(l) }This gives us:
%RRPI;lstat64 calls.
Therefore, rb_realpath_internal generates:
lstat64 calls;lstat64 calls;%X markers;One particularly hilarious outcome:
%RRPI /path/to/bundler/gems/some-gem/.git/objects/bf
%RRPI /path/to/bundler/gems/some-gem/.git/objects/c1
%RRPI /path/to/bundler/gems/some-gem/.git/objects/c5
...(108 times)...
$LOAD_PATH scansAnother common source of large sequences of syscalls is scanning the
$LOAD_PATH (or
ActiveSupport::Dependencies.autoload_paths). This generates
a sequence like:
open /opt/rubies/2.3.3/lib/ruby/site_ruby/2.3.0/stringio.rb
open /opt/rubies/2.3.3/lib/ruby/site_ruby/2.3.0/x86_64-darwin16/stringio.rb
open /opt/rubies/2.3.3/lib/ruby/site_ruby/stringio.rb
open /opt/rubies/2.3.3/lib/ruby/vendor_ruby/2.3.0/stringio.rb
open /opt/rubies/2.3.3/lib/ruby/vendor_ruby/2.3.0/x86_64-darwin16/stringio.rb
Though we optimize most of these out using bootscale/bootsnap, there are still some paths that lead to a path scan, and several that occur before bootsnap is initialized.
The identifiable pattern here is multiple calls to open
where the final component of the path stays the same but the prefix
varies. Let’s replace these with %SCAN:
ls = STDIN.readlines.map { |l| l.chomp.split(/\t/) }
prev = nil
count = 0
push = ->(n) {
prev = n
count += 1
}
flush = ->() {
if prev
if count == 1
puts "open\t#{prev}"
else
puts "%SCAN\t#{File.basename(prev)}"
end
prev = nil
count = 0
end
}
not_open = ->(l) {
flush.()
puts l.join("\t")
}
open = ->(l) {
if prev
if l[1].end_with?(File.basename(prev)) && l[1] != prev
push.(l[1])
else
flush.()
open.(l)
end
else
push.(l[1])
end
}
ls.each { |l| l[0] == 'open' ? open.(l) : not_open.(l) }This gives us:
%SCAN;open calls;
therefore, $LOAD_PATH scans still account for:
open syscalls;open calls per $LOAD_PATH
scan;%X markers;statsIt’s fairly apparent looking at the log now that we spend a lot of
time stat-ing files. Of 66171 unexplained syscalls:
stat64;lstat64;fstat64.This totals 58632 of 66171 – 89% of all unexplained syscalls are
stat operations.
Here’s a breakdown of the remaining syscalls:
$ grep -v 'stat64' outp4 \
| grep -v -E '^%' \
| awk -F "\t" '{ a[$1]++ } END { for (c in a) print a[c] "\t" c; }' \
| sort -nr
2719 read
2387 close
2137 open
141 pread
124 write
14 fgetxattr
11 socket
5 mkdir
1 shm_open
Of these:
read, open, and
close will be from loading ruby files before initializing
bootsnap;pread will be from loading native
extensions;stat64 path scansI see a pattern where we’re doing this across the ActiveSupport::Dependencies.autoload_paths:
stat64 (some gem)/app/assets/teaspoon/bundle/bundle_helper.rb
stat64 (some gem)/app/controllers/teaspoon/bundle/bundle_helper.rb
stat64 (some gem)app/assets/teaspoon/bundle/bundle_helper.rb
Let’s replace these scans with %ASDS for “ActiveSupport
Dependencies Scan”:
ls = STDIN.readlines.map { |l| l.chomp.split(/\t/) }
prev = nil
count = 0
push = ->(n) {
prev = n
count += 1
}
flush = ->() {
if prev
if count == 1
puts "stat64\t#{prev}"
else
puts "%ASDS\t#{File.basename(prev)}\t#{count}"
end
prev = nil
count = 0
end
}
not_stat64 = ->(l) {
flush.()
puts l.join("\t")
}
stat64 = ->(l) {
if prev
if File.basename(l[1]) == File.basename(prev) && l[1] != prev
push.(l[1])
else
flush.()
stat64.(l)
end
else
push.(l[1])
end
}
ls.each { |l| l[0] == 'stat64' ? stat64.(l) : not_stat64.(l) }This gives us:
%ASDS;stat64
calls; therefore, these scans account for:
stat64 syscalls;stat64 calls per scan (why so
low, on a 500+ item path?);stat64 calls.%X markers;It seems puzzling that, on a 500+ item path, the average number of
calls is only 10. Apparently my assumption that this is all ActiveSupport::Dependencies.autoload_paths
was not correct. Let’s dig in to this more. A histogram of the number of
items scanned for each occurrence:
grep '%ASDS' out \
| awk '{ a[$3] += 1 } END { for (c in a) print c, "\t", a[c] }' \
| sort -nSo really, the number of full scans of ActiveSupport::Dependencies.autoload_paths
is quite small (fewer than ten). The majority of these seem to involve
fewer than ten syscalls.
There are clearly two problems here:
stat’ed twice for no apparent
reason.stat64+lstat64An extremely common pattern is to stat64 and then
immediately lstat64 a file. This feels wasteful2. Let’s label these occurrences as
%STLS:
ls = STDIN.readlines.map { |l| l.chomp.split(/\t/) }
stat = nil
flush = ->() {
if stat
puts "stat64\t#{stat}"
stat = nil
end
}
ls.each do |l|
if l[0] == 'stat64'
flush.()
stat = l[1]
next
end
if l[0] == 'lstat64' && l[1] == stat
puts "%STLS\t#{l[1]}"
stat = nil
next
end
flush.()
puts l.join("\t")
endThis gives us:
%STLS;stat64
calls; therefore, these scans account for:
lstat64 calls.lstat64 calls if
fixed;*stat64 calls if
fixed.%X markers;stat’ing
gemsOf the remaining stat64 calls, a majority appear to
iterating through the contents of a gem. The pattern appears like
this:
stat64 CHANGELOG
stat64 CONTRIBUTORS
stat64 MIT-LICENSE
stat64 README.md
stat64 README.md
stat64 lib/active_merchant
stat64 lib/active_merchant.rb
stat64 lib/active_merchant/billing
stat64 lib/active_merchant/billing.rb
This is a little harder to detect since ruby chdirs into
the gem before scanning and I didn’t bother to capture the directory in
my output. Still, we can probably do a reasonable heuristic job.
We’re going to reorder our replacements though, because this will
capture some uses of %ASDS and %STLS:
%LOAD%YAML%RRPI%SCAN%SGEM (new)%ASDS%STLSls = STDIN.readlines.map { |l| l.chomp.split(/\t/) }
in_stat_chunk = false
in_stat_chunk_which_is_a_gem = false
past_toplevel_within_gem = false
buf_lines = []
flush = ->() {
if buf_lines.size > 8
puts "%SGEM\t(some-gem)\t#{buf_lines.size}"
else
buf_lines.each do |l|
puts l.join("\t")
end
end
in_stat_chunk = false
in_stat_chunk_which_is_a_gem = false
past_toplevel_within_gem = false
buf_lines = []
}
push = ->(l) {
buf_lines << l
}
EXCLUDES = %w(config csv fonts images javascsripts stylesheets svg)
handle_stat = ->(l) {
at_toplevel = l[1].count('/').zero?
if !in_stat_chunk
in_stat_chunk = true
in_stat_chunk_which_is_a_gem = at_toplevel && !EXCLUDES.include?(l[1])
end
if !at_toplevel
past_toplevel_within_gem = true
end
# Looks like a new gem?
if past_toplevel_within_gem && at_toplevel
flush.()
return handle_stat.(l)
end
if in_stat_chunk_which_is_a_gem
push.(l)
else
puts l.join("\t")
end
}
ls.each do |l|
unless 'stat64' == l[0] || 'lstat64' == l[0]
flush.()
puts l.join("\t")
next
end
handle_stat.(l)
endHaving not reapplied %ASDS or %STLS yet, we
get:
%SGEM;*stat64 calls;%X markers;Replaying %ASDS and %STLS, we get:
%ASDS;%STLS (interesting that this was
primarily used in gem scans);%X markers;Generating the histogram of %ASDS stat count again, we
get:
The main difference here is that the number of 2s dropped quite a lot. I guess most of those were occurring in those gem scans.
lstat64’ing gemspecs, then 1500
open+read+close-ing them. This
accounts for 11% of unexplained calls.did_you_mean (2%);lstat64 and then
open+read+close all the gemspecs
again after loading bundler (another 11%, totalling 22%);open+fstat64+close+open+fstat64+fstat64+read
+close calls from before initializing bootsnap;%SCAN seems to have maybe eaten the syscall immediately
following it?; 3400 stat64 operations are spent on
regenerating bootsnap’s load path cache for volatile
entries (19%).Excluding the regular ruby require path, which is hard to measure
just with grep and wc, these account for 7850,
or 44% of remaining calls, bringing the number of mystery-calls down to
about 10000, out of a total 140737.
bootsnap is a combination of aot_compile_cache and other work based on bootscale that isn’t public yet but likely will be soon.↩︎
At the very least, we should be able to use
lstat64 first and use S_ISLNK to run the
stat64 if and only if the file is actually a symlink.↩︎