2017-03-30
If I run bin/rake environment
on Shopify/shopify right now, ruby executes 140737 filesystem-related
syscalls. Many of these are clearly necessary, but many aren’t. Let’s go
through and identify some patterns that could maybe be optimized
out.
I’ve generated the list of syscalls using this
dtrace script and this
shell script (and invoked like dtrace-fs -o out -c 'ruby bin/rake environment'
).
First off, let’s look at the work of actively loading a ruby source
file once bootsnap
is loaded1.
Currently, this reads like:
open /foo.rb
fstat64 /foo.rb
close /foo.rb
open /foo.rb
fstat64 /foo.rb
fgetxattr /foo.rb
fgetxattr /foo.rb
close /foo.rb
With the work done here, this will look like:
open /foo.rb
fstat64 /foo.rb
fgetxattr /foo.rb
fgetxattr /foo.rb
close /foo.rb
Since I haven’t found a faster way of performing this work, I’ll
replace each occurrence of this sequence of eight syscalls in the log
with %LOAD /path/to/file.rb
.
HIT = %w(open fstat64 close open fstat64 fgetxattr fgetxattr close)
= STDIN.readlines.map { |l| l.chomp.split(/\t/) }
ls = 0
skip .each_cons(HIT.size) do |cs|
lsif skip > 0
-= 1
skip next
end
if cs.map(&:first) == HIT && cs.map(&:last).uniq.size == 1
= (HIT.size - 1)
skip puts "%LOAD\t#{cs[0].last}"
else
puts cs[0].join("\t")
end
end
[-(HIT.size - 1)..-1].each { |l| puts l.join("\t") } ls
This brings us to 5901 instances of %LOAD
and 93529
other syscalls, for a total of 99430 lines.
Since bootsnap
also caches YAML loads, but YAML loading
doesn’t go through ruby’s file loading mechanism, the pattern looks
different:
open /foo.yml
fstat64 /foo.yml
fgetxattr /foo.yml
fgetxattr /foo.yml
close /foo.yml
Again, this isn’t going to get a lot faster and it’s a known quantity, so let’s replace it too:
HIT = %w(open fstat64 close open fstat64 fgetxattr fgetxattr close)
= STDIN.readlines.map { |l| l.chomp.split(/\t/) }
ls = 0
skip .each_cons(HIT.size) do |cs|
lsif skip > 0
-= 1
skip next
end
if cs.map(&:first) == HIT && cs.map(&:last).uniq.size == 1
= (HIT.size - 1)
skip puts "%YAML\t#{cs[0].last}"
else
puts cs[0].join("\t")
end
end
[-(HIT.size - 1)..-1].each { |l| puts l.join("\t") } ls
This results in 753 instances of %YAML
and a total of
96418 lines.
rb_realpath_internal
Ruby has a nasty habit of cascading a series of lstat64
calls down the directory hierarchy each time a file is loaded. We can
see this here:
close /opt/rubies/2.3.3/lib/ruby/site_ruby/2.3.0/rubygems/exceptions.rb
lstat64 /opt
lstat64 /opt/rubies
lstat64 /opt/rubies/2.3.3
lstat64 /opt/rubies/2.3.3/lib
lstat64 /opt/rubies/2.3.3/lib/ruby
lstat64 /opt/rubies/2.3.3/lib/ruby/site_ruby
lstat64 /opt/rubies/2.3.3/lib/ruby/site_ruby/2.3.0
lstat64 /opt/rubies/2.3.3/lib/ruby/site_ruby/2.3.0/rubygems
lstat64 /opt/rubies/2.3.3/lib/ruby/site_ruby/2.3.0/rubygems/exceptions.rb
Ruby is descending the hierarchy one call at a time, testing if each
file is a symlink, right after reading the file contents into a buffer.
Conveniently, this path is not taken when RubyVM::InstructionSequence.load_iseq
returns a successful result, which it does in essentially all cases
after bootsnap
is initialized.
Regardless, bootsnap isn’t initialized before everything
else, so we still see a number of these. Let’s replace them with
%RRPI
for rb_realpath_internal
:
= STDIN.readlines.map { |l| l.chomp.split(/\t/) }
ls = nil
prev
= ->(p) { !!(p =~ %r{^/[^/]+$}) }
one_path_component
= ->() {
flush if prev
puts "%RRPI\t#{prev}"
= nil
prev end
}
= ->(l) {
not_lstat .()
flushputs l.join("\t")
}
= ->(l) {
lstat if prev
if l[1].start_with?(prev) && one_path_component.(l[1][prev.size..-1])
= l[1]
prev else # lstat, but not part of this block.
.()
flush.(l)
lstatend
else
if one_path_component.(l[1])
= l[1]
prev else
.(l)
not_lstatend
end
}
.each { |l| l[0] == 'lstat64' ? lstat.(l) : not_lstat.(l) } ls
This gives us:
%RRPI
;lstat64
calls.
Therefore, rb_realpath_internal
generates:
lstat64
calls;lstat64
calls;%X
markers;One particularly hilarious outcome:
%RRPI /path/to/bundler/gems/some-gem/.git/objects/bf
%RRPI /path/to/bundler/gems/some-gem/.git/objects/c1
%RRPI /path/to/bundler/gems/some-gem/.git/objects/c5
...(108 times)...
$LOAD_PATH
scansAnother common source of large sequences of syscalls is scanning the
$LOAD_PATH
(or
ActiveSupport::Dependencies.autoload_paths
). This generates
a sequence like:
open /opt/rubies/2.3.3/lib/ruby/site_ruby/2.3.0/stringio.rb
open /opt/rubies/2.3.3/lib/ruby/site_ruby/2.3.0/x86_64-darwin16/stringio.rb
open /opt/rubies/2.3.3/lib/ruby/site_ruby/stringio.rb
open /opt/rubies/2.3.3/lib/ruby/vendor_ruby/2.3.0/stringio.rb
open /opt/rubies/2.3.3/lib/ruby/vendor_ruby/2.3.0/x86_64-darwin16/stringio.rb
Though we optimize most of these out using bootscale/bootsnap, there are still some paths that lead to a path scan, and several that occur before bootsnap is initialized.
The identifiable pattern here is multiple calls to open
where the final component of the path stays the same but the prefix
varies. Let’s replace these with %SCAN
:
= STDIN.readlines.map { |l| l.chomp.split(/\t/) }
ls = nil
prev = 0
count
= ->(n) {
push = n
prev += 1
count }
= ->() {
flush if prev
if count == 1
puts "open\t#{prev}"
else
puts "%SCAN\t#{File.basename(prev)}"
end
= nil
prev = 0
count end
}
= ->(l) {
not_open .()
flushputs l.join("\t")
}
open = ->(l) {
if prev
if l[1].end_with?(File.basename(prev)) && l[1] != prev
.(l[1])
pushelse
.()
flushopen.(l)
end
else
.(l[1])
pushend
}
.each { |l| l[0] == 'open' ? open.(l) : not_open.(l) } ls
This gives us:
%SCAN
;open
calls;
therefore, $LOAD_PATH
scans still account for:
open
syscalls;open
calls per $LOAD_PATH
scan;%X
markers;stat
sIt’s fairly apparent looking at the log now that we spend a lot of
time stat
-ing files. Of 66171 unexplained syscalls:
stat64
;lstat64
;fstat64
.This totals 58632 of 66171 – 89% of all unexplained syscalls are
stat
operations.
Here’s a breakdown of the remaining syscalls:
$ grep -v 'stat64' outp4 \
| grep -v -E '^%' \
| awk -F "\t" '{ a[$1]++ } END { for (c in a) print a[c] "\t" c; }' \
| sort -nr
2719 read
2387 close
2137 open
141 pread
124 write
14 fgetxattr
11 socket
5 mkdir
1 shm_open
Of these:
read
, open
, and
close
will be from loading ruby files before initializing
bootsnap
;pread
will be from loading native
extensions;stat64
path scansI see a pattern where we’re doing this across the ActiveSupport::Dependencies.autoload_paths
:
stat64 (some gem)/app/assets/teaspoon/bundle/bundle_helper.rb
stat64 (some gem)/app/controllers/teaspoon/bundle/bundle_helper.rb
stat64 (some gem)app/assets/teaspoon/bundle/bundle_helper.rb
Let’s replace these scans with %ASDS
for “ActiveSupport
Dependencies Scan”:
= STDIN.readlines.map { |l| l.chomp.split(/\t/) }
ls = nil
prev = 0
count
= ->(n) {
push = n
prev += 1
count }
= ->() {
flush if prev
if count == 1
puts "stat64\t#{prev}"
else
puts "%ASDS\t#{File.basename(prev)}\t#{count}"
end
= nil
prev = 0
count end
}
= ->(l) {
not_stat64 .()
flushputs l.join("\t")
}
= ->(l) {
stat64 if prev
if File.basename(l[1]) == File.basename(prev) && l[1] != prev
.(l[1])
pushelse
.()
flush.(l)
stat64end
else
.(l[1])
pushend
}
.each { |l| l[0] == 'stat64' ? stat64.(l) : not_stat64.(l) } ls
This gives us:
%ASDS
;stat64
calls; therefore, these scans account for:
stat64
syscalls;stat64
calls per scan (why so
low, on a 500+ item path?);stat64
calls.%X
markers;It seems puzzling that, on a 500+ item path, the average number of
calls is only 10. Apparently my assumption that this is all ActiveSupport::Dependencies.autoload_paths
was not correct. Let’s dig in to this more. A histogram of the number of
items scanned for each occurrence:
grep '%ASDS' out \
| awk '{ a[$3] += 1 } END { for (c in a) print c, "\t", a[c] }' \
| sort -n
So really, the number of full scans of ActiveSupport::Dependencies.autoload_paths
is quite small (fewer than ten). The majority of these seem to involve
fewer than ten syscalls.
There are clearly two problems here:
stat
’ed twice for no apparent
reason.stat64
+lstat64
An extremely common pattern is to stat64
and then
immediately lstat64
a file. This feels wasteful2. Let’s label these occurrences as
%STLS
:
= STDIN.readlines.map { |l| l.chomp.split(/\t/) }
ls = nil
stat
= ->() {
flush if stat
puts "stat64\t#{stat}"
= nil
stat end
}
.each do |l|
lsif l[0] == 'stat64'
.()
flush= l[1]
stat next
end
if l[0] == 'lstat64' && l[1] == stat
puts "%STLS\t#{l[1]}"
= nil
stat next
end
.()
flushputs l.join("\t")
end
This gives us:
%STLS
;stat64
calls; therefore, these scans account for:
lstat64
calls.lstat64
calls if
fixed;*stat64
calls if
fixed.%X
markers;stat
’ing
gemsOf the remaining stat64
calls, a majority appear to
iterating through the contents of a gem. The pattern appears like
this:
stat64 CHANGELOG
stat64 CONTRIBUTORS
stat64 MIT-LICENSE
stat64 README.md
stat64 README.md
stat64 lib/active_merchant
stat64 lib/active_merchant.rb
stat64 lib/active_merchant/billing
stat64 lib/active_merchant/billing.rb
This is a little harder to detect since ruby chdir
s into
the gem before scanning and I didn’t bother to capture the directory in
my output. Still, we can probably do a reasonable heuristic job.
We’re going to reorder our replacements though, because this will
capture some uses of %ASDS
and %STLS
:
%LOAD
%YAML
%RRPI
%SCAN
%SGEM
(new)%ASDS
%STLS
= STDIN.readlines.map { |l| l.chomp.split(/\t/) }
ls
= false
in_stat_chunk = false
in_stat_chunk_which_is_a_gem = false
past_toplevel_within_gem = []
buf_lines
= ->() {
flush if buf_lines.size > 8
puts "%SGEM\t(some-gem)\t#{buf_lines.size}"
else
.each do |l|
buf_linesputs l.join("\t")
end
end
= false
in_stat_chunk = false
in_stat_chunk_which_is_a_gem = false
past_toplevel_within_gem = []
buf_lines }
= ->(l) {
push << l
buf_lines }
EXCLUDES = %w(config csv fonts images javascsripts stylesheets svg)
= ->(l) {
handle_stat = l[1].count('/').zero?
at_toplevel
if !in_stat_chunk
= true
in_stat_chunk = at_toplevel && !EXCLUDES.include?(l[1])
in_stat_chunk_which_is_a_gem end
if !at_toplevel
= true
past_toplevel_within_gem end
# Looks like a new gem?
if past_toplevel_within_gem && at_toplevel
.()
flushreturn handle_stat.(l)
end
if in_stat_chunk_which_is_a_gem
.(l)
pushelse
puts l.join("\t")
end
}
.each do |l|
lsunless 'stat64' == l[0] || 'lstat64' == l[0]
.()
flushputs l.join("\t")
next
end
.(l)
handle_statend
Having not reapplied %ASDS
or %STLS
yet, we
get:
%SGEM
;*stat64
calls;%X
markers;Replaying %ASDS
and %STLS
, we get:
%ASDS
;%STLS
(interesting that this was
primarily used in gem scans);%X
markers;Generating the histogram of %ASDS
stat count again, we
get:
The main difference here is that the number of 2s dropped quite a lot. I guess most of those were occurring in those gem scans.
lstat64
’ing gemspecs, then 1500
open
+read
+close
-ing them. This
accounts for 11% of unexplained calls.did_you_mean
(2%);lstat64
and then
open
+read
+close
all the gemspecs
again after loading bundler (another 11%, totalling 22%);open
+fstat64
+close
+open
+fstat64
+fstat64
+read
+close
calls from before initializing bootsnap;%SCAN
seems to have maybe eaten the syscall immediately
following it?; 3400 stat64
operations are spent on
regenerating bootsnap
’s load path cache for volatile
entries (19%).Excluding the regular ruby require path, which is hard to measure
just with grep
and wc
, these account for 7850,
or 44% of remaining calls, bringing the number of mystery-calls down to
about 10000, out of a total 140737.
bootsnap is a combination of aot_compile_cache and other work based on bootscale that isn’t public yet but likely will be soon.↩︎
At the very least, we should be able to use
lstat64
first and use S_ISLNK
to run the
stat64
if and only if the file is actually a symlink.↩︎