Making Ag Faster: Profiling with Valgrind

These days, a lot of software is written to be “fast enough”. Since code bases can be very large, there’s no such thing as “fast enough” for The Silver Searcher. In fact, my main goal with Ag is speed.

Improving performance is not always easy, but it is simple:

  1. Find the slowest part of the program.
  2. Make that part faster.
  3. Repeat until it’s fast enough or you go insane.

There are lots of profiling tools and programmers often argue about which is the best. I use gprof, callgrind, and Instruments.app. Which profiler you use doesn’t matter as much as actually using one. They all have their advantages and disadvantages, but for this post I’ll only cover Valgrind’s callgrind. Using callgrind doesn’t require special compilation. Just invoke it with your program’s name and it will generate profiling data for callgrind_annotate to analyze.

Here’s a typical profiling run for Ag:

$ make clean && ./build.sh
(snip)
$ time valgrind --tool=callgrind ./ag --literal abcdefghijklmnopqrstuvwxyz ../
(snip)
real	1m34.709s
user	1m33.206s
sys	0m1.492s
$ callgrind_annotate --auto=yes callgrind.out.10361
--------------------------------------------------------------------------------
Profile data file 'callgrind.out.10361' (creator: callgrind-3.6.1-Debian)
--------------------------------------------------------------------------------
I1 cache:
D1 cache:
LL cache:
Timerange: Basic block 0 - 798409857
Trigger: Program termination
Profiled target:  ./ag --literal abcdefghijklmnopqrstuvwxyz ../ (PID 10361, part 1)
Events recorded:  Ir
Events shown:     Ir
Event sort order: Ir
Thresholds:       99
Include dirs:
User annotated:
Auto-annotation:  on

--------------------------------------------------------------------------------
           Ir
--------------------------------------------------------------------------------
3,068,387,924  PROGRAM TOTALS

--------------------------------------------------------------------------------
           Ir  file:function
--------------------------------------------------------------------------------
1,764,541,095  src/util.c:ag_strnstr [/home/geoff/code/the_silver_searcher/ag]
  386,020,821  /build/buildd/eglibc-2.13/posix/fnmatch_loop.c:internal_fnmatch [/lib/x86_64-linux-gnu/libc-2.13.so]
  226,548,868  /build/buildd/eglibc-2.13/string/../sysdeps/x86_64/multiarch/../strcmp.S:__GI_strncmp [/lib/x86_64-linux-gnu/libc-2.13.so]
  181,861,517  src/util.c:is_binary [/home/geoff/code/the_silver_searcher/ag]
  123,211,270  /build/buildd/eglibc-2.13/posix/fnmatch.c:fnmatch@@GLIBC_2.2.5 [/lib/x86_64-linux-gnu/libc-2.13.so]
  104,867,805  src/print.c:print_file_matches [/home/geoff/code/the_silver_searcher/ag]
   77,058,570  /build/buildd/eglibc-2.13/string/../sysdeps/x86_64/multiarch/../strlen.S:__GI_strlen [/lib/x86_64-linux-gnu/libc-2.13.so]
   60,030,629  /build/buildd/eglibc-2.13/posix/fnmatch_loop.c:internal_fnmatch'2 [/lib/x86_64-linux-gnu/libc-2.13.so]
   44,019,376  src/ignore.c:filename_filter [/home/geoff/code/the_silver_searcher/ag]
   27,072,821  /build/buildd/eglibc-2.13/string/../sysdeps/x86_64/memchr.S:memchr [/lib/x86_64-linux-gnu/libc-2.13.so]
    9,329,984  /build/buildd/eglibc-2.13/string/../sysdeps/x86_64/multiarch/../strcmp.S:__GI_strcmp [/lib/x86_64-linux-gnu/libc-2.13.so]
    7,803,075  /build/buildd/eglibc-2.13/malloc/malloc.c:_int_malloc [/lib/x86_64-linux-gnu/libc-2.13.so]
    7,040,644  /build/buildd/eglibc-2.13/posix/../locale/weight.h:internal_fnmatch
    6,062,124  /build/buildd/eglibc-2.13/string/../string/memmove.c:__GI_memmove [/lib/x86_64-linux-gnu/libc-2.13.so]
    4,384,383  /build/buildd/eglibc-2.13/string/../sysdeps/x86_64/multiarch/../memcpy.S:__GI_memcpy [/lib/x86_64-linux-gnu/libc-2.13.so]
    3,951,640  /build/buildd/eglibc-2.13/malloc/malloc.c:_int_free [/lib/x86_64-linux-gnu/libc-2.13.so]
    3,779,300  /build/buildd/eglibc-2.13/dirent/../sysdeps/unix/readdir.c:readdir [/lib/x86_64-linux-gnu/libc-2.13.so]
    3,181,118  /build/buildd/eglibc-2.13/malloc/malloc.c:malloc [/lib/x86_64-linux-gnu/libc-2.13.so]
(snip)

I snipped out the annotated source code. You can see the full output here.

This profiling info tells me that I’m spending all my time in strnstr(). I did some research on string-matching and found out about the Boyer-Moore algorithm. After some more reading, I decided to go with a simplified version of Boyer-Moore called Boyer-Moore-Horspool.

Here’s the data after I implemented Boyer-Moore-Horspool strstr:

$ time valgrind --tool=callgrind ./ag --literal abcdefghijklmnopqrstuvwxyz ../
real	0m32.429s
user	0m31.034s
sys	0m1.324s
$ callgrind_annotate --auto=yes callgrind.out.11921
--------------------------------------------------------------------------------
Profile data file 'callgrind.out.11921' (creator: callgrind-3.6.1-Debian)
--------------------------------------------------------------------------------
I1 cache:
D1 cache:
LL cache:
Timerange: Basic block 0 - 228181262
Trigger: Program termination
Profiled target:  ./ag --literal abcdefghijklmnopqrstuvwxyz ../ (PID 11921, part 1)
Events recorded:  Ir
Events shown:     Ir
Event sort order: Ir
Thresholds:       99
Include dirs:
User annotated:
Auto-annotation:  on

--------------------------------------------------------------------------------
           Ir
--------------------------------------------------------------------------------
1,139,437,344  PROGRAM TOTALS

--------------------------------------------------------------------------------
         Ir  file:function
--------------------------------------------------------------------------------
386,014,011  /build/buildd/eglibc-2.13/posix/fnmatch_loop.c:internal_fnmatch [/lib/x86_64-linux-gnu/libc-2.13.so]
181,870,097  src/util.c:is_binary [/home/geoff/code/the_silver_searcher/ag]
123,209,345  /build/buildd/eglibc-2.13/posix/fnmatch.c:fnmatch@@GLIBC_2.2.5 [/lib/x86_64-linux-gnu/libc-2.13.so]
104,867,805  src/print.c:print_file_matches [/home/geoff/code/the_silver_searcher/ag]
 76,747,163  /build/buildd/eglibc-2.13/string/../sysdeps/x86_64/multiarch/../strlen.S:__GI_strlen [/lib/x86_64-linux-gnu/libc-2.13.so]
 63,421,170  src/util.c:boyer_moore_strnstr [/home/geoff/code/the_silver_searcher/ag]
 60,028,609  /build/buildd/eglibc-2.13/posix/fnmatch_loop.c:internal_fnmatch'2 [/lib/x86_64-linux-gnu/libc-2.13.so]
 44,018,667  src/ignore.c:filename_filter [/home/geoff/code/the_silver_searcher/ag]
 27,072,637  /build/buildd/eglibc-2.13/string/../sysdeps/x86_64/memchr.S:memchr [/lib/x86_64-linux-gnu/libc-2.13.so]
  8,312,570  /build/buildd/eglibc-2.13/string/../sysdeps/x86_64/multiarch/../strcmp.S:__GI_strcmp [/lib/x86_64-linux-gnu/libc-2.13.so]
  7,803,075  /build/buildd/eglibc-2.13/malloc/malloc.c:_int_malloc [/lib/x86_64-linux-gnu/libc-2.13.so]
  7,040,534  /build/buildd/eglibc-2.13/posix/../locale/weight.h:internal_fnmatch
  6,061,868  /build/buildd/eglibc-2.13/string/../string/memmove.c:__GI_memmove [/lib/x86_64-linux-gnu/libc-2.13.so]
  4,384,383  /build/buildd/eglibc-2.13/string/../sysdeps/x86_64/multiarch/../memcpy.S:__GI_memcpy [/lib/x86_64-linux-gnu/libc-2.13.so]
  3,951,640  /build/buildd/eglibc-2.13/malloc/malloc.c:_int_free [/lib/x86_64-linux-gnu/libc-2.13.so]
  3,779,220  /build/buildd/eglibc-2.13/dirent/../sysdeps/unix/readdir.c:readdir [/lib/x86_64-linux-gnu/libc-2.13.so]
  3,181,118  /build/buildd/eglibc-2.13/malloc/malloc.c:malloc [/lib/x86_64-linux-gnu/libc-2.13.so]
  3,089,135  src/main.c:search_dir'2 [/home/geoff/code/the_silver_searcher/ag]
  2,095,514  /build/buildd/eglibc-2.13/malloc/malloc.c:free [/lib/x86_64-linux-gnu/libc-2.13.so]
  2,018,298  /build/buildd/eglibc-2.13/dirent/../sysdeps/wordsize-64/../../dirent/scandir.c:scandir [/lib/x86_64-linux-gnu/libc-2.13.so]
  1,941,992  /build/buildd/eglibc-2.13/string/strcoll_l.c:strcoll_l [/lib/x86_64-linux-gnu/libc-2.13.so]
  1,889,859  /build/buildd/eglibc-2.13/stdlib/msort.c:msort_with_tmp.part.0'2 [/lib/x86_64-linux-gnu/libc-2.13.so]
  1,704,553  /build/buildd/eglibc-2.13/malloc/malloc.c:malloc_consolidate.part.3 [/lib/x86_64-linux-gnu/libc-2.13.so]
  1,644,688  src/ignore.c:ignorefile_filter [/home/geoff/code/the_silver_searcher/ag]
  1,601,628  /build/buildd/eglibc-2.13/dirent/../sysdeps/unix/sysv/linux/getdents.c:__getdents [/lib/x86_64-linux-gnu/libc-2.13.so]
  1,582,620  src/util.c:strlcat [/home/geoff/code/the_silver_searcher/ag]
(snip)

For the curious, full output of callgrind_annotate is here.

That’s a 3x overall speedup and a 27x speedup in string matching. Impressive! Now Ag is spending most of the time figuring out whether or not it should search a file. It’s clear where I need to optimize next.

Valgrind isn’t perfect though. It makes programs run 25-50x slower than they normally would, so you won’t notice if you’re spending all your time waiting for network or disk I/O. In the case of Ag, this turned into a 20% performance improvement in my benchmarks.

Getting more useful data requires switching from an instrumenting profiler to a sampling profiler. Both Instruments.app and gprof are sampling profilers, but this post is already too long. I’ll cover them some other time.

Building for Others

I like to write new code. Unfortunately, writing new code is only a small part of building something useful. The Pareto principle applies. Once you’ve written 80% of the code in a short, fun-filled period, you spend much longer finishing up little things. You have to debug some odd edge case, clean up messy stuff that mostly works, and get it to build and run on some obscure Linux distribution.

Worst of all, you have to write documentation.

This stuff isn’t fun, but it’s necessary if you want others to use your project.

Because you made it, the various dependencies and quirks are obvious to you. For the poor soul who clones your repository, the same is not true. Even compiling is a challenge for a newbie. What build system does your project use? Make? Scons? Ant? What build dependencies does it have? Does it check for them or print out a useful error message if any are missing? Is there a helpful README?

If you want other people to use (and possibly one day improve) your work, you need to polish the build scripts and write documentation. Think of it like a sales funnel. Perhaps 100 people download your code, 80 get it to build, 75 run it, 50 use it regularly, 5 make modifications, and finally, 2 contribute patches back. You can increase the numbers. Making those steps easier will grow your user base and contributions.

So that’s what I’ve done with Ag. Over the past couple of weeks I’ve added a man page, a wiki, accepted pull requests to clean up the build, and even improved the –help output.

I’ll be the first to admit that fixing trivial inconveniences is boring. But frequently, boring work is the difference between a personal project and a community of users. Too often, trivial inconveniences stop projects from reaching critical mass. So if you want to build something for others, grit those teeth and go schlep.

The Silver Searcher: Better than Ack

A lot of my time spent “writing” code is actually spent reading code. And a decent chunk of my time spent reading code is actually spent searching code. Lately I’ve started working with a larger codebase.[1] Both grep and ack take a non-negligible amount of time to search it. Both are slow, but for different reasons. Grep is fast, but doesn’t ignore files.[2] Ack is very good at ignoring files, but it’s written in Perl instead of C. What I really want is something that’s fast and ignores files.

So I built it. I call it The Silver Searcher, or Ag for short. Ag is like ack, but better. It’s fast. It’s damn fast. The only thing faster is stuff that builds indicies beforehand, like Exuberant Ctags.

Don’t believe me? Here are some benchmarks. I ran them multiple times and grabbed the median for each.

ggreer@carbon:~/cloudkick/reach% du -sh
250M	.

ggreer@carbon:~% time grep -r -i SOLR ~/cloudkick/reach | wc -l
     617
11.06s user 0.81s system 96% cpu 12.261 total

ggreer@carbon:~% time ack -i SOLR ~/cloudkick/reach | wc -l
     488
2.87s user 0.78s system 97% cpu 3.750 total

ggreer@carbon:~% time ag -i SOLR ~/cloudkick/reach | wc -l
     573
1.00s user 0.51s system 95% cpu 1.587 total

Here’s Ag with some extra ignores, similar to how ack ignores many files by default:

ggreer@carbon:~% cat ~/cloudkick/reach/.agignore
extern
release
fixtures
ggreer@carbon:~% time ag -i SOLR ~/cloudkick/reach | wc -l
     499
0.35s user 0.15s system 94% cpu 0.528 total

That’s the same as git grep:

ggreer@carbon:~/cloudkick/reach% time git grep -i SOLR ~/cloudkick/reach | wc -l
     489
0.32s user 0.58s system 161% cpu 0.556 total

…except git grep only works in git repos. And it doesn’t ignore stuff in the repository like extern or generated files.[3]

The bottom line: Grep’s output was the least useful. It dutifully reported matches in .pyc files and other things I don’t care about. Ack’s results were better and faster than grep. Ag had more results than ack, but took half as long. With a couple of clever ignores (like the extern directory), Ag took a mere half-second and gave even more pertinent results.

I can already hear someone saying, “Big deal. It’s only a second faster. What does one second matter when searching an entire codebase?” My reply: trivial inconveniences matter. Using Ag is like having a faster computer; you don’t realize how slow things were until you’ve experienced fast. The difference is big enough that I can’t go back to ack, just like ack users can’t go back to grep.

Since it behaves like ack, Ag can be used by many fancy ack GUI front-ends. This makes searching convenient as well as fast. After I got Ag sorta-working, I forked AckMate so that I could use Ag in my favorite editor. My fork bundles both AckMate’s Ack and my own Ag. You can switch between them with a simple check box. The tmbundle is on the downloads page. Be warned: it replaces your current AckMate.

There’s still plenty of stuff I want to add,[4] but it’s good enough for my own daily use so I figured I should tell others about it. And of course, patches are welcome!

 

 

 
Footnotes:

[1] The decision was made to put all python dependencies into extern/ instead of using pip. A good call, in my opinion.

[2] At least not without a bunch of pipes and find and xargs. Yes I know there are aliases but it’s annoying to keep those up-to-date.

[3] Yes, I know it’s bad form to put generated files in revision control.

[4] Ctags support, for one. Also inverted matching, accepting piped input, and basic stuff like retrying a search with fewer ignores and no case-sensitivity.

The Moral Trajectory

Apologies in advance. This post is about philosophy and morality; way out of my area of expertise. You have been warned.

By our standards, ancient Rome was not a nice place. I’m not just talking about problems due to inferior technology or sanitation or disease. Even a hypothetical ancient Rome that fixed those things would be worse than the modern world. Slaves would still exist. Racism and discrimination would still be rampant. The citizenship of one’s parents would still determine your lot in life. Simply put, ancient Roman morality was worse than ours.

It is my claim that over time, most civilizations have moved along a moral trajectory. Slowly -ever so slowly- people have become nicer. We’ve recognized more humans as worthy of moral value. Eventually we decided slavery was wrong. We began to treat women and men equally. Currently we’re moving toward treating homosexuals and heterosexuals equally.

The fact that morals have improved over time should strike people as extraordinary. Think about what must have happened for slavery to go from right to wrong. At some point, a person went from thinking “slavery is right” to thinking “slavery is wrong”, without initially wanting to change his mind! I wish I knew how to trigger that sort of thinking. It’s as valuable as vaccines.

It’s great that people have become more moral over the millennia. Now what are the odds that after thousands of years, we suddenly got morality right? What are the chances that we won’t improve upon our current ideas about right and wrong?

 

 

Yeah, pretty damn small.

 

OK, so if we’re wrong, we should ask the question, “What are some things that future societies might condemn us for?”

Stop here and think about them. I don’t want to contaminate your initial ideas.

 

 

 


 

 

 

There are some obvious ones that are already in political discourse. Recreational drugs. LGBT rights. Border and immigration laws. Religious indoctrination of children. Other things that You Can’t Say… yet.

One day morality will even go beyond those things. What about eating meat? It’s absolutely indefensible that we breed, kill, chop up, and eat animals. It’s not the eating of meat that’s wrong. If we could grow meat in vats, there would be no problem. The problem is that animals experience pain much like we do. While animals are not open to the same range of experiences as humans, they still seem to have some moral value. All else equal, I think we’d prefer that animals not suffer and die.

Future technologies could improve morality as well. Like education and nutrition today, in the future it could be considered abusive to not give your child anti-aging and intelligence-enhancing treatments. And that’s just the start.

Does this future scare you? Well so would the present to those who lived in the past. Eutopia is Scary.

Consume Less Shallow Content

Many of my friends and coworkers own iPads. I do not. I don’t have anything against Apple, in fact I love their products. I agree that the iPad is an engineering marvel, both of hardware and software. It’s elegant. It’s responsive. It’s just plain fun to play with. But I don’t own one.

I don’t own an iPad for the same reason I don’t have a cable subscription: Because it encourages shallow content consumption.

Don’t misunderstand. I don’t think content consumption is bad. I enjoy it and spend lots of time doing it. But there are different ways to consume it, different types to consume, and differing amounts of time one can spend on it.

So what is shallow content? The shallowest content consists of action movies and chick flicks. These are equivalent to sitting in front of a machine that pulls levers in your brain to trigger emotional reactions. Shallow content is easily accessible, but uninformative and in the long run, less rewarding.

Deep content is complex. Profound. Memorable. It is organized on multiple levels. It rewards re-reading (or re-watching). Gödel, Escher, Bach: An Eternal Golden Braid is the prototype of deep content.

Shallow content is cotton candy and Coca-Cola. Deep content is raspberry cheesecake and Riesling.

When it comes to content consumption, different devices encourage different behaviors. For example, a television is the ultimate device for consuming shallow content. You select a channel. The pictures and sounds elicit emotional responses. Changing the channel is easy, so TV shows are selected for stickiness and addictiveness. TV shows can’t be as complex as other media.[1] They have to be accessible to people who start watching in the middle.

Whether or not a device is meant for content consumption is not a binary attribute; it’s more of a sliding scale. A television is solely for consuming. An iPad is more general-purpose, but encourages consumption over production, and shallow content over deep. A laptop or desktop computer is fully general-purpose. A good indicator of a device’s purpose is its input interface. A TV’s input is simple: channel selection. An iPad has a touch-screen that can show a virtual keyboard for occasions when you need to write something. A computer has a physical keyboard, the fastest brain-machine interface currently available.

Of course, even with a computer you can waste your days on Reddit and IRC (consuming and producing shallow content) or you can do something more rewarding.

[1] Also, the cost to make a TV show is much greater than the cost to make a book. So to be profitable, a TV show must appeal to a greater population than a book. This limits the ideas and opinions on television.