At the beginning of this year Bits’n’Bites published an article named Faster C++ builds, which describes how you can speed up building LLVM by using ninja, a compiler cache, and so on.
The following excerpt caught my eye:
For most developers, the time it takes to run CMake is not really an issue since you do it very seldom. However, you should be aware that for CI build slaves in particular, CMake can be a real bottleneck.
For instance, when doing a clean re-build of LLVM with a warm CCache, CMake takes roughly 50% of the total build time!
So I decided to build LLVM 4.0.0 (and clang) on my 2011 Core i7 Lenovo W510 laptop and see whether I could reproduce his findings.
Ubuntu 16.04 LTS
First I tested on my KDE Neon Ubuntu 16.04 LTS Linux setup. Ubuntu 16.04 comes with GCC 5.4.0 and ninja 1.5.1. For CMake I used the upcoming version 3.9.0-rc4 from cmake.org.
Setting up LLVM 4.0.0 was done like this:
Then I configured CMake twice and built the libclang target:
The results of the cmake -E time commands were:
CMake time was 0.54% of the total build time.
Then I configured ccache:
Then I ran the same procedure (CMake twice, then the libclang target build) three times: the first time to cache all the object files (cold cache), the second time to use them (warm cache), and the third time with ld.gold as the linker.
CMake time was 0.59% of the total build time.
CMake time was 21.81% of the total build time. Not quite 50%. We can also see that ccache reduced the CMake time by 25%.
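To make the percentages easier to follow, here is a minimal Python sketch of how such a share can be computed. It assumes that "all build time" means the configure time plus the build time, which is my reading and may not match exactly how the figures above were derived. The function name and the sample numbers are hypothetical.

```python
def cmake_share(cmake_seconds: float, build_seconds: float) -> float:
    """Return CMake's share of the total time, in percent.

    Assumption: total time = CMake configure time + build time.
    """
    total = cmake_seconds + build_seconds
    if total <= 0:
        raise ValueError("total time must be positive")
    return 100.0 * cmake_seconds / total


# Hypothetical numbers, not measurements from this article:
# the same 10 s of configure time weighs far more once the
# build itself becomes short (for example with a warm cache).
slow_build = cmake_share(10.0, 990.0)  # long build, small share
fast_build = cmake_share(10.0, 30.0)   # short build, large share
```

This is why a warm compiler cache makes CMake look expensive: the configure time barely changes, while the time it is compared against shrinks.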
Then I set up ld.gold as the linker like this:
Then the build time of the libclang target was:
Thus CMake took 23.52% of the total build time.
Ubuntu 16.04 LTS on Windows 10
I tested the same setup on Windows 10, in the Linux Bash Shell running Ubuntu 16.04 LTS.
Results of a normal build without ccache:
CMake time was 2.46% of the total build time. Compared to running natively, CMake was 6x slower.
CMake time was 2.48% of the total build time.
CMake time was 26.64% of the total build time.
ccache warm with ld.gold
CMake time was 27.05% of the total build time.
The fastest build on Linux Bash Shell was 5.72x slower than running natively.
MinGW-w64 GCC 5.4.0 on Windows 10
My next attempt was to use the same GCC version built natively for Windows. MSys2 comes with GCC, ccache, and ninja. Unfortunately, llvm + clang did not compile there. I didn’t try to investigate and fix the problem, and instead decided to take the GCC 5.4.0 build from the MinGW-w64 repo, x86_64-5.4.0-release-posix-seh.
My next problem was the fact that I didn’t have ccache anymore. I already knew that ccache is usable on Windows using MinGW and decided to build it.
The following picture describes my feelings after opening ccache’s source archive:
Instead of giving up, I decided to write a CMake port for ccache. A few hours later I got it working; the code is on GitHub.
I was all set. Results of normal build without cache:
CMake time was 1.62% of the total build time, and only 3.14x slower than running on Linux.
Setting up ccache was a bit troublesome. On Linux, the symbolic links for g++ under /usr/lib/ccache work wonderfully. On Windows, when I tried using mklink, ccache complained about some recursion.
I had to tell CMake to use ccache via the CMAKE_CXX_COMPILER_LAUNCHER command line parameter.
CMake time was 1.30% of the total build time.
CMake time was 30.28% of the total build time. Also, the configure checks were not sped up; I think CMAKE_CXX_COMPILER_LAUNCHER is not taken into consideration in this case.
Setting up ld.gold was done like this:
ccache and ld.gold:
No difference, which makes me think that the LLVM CMake code detects ld.gold if present on Windows and uses it automatically. I found out that CMakeCache.txt had the following
LLVM_TOOL_GOLD_BUILD set to
I renamed ld.gold.exe to something else, copied ld.bfd.exe as ld.exe, and ran the build again.
I have no idea why there was no significant difference between ld.bfd.exe and ld.gold.exe.
The Windows native cached build was 2.78x slower than the Linux native build, and 2x faster than the Linux build running under Windows 10’s Linux Bash Shell.
Now I guess you are wondering about the promised CMake speedup, right?
You may have noticed that the second CMake run is almost two times faster than the first one!
For configure checks, CMake actually sets up a small project using the given generator (in my case ninja), tries to compile it, and based on the compilation result determines whether some header, function, or symbol is present on the system.
These checks are run sequentially, not in parallel, and thus they can take some time.
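To illustrate the mechanism, here is a rough Python sketch of what a header check boils down to: generate a tiny source file, try to compile it, and record whether that worked. This is not CMake's implementation. The function names are hypothetical, and the default compiler command cc is an assumption about the system.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, Dict, Sequence


def compiles_with_cc(source: str, compiler: str = "cc") -> bool:
    """Try to compile `source` as C; True if the compiler exits with 0.

    Assumption: `compiler` is a C compiler available on PATH.
    """
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "check.c"
        obj = Path(tmp) / "check.o"
        src.write_text(source)
        try:
            result = subprocess.run(
                [compiler, "-c", str(src), "-o", str(obj)],
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
        except OSError:
            # Compiler not found or not executable: treat as a failed check.
            return False
        return result.returncode == 0


def check_include_file(
    header: str, try_compile: Callable[[str], bool] = compiles_with_cc
) -> bool:
    """Report whether a translation unit including `header` compiles."""
    source = f"#include <{header}>\nint main(void) {{ return 0; }}\n"
    return try_compile(source)


def run_checks(
    headers: Sequence[str],
    try_compile: Callable[[str], bool] = compiles_with_cc,
) -> Dict[str, bool]:
    """Run the checks one after another, like a sequential configure step."""
    return {h: check_include_file(h, try_compile) for h in headers}
```

Each check pays for a full compiler invocation, and the loop in run_checks handles them one at a time, which is where the configure time goes.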
At some point this year I’ve learned that one can override a CMake function / macro and the original function is accessible under the same name prefixed with an underscore. Daniel Pfeiffer mentions this in his C++Now 2017 Effective CMake talk.
My thought was to override all the checks and cache them for further use.
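The general idea of wrapping a check so that its result is remembered between runs can be sketched in Python as follows. This is only an analogy, not the CMake code from the repository: the function name, the JSON file format, and the check name used in the example are all hypothetical.

```python
import json
from pathlib import Path
from typing import Callable, Dict


def cached_check(name: str, run_check: Callable[[], bool], cache_file: Path) -> bool:
    """Return the stored result for `name`, running the check only on a miss."""
    cache: Dict[str, bool] = {}
    if cache_file.exists():
        cache = json.loads(cache_file.read_text())

    if name in cache:
        # Cache hit: skip the expensive compile-and-see step entirely.
        return cache[name]

    # Cache miss: run the real check and persist its result for later runs.
    result = bool(run_check())
    cache[name] = result
    cache_file.write_text(json.dumps(cache, indent=2, sort_keys=True))
    return result
```

A call such as cached_check("HAVE_STDIO_H", some_check, Path("checks.json")) would run some_check only the first time; later calls read the answer from the file. Like any such cache, the file has to be thrown away when the compiler or the system headers change.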
CMake’s -C command line option pre-loads a script to populate the cache.
So I’ve come up with some code (get it from GitHub) which can be used like this:
When CMake does an include(CheckIncludeFile), it will get my version of CheckIncludeFile.cmake, which will save all findings in
or a different file name which you can set via
The implementation has a few hacks due to bugs in CMake’s *.cmake files. For example, CheckSymbolExists.cmake has an implementation macro named
Also these macros do not have inclusion guards, which means that my override macro will always be redefined by the actual call of
Usage is simple:
First create the CMake checks cache file.
Notice that I used .. instead of ../llvm-4.0.0.src, because that’s where I put the three-line CMakeLists.txt file from above.
Then we just tell CMake to use the checks cache file
LLVM and clang together have 115 configure checks which are now cached!
The results of the runs are now like this:
Ubuntu 16.04 LTS with warm ccache, ld.gold and cmake-checks-cache:
CMake time is 14.89% of the total build time. This is down from 23.52%!
Ubuntu 16.04 LTS on Windows 10 with warm ccache, ld.gold and cmake-checks-cache:
CMake time is 17.67% of the total build time. This is down from 27.05%!
MinGW-w64 GCC 5.4.0 on Windows 10 with warm ccache, ld.gold and cmake-checks-cache:
CMake time is 20.16% of the total build time. This is down from 30.28%!
You may be wondering why the second CMake run is still faster. That’s because CMake still does the initial compiler checks. I had a look at what would be needed to cache those values, and gave up.
If you are using a continuous integration build system (who doesn’t?), and using CMake, you might want to cache all those checks which do not change very often!