At the beginning of this year, Bits’n’Bites wrote an article named Faster C++ builds, which describes how you can accelerate building LLVM by using ninja, a compiler cache, and so on.
The following excerpt caught my eye:
For most developers, the time it takes to run CMake is not really an issue since you do it very seldom. However, you should be aware that for CI build slaves in particular, CMake can be a real bottleneck.
For instance, when doing a clean re-build of LLVM with a warm CCache, CMake takes roughly 50% of the total build time!
So I decided to build LLVM 4.0.0 (and clang) on my 2011 Core i7 Lenovo W510 laptop and see whether I could reproduce his findings.
Ubuntu 16.04 LTS
First I tested on my KDE Neon (Ubuntu 16.04 LTS) Linux setup. Ubuntu 16.04 comes with GCC 5.4.0 and ninja 1.5.1. For CMake I used the upcoming version 3.9.0-rc4 from cmake.org.
Setting up LLVM 4.0.0 was done like this:
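Something along these lines, using the release tarballs from releases.llvm.org (exact paths assumed):

```
wget http://releases.llvm.org/4.0.0/llvm-4.0.0.src.tar.xz
wget http://releases.llvm.org/4.0.0/cfe-4.0.0.src.tar.xz
tar xf llvm-4.0.0.src.tar.xz
tar xf cfe-4.0.0.src.tar.xz
# clang is built as part of LLVM when placed in llvm/tools
mv cfe-4.0.0.src llvm-4.0.0.src/tools/clang
```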
Then I configured CMake twice and built the libclang target, timing each command with cmake -E time.
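The invocations were presumably along these lines (the build type is assumed):

```
mkdir build && cd build
cmake -E time cmake -G Ninja -DCMAKE_BUILD_TYPE=Release ../llvm-4.0.0.src
cmake -E time cmake -G Ninja -DCMAKE_BUILD_TYPE=Release ../llvm-4.0.0.src
cmake -E time ninja libclang
```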
CMake time was 0.54% of the total build time.
Then I configured ccache:
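On Ubuntu this amounts to roughly the following; the cache size is an arbitrary choice, big enough to hold all of LLVM's object files:

```
sudo apt install ccache
ccache --max-size=10G
export PATH=/usr/lib/ccache:$PATH
```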
And then I ran the same procedure (CMake twice, libclang target build) three times: the first time to cache all the object files (cold cache), the second time to use them (warm cache), and the third time using ld.gold as the linker.
ccache cold:
CMake time was 0.59% of the total build time.
ccache warm:
CMake time was 21.81% of the total build time. Not quite 50%. As we can see, ccache reduced the CMake time by 25%.
I configured ld.gold like this:
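One way to do it on Ubuntu is via update-alternatives, making ld point to ld.gold (the priorities are arbitrary):

```
sudo update-alternatives --install /usr/bin/ld ld /usr/bin/ld.gold 20
sudo update-alternatives --install /usr/bin/ld ld /usr/bin/ld.bfd 10
```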
Then the build time of the libclang target was:
That puts the CMake time at 23.52% of the total build time.
Ubuntu 16.04 LTS on Windows 10
I tested the same setup on my Windows 10 machine, in the Linux Bash Shell running Ubuntu 16.04 LTS.
Results of a normal build without ccache:
CMake time was 2.46% of the total build time. Compared to running natively, CMake was 6x slower.
ccache cold:
CMake time was 2.48% of the total build time.
ccache warm:
CMake time was 26.64% of the total build time.
ccache warm with ld.gold:
CMake time was 27.05% of the total build time.
The fastest build in the Linux Bash Shell was 5.72x slower than running natively.
MinGW-w64 GCC 5.4.0 on Windows 10
My next attempt was to use the same GCC version built natively for Windows. MSYS2 comes with GCC, ccache, and ninja. Unfortunately, LLVM + clang would not compile with it. I didn't try to investigate and fix the problem; instead I decided to take the GCC 5.4.0 build from the MinGW-w64 repo: x86_64-5.4.0-release-posix-seh.
My next problem was that I no longer had ccache. I already knew that ccache is usable on Windows with MinGW, so I decided to build it.
The following picture describes my feelings after opening ccache's source archive:
Instead of giving up, I decided to write a CMake port for ccache. A few hours later I had it working; the code is on GitHub.
I was all set. Results of a normal build without ccache:
CMake time was 1.62% of the total build time, and only 3.14x slower than running on Linux.
Setting up ccache was a bit troublesome. On Linux, the symbolic links for g++ under /usr/lib/ccache work wonderfully. On Windows, when I tried the same approach using mklink, ccache complained about some recursion. I had to tell CMake to use ccache via the CMAKE_CXX_COMPILER_LAUNCHER command line parameter.
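For example (the C launcher is included here for completeness):

```
cmake -G Ninja -DCMAKE_C_COMPILER_LAUNCHER=ccache -DCMAKE_CXX_COMPILER_LAUNCHER=ccache ..\llvm-4.0.0.src
```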
ccache cold:
CMake time was 1.30% of the total build time.
ccache warm:
CMake time was 30.28% of the total build time. Also, the configure checks were not sped up; I think CMAKE_CXX_COMPILER_LAUNCHER is not taken into account in this case.
Setting up ld.gold was done like this:
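Presumably by making ld.exe point at gold in the MinGW-w64 bin directory (the path is assumed), which matches what gets undone further below:

```
cd C:\mingw64\bin
copy /Y ld.gold.exe ld.exe
```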
ccache and ld.gold:
No difference, which makes me think that LLVM's CMake code detects ld.gold if present on Windows and uses it automatically. I found out that CMakeCache.txt had the GOLD_EXECUTABLE variable set, and LLVM_TOOL_GOLD_BUILD set to ON.
I renamed ld.gold.exe to something else, copied ld.bfd.exe as ld.exe, and ran the build again. I have no idea why there was still no significant difference between ld.bfd.exe and ld.gold.exe.
The Windows native cached build was 2.78x slower than the Linux native build, and 2x faster than the Linux build running under Windows 10’s Linux Bash Shell.
CMake Speedup
Now I guess you are wondering about the promised CMake speedup, right?
You may have noticed that the second CMake run is almost two times faster than the first one!
For configure checks, CMake actually sets up a small project using the given generator (in my case ninja), tries to compile it, and based on the compilation result determines whether some header, function, or symbol is present on the system.
These checks run sequentially, not in parallel, and thus they can take some time.
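For illustration, here are two checks of the kind LLVM's CMake code runs; each one compiles a tiny test project behind the scenes:

```
include(CheckIncludeFile)
include(CheckSymbolExists)

# Does <unistd.h> exist? The answer ends up in HAVE_UNISTD_H.
check_include_file(unistd.h HAVE_UNISTD_H)
# Is strerror_r declared in <string.h>?
check_symbol_exists(strerror_r "string.h" HAVE_STRERROR_R)
```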
At some point this year I learned that one can override a CMake function or macro, and that the original remains accessible under the same name prefixed with an underscore. Daniel Pfeiffer mentions this in his C++Now 2017 Effective CMake talk.
My thought was to override all the checks and cache them for further use.
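A minimal sketch of the idea, using check_include_file as an example (the actual caching is only hinted at in the comment):

```
# Load the stock module first, so that check_include_file() exists.
include(CheckIncludeFile)

# Redefining the macro renames the original to _check_include_file.
macro(check_include_file header variable)
  if(NOT DEFINED ${variable})
    _check_include_file(${header} ${variable} ${ARGN})
    # ... write ${variable} out to the checks cache file here,
    # so the next CMake run can pre-load it ...
  endif()
endmacro()
```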
CMake's -C command line option pre-loads a script to populate the cache.
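Such a script is just a series of cache-populating set() calls; an illustrative fragment (the variable names are examples):

```
set(HAVE_UNISTD_H 1 CACHE INTERNAL "Have include unistd.h")
set(HAVE_STRERROR_R 1 CACHE INTERNAL "Have symbol strerror_r")
```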
So I've come up with some code (get it from GitHub) which can be used like this:
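The wrapper looks something like this, assuming the repository was cloned next to the LLVM sources (the directory name is assumed):

```
cmake_minimum_required(VERSION 3.1)
list(INSERT CMAKE_MODULE_PATH 0 "${CMAKE_CURRENT_LIST_DIR}/cmake-checks-cache")
add_subdirectory(llvm-4.0.0.src)
```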
When CMake does an include(CheckIncludeFile), it will get my version of CheckIncludeFile.cmake, which will save all findings in a cmake_checks_cache.txt file (or a different file name, which you can set via CMAKE_CHECKS_CACHE_FILE).
The implementation has a few hacks due to bugs in CMake's *.cmake files. For example, CheckSymbolExists.cmake has an implementation macro named _CHECK_SYMBOL_EXISTS! Also, these macros do not have inclusion guards, which means that my override macro will always be redefined by the actual call of include(Check...).
Usage is simple:
First, create the CMake checks cache file:
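For example (the directory name is an arbitrary choice):

```
mkdir build_checks && cd build_checks
cmake -E time cmake -G Ninja ..
```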
Notice that I used .. instead of ../llvm-4.0.0.src, because that's where I put the three-line CMakeLists.txt file from above.
Then we just tell CMake to use the checks cache file:
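Assuming the checks cache file was written into the first build directory:

```
mkdir build && cd build
cmake -E time cmake -C ../build_checks/cmake_checks_cache.txt -G Ninja ../llvm-4.0.0.src
```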
LLVM and clang together have 115 configure checks, which are now cached!
The results of the runs are now like this:
Ubuntu 16.04 LTS with warm ccache, ld.gold and cmake-checks-cache:
CMake time is 14.89% of the total build time. This is down from 23.52%!
Ubuntu 16.04 LTS on Windows 10 with warm ccache, ld.gold and cmake-checks-cache:
CMake time is 17.67% of the total build time. This is down from 27.05%!
MinGW-w64 GCC 5.4.0 on Windows 10 with warm ccache, ld.gold and cmake-checks-cache:
CMake time is 20.16% of the total build time. This is down from 30.28%!
You may be wondering why the second CMake run is still faster: that's because CMake still does the initial compiler checks. I had a look at what it would take to cache those values as well, and gave up.
Conclusion
If you are using a continuous integration build system (who doesn't?) together with CMake, you might want to cache all those checks, which do not change very often!