Speeding up CMake - Cristian Adam

At the beginning of this year Bits’n’Bites wrote an article named Faster C++ builds, which describes how you can accelerate building LLVM by using ninja, using a cache, etc.

The following excerpt caught my eye:

For most developers, the time it takes to run CMake is not really an issue since you do it very seldom. However, you should be aware that for CI build slaves in particular, CMake can be a real bottleneck.
For instance, when doing a clean re-build of LLVM with a warm CCache, CMake takes roughly 50% of the total build time!

So I decided to build LLVM 4.0.0 (and clang) on my 2011 Core i7 Lenovo W510 laptop and see if I could reproduce his findings.

Ubuntu 16.04 LTS

First I tested on my KDE Neon Ubuntu 16.04 LTS Linux setup. Ubuntu 16.04 comes with GCC 5.4.0 and ninja 1.5.1. For cmake I used the upcoming version 3.9.0-rc4 from cmake.org.

Setting up LLVM 4.0.0 was done like this:

$ tar xJf llvm-4.0.0.src.tar.xz
$ tar xJf cfe-4.0.0.src.tar.xz
$ mv cfe-4.0.0.src llvm-4.0.0.src/tools/clang

Then I ran the CMake configure step twice and built the libclang target.

$ mkdir llvm-4.0.0.build
$ cd llvm-4.0.0.build
$ cmake -E time cmake -GNinja ../llvm-4.0.0.src -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/home/cadam/llvm -DLLVM_TARGETS_TO_BUILD=X86
$ cmake -E time cmake -GNinja ../llvm-4.0.0.src -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/home/cadam/llvm -DLLVM_TARGETS_TO_BUILD=X86
$ cmake -E time cmake --build . --target libclang

The results of the cmake -E time commands were:

Elapsed time: 14 s. (time), 0.016894 s. (clock)
Elapsed time: 6 s. (time), 0.00114 s. (clock)
Elapsed time: 2574 s. (time), 0.069965 s. (clock)

CMake time was 0.54% of the total build time.
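
The percentages in this article appear to be the first CMake run divided by that run plus the build time. This is inferred from the numbers rather than stated anywhere, so treat it as an assumption. For the run above:

```latex
\[
\text{CMake share}
  = \frac{t_{\text{cmake}}}{t_{\text{cmake}} + t_{\text{build}}}
  = \frac{14\,\text{s}}{14\,\text{s} + 2574\,\text{s}}
  \approx 0.54\%
\]
```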

Then I configured ccache:

export PATH=/usr/lib/ccache:$PATH

Then I ran the same procedure (cmake twice, then the libclang target build) three times: the first time to cache all the object files (cold cache), the second time to use them (warm cache), and the third time using ld.gold as the linker.

ccache cold:

Elapsed time: 16 s. (time), 0.015998 s. (clock)
Elapsed time: 6 s. (time), 0.001168 s. (clock)
Elapsed time: 2668 s. (time), 0.07373 s. (clock)

CMake time was 0.59% of the total build time.

ccache warm:

Elapsed time: 12 s. (time), 0.015003 s. (clock)
Elapsed time: 6 s. (time), 0.001109 s. (clock)
Elapsed time: 43 s. (time), 0.069825 s. (clock)

CMake time was 21.81% of the total build time. Not quite 50%. We can also see that ccache reduced the CMake time by 25% (from 16 s with a cold cache to 12 s).

I configured ld.gold like this:

sudo ln -sf /usr/bin/x86_64-linux-gnu-ld.gold /usr/bin/ld

Then the build time of the libclang target was:

Elapsed time: 39 s. (time), 0.068965 s. (clock)

Thus the CMake time takes 23.52% of the total build time.

Ubuntu 16.04 LTS on Windows 10

I tested the same setup on my Windows 10 machine, in the Linux Bash Shell running Ubuntu 16.04 LTS.

Results of a normal build without ccache:

Elapsed time: 84 s. (time), 0.03125 s. (clock)
Elapsed time: 35 s. (time), 0.015625 s. (clock)
Elapsed time: 3328 s. (time), 0.1875 s. (clock)

CMake time was 2.46% of the total build time. Compared to running natively, cmake was 6x slower.

ccache cold:

Elapsed time: 98 s. (time), 0.140625 s. (clock)
Elapsed time: 37 s. (time), 0 s. (clock)
Elapsed time: 3845 s. (time), 0.25 s. (clock)

CMake time was 2.48% of the total build time.

ccache warm:

Elapsed time: 81 s. (time), 0.0625 s. (clock)
Elapsed time: 37 s. (time), 0.015625 s. (clock)
Elapsed time: 223 s. (time), 0.25 s. (clock)

CMake time was 26.64% of the total build time.

ccache warm with ld.gold:

Elapsed time: 79 s. (time), 0.015625 s. (clock)
Elapsed time: 37 s. (time), 0.015625 s. (clock)
Elapsed time: 213 s. (time), 0.296875 s. (clock)

CMake time was 27.05% of the total build time.

The fastest build on Linux Bash Shell was 5.72x slower than running natively.

MinGW-w64 GCC 5.4.0 on Windows 10

My next attempt was to use the same GCC version built natively for Windows. MSYS2 comes with GCC, ccache and ninja. Unfortunately, llvm + clang did not compile there. I didn’t try to investigate and fix the problem; instead I decided to take the GCC 5.4.0 build from the MinGW-w64 repo: x86_64-5.4.0-release-posix-seh.

My next problem was the fact that I didn’t have ccache anymore. I already knew that ccache is usable on Windows using MinGW and decided to build it.

The following picture describes my feelings after opening the ccache’s source archive:

Instead of giving up I decided to write a CMake port for ccache. A few hours later I got it working; the code is on GitHub.

I was all set. Results of a normal build without ccache:

Elapsed time: 44 s. (time), 44.408 s. (clock)
Elapsed time: 22 s. (time), 22.126 s. (clock)
Elapsed time: 2671 s. (time), 2670.62 s. (clock)

CMake time was 1.62% of the total build time, and only 3.14x slower than running on Linux.

Setting up ccache was a bit troublesome. On Linux, the symbolic links for g++ under /usr/lib/ccache work wonderfully. On Windows, when I tried using mklink, ccache complained about some recursion.

I had to tell CMake to use ccache by using the CMAKE_CXX_COMPILER_LAUNCHER command line parameter.
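
On the command line this means passing -DCMAKE_CXX_COMPILER_LAUNCHER=ccache. The same launcher can also be set from CMake code. The fragment below is a minimal sketch rather than the exact setup used for these measurements: it assumes ccache can be found on the PATH, and CCACHE_PROGRAM is just a variable name chosen for illustration.

```cmake
# Sketch: place this in the top-level CMakeLists.txt before any targets
# are created. CMAKE_<LANG>_COMPILER_LAUNCHER needs CMake 3.4 or newer.
find_program(CCACHE_PROGRAM ccache)   # CCACHE_PROGRAM is a hypothetical name

if(CCACHE_PROGRAM)
  # Every C and C++ compile command gets prefixed with the ccache executable.
  set(CMAKE_C_COMPILER_LAUNCHER   "${CCACHE_PROGRAM}")
  set(CMAKE_CXX_COMPILER_LAUNCHER "${CCACHE_PROGRAM}")
else()
  message(STATUS "ccache not found, compiling without a launcher")
endif()
```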

ccache cold:

Elapsed time: 44 s. (time), 43.901 s. (clock)
Elapsed time: 20 s. (time), 20.747 s. (clock)
Elapsed time: 3326 s. (time), 3325.93 s. (clock)

CMake time was 1.30% of the total build time.

ccache warm:

Elapsed time: 43 s. (time), 43.284 s. (clock)
Elapsed time: 20 s. (time), 20.501 s. (clock)
Elapsed time: 99 s. (time), 99.036 s. (clock)

CMake time was 30.28% of the total build time. Also, none of the configure checks were sped up; I think CMAKE_CXX_COMPILER_LAUNCHER is not taken into consideration in this case.

Setting up ld.gold was done like this:

C:\mingw64\bin
$ copy ld.gold.exe ld.exe
Overwrite ld.exe? (Yes/No/All): y
        1 file(s) copied.

ccache and ld.gold:

Elapsed time: 43 s. (time), 43.502 s. (clock)
Elapsed time: 20 s. (time), 20.501 s. (clock)
Elapsed time: 99 s. (time), 99.661 s. (clock)

No difference, which makes me think that the LLVM CMake code detects ld.gold if present on Windows and uses it automatically. I found out that CMakeCache.txt had the following variables: GOLD_EXECUTABLE and LLVM_TOOL_GOLD_BUILD set to ON.

I renamed ld.gold.exe to something else, copied ld.bfd.exe as ld.exe and ran the build again.

Elapsed time: 44 s. (time), 44.112 s. (clock)
Elapsed time: 21 s. (time), 20.563 s. (clock)
Elapsed time: 101 s. (time), 101.145 s. (clock)

No idea why there was no significant difference between ld.bfd.exe and ld.gold.exe.

The Windows native cached build was 2.78x slower than the Linux native build, and 2x faster than the Linux build running under Windows 10’s Linux Bash Shell.

CMake Speedup

Now I guess you are wondering about the promised CMake speedup, right?

You may have noticed that the second CMake run is almost two times faster than the first one!

For each configure check, CMake actually sets up a small project using the given generator (in my case ninja), tries to compile it, and based on the compilation result determines whether some header, function or symbol is present on the system.

These checks are run sequentially, not in parallel, and thus they can take some time.
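
To make this concrete, here is what a couple of such checks look like in a CMakeLists.txt. This is only an illustration: the header, symbol and result variable names are examples and are not copied from the LLVM sources.

```cmake
# Each call below makes CMake generate and compile a tiny test project,
# then stores the outcome in the cache under the given variable name.
include(CheckIncludeFile)
include(CheckSymbolExists)

# Is the header <unistd.h> available?
check_include_file(unistd.h HAVE_UNISTD_H)

# Is the symbol dlopen declared when including <dlfcn.h>?
check_symbol_exists(dlopen dlfcn.h HAVE_DLOPEN)

if(HAVE_UNISTD_H)
  message(STATUS "unistd.h found")
endif()
```

Because the result lands in the cache, a check whose variable is already defined is skipped on later runs.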

At some point this year I learned that one can override a CMake function / macro, and that the original function is accessible under the same name prefixed with an underscore. Daniel Pfeiffer mentions this in his C++Now 2017 Effective CMake talk.
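
A tiny script shows the underscore trick in isolation. It is a sketch with a made-up function name (greet) and can be run with cmake -P:

```cmake
# override_demo.cmake (hypothetical file), run with: cmake -P override_demo.cmake

# The "original" function.
function(greet name)
  message(STATUS "Hello, ${name}")
endfunction()

# Defining greet() again replaces it; CMake keeps the previous
# definition reachable as _greet().
function(greet name)
  message(STATUS "(wrapper) about to greet ${name}")
  _greet(${name})   # calls the original definition
endfunction()

greet(world)   # runs the wrapper, which then runs the original
```

Note that only one previous definition is kept this way, so overriding the same command a second time makes the wrapper call itself.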

My thought was to override all the checks and cache them for further use.

CMake's -C command line option pre-loads a script to populate the cache.
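
Such a script is just CMake code made of set(... CACHE ...) commands. A minimal sketch, with a hypothetical file name and illustrative variable names:

```cmake
# initial_cache.cmake (hypothetical file name)
# Used as: cmake -C initial_cache.cmake <path-to-source>
#
# Each entry pre-populates a cache variable, so a configure check that
# stores its result under the same name finds it defined and is skipped.
set(HAVE_UNISTD_H 1 CACHE INTERNAL "Have include unistd.h")
set(HAVE_DLOPEN   1 CACHE INTERNAL "Have symbol dlopen")

# A check that failed is recorded as an empty value.
set(HAVE_MALLOC_MALLOC_H "" CACHE INTERNAL "Have include malloc/malloc.h")
```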

So I’ve come up with some code (get it from GitHub) which can be used like this:

cmake_minimum_required(VERSION 3.4.3)
set(CMAKE_MODULE_PATH ${CMAKE_SOURCE_DIR}/CMakeChecksCache)
add_subdirectory(llvm-4.0.0.src)

When CMake does an include(CheckIncludeFile), it gets my version of CheckIncludeFile.cmake, which saves all findings in the cmake_checks_cache.txt file, or in a different file whose name you can set via CMAKE_CHECKS_CACHE_FILE.

The implementation has a few hacks due to bugs in CMake's *.cmake files. For example, CheckSymbolExists.cmake has an implementation macro named _CHECK_SYMBOL_EXISTS! Also, these files do not have inclusion guards, which means that my override macro will always be redefined by the actual call of include(Check...).
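
The general shape of such an override module can be sketched as follows. This is a simplified illustration of the idea and not the code from the repository: it handles only check_include_file, makes no attempt to avoid duplicate entries, and the default file location is an assumption.

```cmake
# CMakeChecksCache/CheckIncludeFile.cmake (simplified sketch)

# Load the stock module first so the real check_include_file() exists.
include(${CMAKE_ROOT}/Modules/CheckIncludeFile.cmake)

# Assumed default location for the generated cache script.
if(NOT CMAKE_CHECKS_CACHE_FILE)
  set(CMAKE_CHECKS_CACHE_FILE "${CMAKE_BINARY_DIR}/cmake_checks_cache.txt")
endif()

# Redefine the check; the stock macro stays reachable as
# _check_include_file().
macro(check_include_file header variable)
  _check_include_file(${header} ${variable} ${ARGN})

  # Record the result as a set(... CACHE ...) line that a later
  # "cmake -C <file>" run can pre-load.
  file(APPEND "${CMAKE_CHECKS_CACHE_FILE}"
    "set(${variable} \"${${variable}}\" CACHE INTERNAL \"Have include ${header}\")\n")
endmacro()
```

Re-including the stock module at the top of the file is what works around the missing inclusion guards: every include(CheckIncludeFile) re-establishes the original macro and then wraps it again.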

Usage is simple:

First create the CMake checks cache file.

$ cmake -E time cmake -G "Ninja" .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/home/cadam/llvm -DLLVM_TARGETS_TO_BUILD=X86

Notice that I used .. instead of ../llvm-4.0.0.src, because that’s where I put the three-line CMakeLists.txt file from above.

Then we just tell CMake to use the checks cache file:

$ cmake -E time cmake -C cmake_checks_cache.txt -G "Ninja" ../llvm-4.0.0.src -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/home/cadam/llvm -DLLVM_TARGETS_TO_BUILD=X86

LLVM and clang together have 115 configure checks, which are now cached!

The results of the runs are now like this:

Ubuntu 16.04 LTS with warm ccache, ld.gold and cmake-checks-cache:

Elapsed time: 7 s. (time), 0.001996 s. (clock)
Elapsed time: 6 s. (time), 0.001232 s. (clock)
Elapsed time: 40 s. (time), 0.067355 s. (clock)

CMake time is 14.89% of the total build time. This is down from 23.52%!

Ubuntu 16.04 LTS on Windows 10 with warm ccache, ld.gold and cmake-checks-cache:

Elapsed time: 44 s. (time), 0.046875 s. (clock)
Elapsed time: 36 s. (time), 0 s. (clock)
Elapsed time: 205 s. (time), 0.1875 s. (clock)

CMake time is 17.67% of the total build time. This is down from 27.05%!

MinGW-w64 GCC 5.4.0 on Windows 10 with warm ccache, ld.gold and cmake-checks-cache:

Elapsed time: 25 s. (time), 24.704 s. (clock)
Elapsed time: 21 s. (time), 20.469 s. (clock)
Elapsed time: 99 s. (time), 99.489 s. (clock)

CMake time is 20.16% of the total build time. This is down from 30.28%!

You may be wondering why the second CMake run is still faster. That’s because CMake still does the initial compiler checks on the first run. I had a look at what would be needed to cache those values, and gave up.

Conclusion

If you are using a continuous integration build system (who doesn’t?), and using CMake, you might want to cache all those checks which do not change very often!
