Cristian Adam

Speeding up libclang on Windows

In this article I will tackle libclang’s speed on Windows, in particular Qt Creator’s clang code model.

Qt Creator 3.6.0 fixed the following bug: QTCREATORBUG-15365: Clang Model: code completion speed regression. The bug report contains information on how to enable Qt Creator’s clang code model statistics. This is done by setting this environment variable: QT_LOGGING_RULES=qtc.clangbackend.timers=true.

On Windows, Qt Creator writes this information to the Windows debugger output. I use DebugView to view it.
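
For example, from a plain command prompt this can look like the following sketch (the Qt Creator path is only an illustration, adjust it to your installation):

$ set QT_LOGGING_RULES=qtc.clangbackend.timers=true
$ "C:\Qt\Qt5.5.1\Tools\QtCreator\bin\qtcreator.exe"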

libclang is used by Qt Creator to provide code completion support. The clang code model is still experimental and not 100% feature equivalent with the Qt Creator built-in code model.

Using the clang code model means that Qt Creator uses a real C++ compiler to parse the source code you are editing. It also means that if you have a big source file with lots of includes, parsing will take some time.

Qt Creator caches this information in the form of a PCH file under %temp%/qtc-clang-[some letters]/preamble-[some numbers].pch. The complete compilation is done only once; subsequent code completion requests are fast.
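
For the curious, the cached files are easy to find from a command prompt (the exact directory and file names vary per session):

$ dir /s %temp%\preamble-*.pch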

I have picked Lyx – The Document Processor as a test project for Qt Creator. Lyx uses Boost and Qt5, and on my Intel(R) Core(TM) i7 CPU M 620 @ 2.67 GHz Windows 10 powered laptop it takes approximately 10 seconds to “compile” Text3.cpp.

Even though my laptop has multiple cores, libclang will use only one core to compile Text3.cpp. What can we do about it? It would be nice if libclang could use the GPU 🙂

Qt Creator 3.6.0 ships with libclang 3.6.2, and for Windows it ships a Visual C++ 2013 32 bit build, unlike Linux where 64 bit is the norm.

I will take clang 3.6.2 and compile it with Visual C++ 2013, Visual C++ 2015, Clang 3.7.0 and Mingw-w64 GCC 5.3.0. I have managed to get libclang to compile Text3.cpp in approximately 6 seconds. Which C++ compiler was able to do this?

Setup

I have used the git version of Lyx with both Qt 5.5.1 for Windows 32-bit (VS 2013, 804 MB) and Qt 5.5.1 for Windows 32-bit (MinGW 4.9.2, 1.0 GB). Further on I will refer to these two as the Visual C++ kit and the MinGW kit.

The CMake configuration line for Visual C++ 2013 was:

-DLYX_DEPENDENCIES_DOWNLOAD=1 -DLYX_USE_QT=QT5 -DCMAKE_PREFIX_PATH=c:\Qt\Qt5.5.1\5.5\msvc2013\lib\cmake\

The CMake configuration line for MinGW 4.9.2 was:

-DLYX_DEPENDENCIES_DOWNLOAD=1 -DLYX_USE_QT=QT5 -DCMAKE_PREFIX_PATH=c:\Qt\Qt5.5.1-gcc\5.5\mingw492_32\lib\cmake\

The test was to open Text3.cpp, navigate to the end and wait for qtc.clangbackend.timers: ClangIpcServer::registerTranslationUnitsForEditor to show up in DebugView. Then close the document and open it again. I have done this 10 times, to have a better mean (average) value.

To find out how many headers Text3.cpp was including, I went to Qt Creator’s menu “Tools -> C++ -> Inspect C++ Code Model… (Ctrl+Shift+F12)” and found out that for Visual C++ it was including 776 documents, and for MinGW 4.9.2 828 documents!

I will compile libclang.dll with various C++ compilers and see how it works with both Visual C++ 2013 kit and MinGW 4.9.2 kit in Qt Creator.

Visual C++ 2013 32 bit

Qt Creator ships with a libclang.dll compiled with Visual C++ 2013 32 bit. The mean value for registerTranslationUnitsForEditor was 9533.1 ms. Let’s say it’s almost 10 seconds 🙂

By switching to MinGW 4.9.2 the mean value for registerTranslationUnitsForEditor was 8248.3 ms. By simply switching to MinGW I gained a 13.4% speed increase.

We got this speed up because the MinGW include headers are, in my opinion, simpler and easier to parse than the Visual C++ ones.

When going to the “Inspect C++ Code Model…” dialog Qt Creator will generate a %temp%/qtc-codemodelinspection_[some numbers].txt file. For Visual C++ 2013 this file was 13.2 MB in size, while for MinGW 4.9.2 it was 10.2 MB in size.

The preamble-[some numbers].pch file (generated by libclang) was bigger for MinGW 4.9.2 – 26.5 MB in size, while for Visual C++ 2013 it was 24.7 MB in size.

Compiling Qt Creator

It is known that 64 bit performs faster than 32 bit, right? Therefore let’s compile libclang and Qt Creator for 64 bit.

Compiling Qt Creator for 64 bit requires Qt 5.5.1 for Windows 64-bit (VS 2013, 823 MB) to be installed before (I have installed it under C:\Qt\Qt5.5.1-x64).

Download qt-creator-opensource-src-3.6.0.zip and unpack it somewhere. Then run the following commands from the Visual C++ 2013 64bit Tools Command Prompt:

$ mkdir qt-creator-build
$ cd qt-creator-build
$ set LLVM_INSTALL_DIR=c:\llvm
$ set PATH=C:\Qt\Qt5.5.1-x64\5.5\msvc2013_64\bin;%PATH%
$ qmake ..\qt-creator-opensource-src-3.6.0\qtcreator.pro CONFIG+=release -r -spec win32-msvc2013
$ set PATH=c:\Qt\Qt5.5.1-x64\Tools\QtCreator\bin\;%PATH%
$ cmake -E time jom

Note the set LLVM_INSTALL_DIR=c:\llvm command, which means that you have to compile and install clang to c:\llvm first. Before compiling Qt Creator, compile clang (see the next section) and, instead of cmake -E time ninja libclang, do a full cmake -E time ninja build.

A full clang build with Visual C++ 2013 64 bit took 39m:43s on my machine. Qt Creator 64 bit was built in 22m:51s.

To run my Qt Creator build, I have created a batch file (run.cmd) containing:

@echo off
set PATH=C:\Qt\Qt5.5.1-x64\5.5\msvc2013_64\bin;%PATH%
set PATH=c:\Qt\Qt5.5.1-x64\Tools\QtCreator\bin\;%PATH%
set QT_LOGGING_RULES=qtc.clangbackend.timers=true
qtcreator

Compiling libclang

Download llvm-3.6.2.src.tar.xz and cfe-3.6.2.src.tar.xz (clang) and unpack them somewhere. I have used a Cygwin box for the following commands:

$ tar xJf llvm-3.6.2.src.tar.xz
$ tar xJf cfe-3.6.2.src.tar.xz
$ mv cfe-3.6.2.src llvm-3.6.2.src/tools/clang

One could do without Cygwin by using e.g. 7-Zip, but I find Cygwin more convenient.

To configure and compile clang, one only needs to issue the following commands (from a Visual C++ Tools Command Prompt):

$ mkdir llvm-3.6.2-build
$ cd llvm-3.6.2-build
$ cmake -G "Ninja" ..\llvm-3.6.2.src\ -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=c:\llvm -DLLVM_TARGETS_TO_BUILD=X86
$ cmake -E time ninja libclang

cmake -E time is very practical on Windows for timing various operations, since the Windows command prompt lacks an equivalent of the Unix/Linux time command.
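
The wrapper is not limited to single targets; any command works, for example a full build, or the install step into c:\llvm that the Qt Creator build above expects:

$ cmake -E time ninja
$ cmake -E time ninja install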

libclang.dll will be placed in the llvm-3.6.2-build/bin directory.

Since libclang.dll provides a C API, we can simply swap it without having to recompile Qt Creator.
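
A minimal sketch of such a swap, assuming Qt Creator loads libclang.dll from its bin directory and is installed under C:\Qt\Qt5.5.1\Tools\QtCreator (back up the original first, and adjust both paths to your setup):

$ copy "C:\Qt\Qt5.5.1\Tools\QtCreator\bin\libclang.dll" "C:\Qt\Qt5.5.1\Tools\QtCreator\bin\libclang.dll.orig"
$ copy llvm-3.6.2-build\bin\libclang.dll "C:\Qt\Qt5.5.1\Tools\QtCreator\bin\"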

Visual C++ 2013 64 bit

I have opened up a Visual C++ 2013 64 bit Tools Command Prompt and issued the two CMake commands in a separate build directory. The build took 24m:26s. The resulting libclang.dll was 10.1 MB in size.

The mean value for registerTranslationUnitsForEditor for the Visual C++ kit was 9371.5 ms, and for the MinGW kit was 8434.6 ms.

Compared with Visual C++ 2013 32 bit, the value for the Visual C++ kit was better, while the value for the MinGW kit was worse.

Visual C++ 2015 32 bit

Visual C++ 2015 has implemented some C++17 features and the source code for clang 3.6.2 needs to be patched (info taken from r237863):

diff -Naur llvm-3.6.2.src/tools/clang/lib/Serialization/ASTWriter.cpp llvm-3.6.2.src-vs2015/tools/clang/lib/Serialization/ASTWriter.cpp
--- llvm-3.6.2.src/tools/clang/lib/Serialization/ASTWriter.cpp    2014-12-27 23:14:15.000000000 +0100
+++ llvm-3.6.2.src-vs2015/tools/clang/lib/Serialization/ASTWriter.cpp    2016-01-03 18:29:02.395326500 +0100
@@ -60,14 +60,14 @@
 using namespace clang::serialization;
 
 template <typename T, typename Allocator>
-static StringRef data(const std::vector<T, Allocator> &v) {
+static StringRef bytes(const std::vector<T, Allocator> &v) {
   if (v.empty()) return StringRef();
   return StringRef(reinterpret_cast<const char*>(&v[0]),
                          sizeof(T) * v.size());
 }
 
 template <typename T>
-static StringRef data(const SmallVectorImpl<T> &v) {
+static StringRef bytes(const SmallVectorImpl<T> &v) {
   return StringRef(reinterpret_cast<const char*>(v.data()),
                          sizeof(T) * v.size());
 }
@@ -1514,7 +1514,7 @@
   Record.push_back(INPUT_FILE_OFFSETS);
   Record.push_back(InputFileOffsets.size());
   Record.push_back(UserFilesNum);
-  Stream.EmitRecordWithBlob(OffsetsAbbrevCode, Record, data(InputFileOffsets));
+  Stream.EmitRecordWithBlob(OffsetsAbbrevCode, Record, bytes(InputFileOffsets));
 }
 
 //===----------------------------------------------------------------------===//
@@ -1909,7 +1909,7 @@
   Record.push_back(SOURCE_LOCATION_OFFSETS);
   Record.push_back(SLocEntryOffsets.size());
   Record.push_back(SourceMgr.getNextLocalOffset() - 1); // skip dummy
-  Stream.EmitRecordWithBlob(SLocOffsetsAbbrev, Record, data(SLocEntryOffsets));
+  Stream.EmitRecordWithBlob(SLocOffsetsAbbrev, Record, bytes(SLocEntryOffsets));
 
   // Write the source location entry preloads array, telling the AST
   // reader which source locations entries it should load eagerly.
@@ -2234,7 +2234,7 @@
   Record.push_back(MacroOffsets.size());
   Record.push_back(FirstMacroID - NUM_PREDEF_MACRO_IDS);
   Stream.EmitRecordWithBlob(MacroOffsetAbbrev, Record,
-                            data(MacroOffsets));
+                            bytes(MacroOffsets));
 }
 
 void ASTWriter::WritePreprocessorDetail(PreprocessingRecord &PPRec) {
@@ -2332,7 +2332,7 @@
     Record.push_back(PPD_ENTITIES_OFFSETS);
     Record.push_back(FirstPreprocessorEntityID - NUM_PREDEF_PP_ENTITY_IDS);
     Stream.EmitRecordWithBlob(PPEOffsetAbbrev, Record,
-                              data(PreprocessedEntityOffsets));
+                              bytes(PreprocessedEntityOffsets));
   }
 }
 
@@ -2704,7 +2704,7 @@
   Record.push_back(CXX_BASE_SPECIFIER_OFFSETS);
   Record.push_back(CXXBaseSpecifiersOffsets.size());
   Stream.EmitRecordWithBlob(BaseSpecifierOffsetAbbrev, Record,
-                            data(CXXBaseSpecifiersOffsets));
+                            bytes(CXXBaseSpecifiersOffsets));
 }
 
 //===----------------------------------------------------------------------===//
@@ -2780,7 +2780,7 @@
     Decls.push_back(std::make_pair(D->getKind(), GetDeclRef(D)));
 
   ++NumLexicalDeclContexts;
-  Stream.EmitRecordWithBlob(DeclContextLexicalAbbrev, Record, data(Decls));
+  Stream.EmitRecordWithBlob(DeclContextLexicalAbbrev, Record, bytes(Decls));
   return Offset;
 }
 
@@ -2799,7 +2799,7 @@
   Record.push_back(TYPE_OFFSET);
   Record.push_back(TypeOffsets.size());
   Record.push_back(FirstTypeID - NUM_PREDEF_TYPE_IDS);
-  Stream.EmitRecordWithBlob(TypeOffsetAbbrev, Record, data(TypeOffsets));
+  Stream.EmitRecordWithBlob(TypeOffsetAbbrev, Record, bytes(TypeOffsets));
 
   // Write the declaration offsets array
   Abbrev = new BitCodeAbbrev();
@@ -2812,7 +2812,7 @@
   Record.push_back(DECL_OFFSET);
   Record.push_back(DeclOffsets.size());
   Record.push_back(FirstDeclID - NUM_PREDEF_DECL_IDS);
-  Stream.EmitRecordWithBlob(DeclOffsetAbbrev, Record, data(DeclOffsets));
+  Stream.EmitRecordWithBlob(DeclOffsetAbbrev, Record, bytes(DeclOffsets));
 }
 
 void ASTWriter::WriteFileDeclIDsMap() {
@@ -2837,7 +2837,7 @@
   unsigned AbbrevCode = Stream.EmitAbbrev(Abbrev);
   Record.push_back(FILE_SORTED_DECLS);
   Record.push_back(FileSortedIDs.size());
-  Stream.EmitRecordWithBlob(AbbrevCode, Record, data(FileSortedIDs));
+  Stream.EmitRecordWithBlob(AbbrevCode, Record, bytes(FileSortedIDs));
 }
 
 void ASTWriter::WriteComments() {
@@ -3067,7 +3067,7 @@
     Record.push_back(SelectorOffsets.size());
     Record.push_back(FirstSelectorID - NUM_PREDEF_SELECTOR_IDS);
     Stream.EmitRecordWithBlob(SelectorOffsetAbbrev, Record,
-                              data(SelectorOffsets));
+                              bytes(SelectorOffsets));
   }
 }
 
@@ -3517,7 +3517,7 @@
   Record.push_back(IdentifierOffsets.size());
   Record.push_back(FirstIdentID - NUM_PREDEF_IDENT_IDS);
   Stream.EmitRecordWithBlob(IdentifierOffsetAbbrev, Record,
-                            data(IdentifierOffsets));
+                            bytes(IdentifierOffsets));
 }
 
 //===----------------------------------------------------------------------===//
@@ -4443,7 +4443,7 @@
   Record.clear();
   Record.push_back(TU_UPDATE_LEXICAL);
   Stream.EmitRecordWithBlob(TuUpdateLexicalAbbrev, Record,
-                            data(NewGlobalDecls));
+                            bytes(NewGlobalDecls));
   
   // And a visible updates block for the translation unit.
   Abv = new llvm::BitCodeAbbrev();
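
Assuming the diff above is saved as vs2015-astwriter.patch (a file name of my own choosing), it can be applied from the same Cygwin box used for unpacking, run from the directory that contains llvm-3.6.2.src:

$ patch -p0 < vs2015-astwriter.patch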

After applying the above patch, I was able to compile libclang.dll with Visual C++ 2015 32 bit in 16m:27s. Quite snappy. libclang.dll was 7.60 MB in size. Quite small 🙂

The mean value for registerTranslationUnitsForEditor for the Visual C++ kit was 9541.9 ms, and for the MinGW kit was 8238.3 ms.

The values are almost identical to the Visual C++ 2013 32 bit ones.

Visual C++ 2015 64 bit

Next I’ve compiled the Visual C++ 2015 64 bit libclang.dll. It took 19m:10s, almost 3 minutes slower than the 32 bit build. The binary size of libclang.dll was 10.2 MB.

The mean value for registerTranslationUnitsForEditor for the Visual C++ kit was 9213.1 ms, and for the MinGW kit was 8266.4 ms.

Visual C++ 2015 64 bit produced faster results than Visual C++ 2013 64 bit! Yay, progress!

Clang 3.7.0 32 bit

The next step was to compile libclang with Clang itself. I took Clang for Windows (32-bit) and installed it under C:\Program Files (x86)\LLVM.

Clang on Windows comes with a Visual C++ cl.exe compatible driver, some headers and some MSBuild support. It doesn’t come with a C++ standard library; it relies entirely on Visual C++ to provide one.

Since I am using Ninja to build libclang, I had to issue the following commands from a Visual C++ 2013 32 bit Tools Command Prompt:

$ set PATH=C:\Program Files (x86)\LLVM\msbuild-bin\;%PATH%
$ set INCLUDE=C:\Program Files (x86)\LLVM\lib\clang\3.7.0\include\;%INCLUDE%

But before issuing the usual CMake commands, the LLVM CMake machinery needs to be patched:

diff -Naur llvm-3.6.2.src/cmake/modules/HandleLLVMOptions.cmake llvm-3.6.2.src-clang/cmake/modules/HandleLLVMOptions.cmake
--- llvm-3.6.2.src/cmake/modules/HandleLLVMOptions.cmake    2014-12-02 19:59:08.000000000 +0100
+++ llvm-3.6.2.src-clang/cmake/modules/HandleLLVMOptions.cmake    2016-01-03 00:40:20.014951500 +0100
@@ -29,14 +29,14 @@
       set(OLD_CMAKE_REQUIRED_FLAGS ${CMAKE_REQUIRED_FLAGS})
       set(OLD_CMAKE_REQUIRED_LIBRARIES ${CMAKE_REQUIRED_LIBRARIES})
       set(CMAKE_REQUIRED_FLAGS "-std=c++0x")
-      check_cxx_source_compiles("
-#include <atomic>
-std::atomic<float> x(0.0f);
-int main() { return (float)x; }"
-        LLVM_NO_OLD_LIBSTDCXX)
-      if(NOT LLVM_NO_OLD_LIBSTDCXX)
-        message(FATAL_ERROR "Host Clang must be able to find libstdc++4.7 or newer!")
-      endif()
+#      check_cxx_source_compiles("
+##include <atomic>
+#std::atomic<float> x(0.0f);
+#int main() { return (float)x; }"
+#        LLVM_NO_OLD_LIBSTDCXX)
+#      if(NOT LLVM_NO_OLD_LIBSTDCXX)
+#        message(FATAL_ERROR "Host Clang must be able to find libstdc++4.7 or newer!")
+#      endif()
       set(CMAKE_REQUIRED_FLAGS ${OLD_CMAKE_REQUIRED_FLAGS})
       set(CMAKE_REQUIRED_LIBRARIES ${OLD_CMAKE_REQUIRED_LIBRARIES})
     endif()
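
With the clang-cl shim first in PATH and the check commented out, the usual configure and build commands apply unchanged; a condensed recap (where cl just confirms that LLVM’s cl.exe shim shadows the real compiler, and the build directory name is arbitrary):

$ where cl
$ mkdir llvm-3.6.2-build-clang
$ cd llvm-3.6.2-build-clang
$ cmake -G "Ninja" ..\llvm-3.6.2.src\ -DCMAKE_BUILD_TYPE=Release -DLLVM_TARGETS_TO_BUILD=X86
$ cmake -E time ninja libclang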

The libclang.dll was built in 37m:29s and it was 14.8 MB in size.

Building with Clang 3.7.0 32 bit took more than twice as long as with Visual C++ 2015 32 bit, and the binary produced is almost double the size! Let’s see how it performs!

The mean value for registerTranslationUnitsForEditor for the Visual C++ kit was 9286.1 ms, and for the MinGW kit was 7692.4 ms.

The clang 3.7.0 32 bit binary was faster than the Visual C++ 2015 32 bit binary!

Clang 3.7.0 64 bit

To compile for 64 bit I took Clang for Windows (64-bit) and installed it under C:\Program Files\LLVM.

The installer will complain that LLVM is already installed, but that is not true: the 32 bit version was installed, not the 64 bit one.

The commands overriding the Visual C++ 2015 64 bit compiler needed to be adjusted as well:

$ set PATH=c:\Program Files\LLVM\msbuild-bin\;%PATH%
$ set INCLUDE=c:\Program Files\LLVM\lib\clang\3.7.0\include\;%INCLUDE%

The libclang.dll was built in 39m:12s and it was 15.3 MB in size.

Clang 3.7.0 64 bit behaves the same as Visual C++ 2015 64 bit: the compile time is longer and the binary is a tad bigger than the 32 bit one.

Building with Clang 3.7.0 64 bit took twice as long as with Visual C++ 2015 64 bit, and the binary produced is 1.5x bigger. But is the binary fast?

The mean value for registerTranslationUnitsForEditor for the Visual C++ kit was 8820.6 ms, and for the MinGW kit was 7581.5 ms.

The answer is YES! The Clang 3.7.0 64 bit binary is the fastest binary yet!

Mingw-w64 GCC 5.3.0 32 bit

Download and install Mingw-w64 GCC 5.3.0 32 bit (threads: posix, exceptions: dwarf).

I have created a mingw-vars.cmd helper batch file, which I put in the mingw32 directory:

@echo off
set PATH=%~dp0bin;%PATH%
gcc --version
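
The build itself follows the same Ninja-based recipe as before; a sketch, assuming ninja is on PATH, with illustrative directory names and MinGW’s strip producing the stripped DLL whose size is reported below:

$ c:\mingw32\mingw-vars.cmd
$ mkdir llvm-3.6.2-build-mingw32
$ cd llvm-3.6.2-build-mingw32
$ cmake -G "Ninja" ..\llvm-3.6.2.src\ -DCMAKE_BUILD_TYPE=Release -DLLVM_TARGETS_TO_BUILD=X86
$ cmake -E time ninja libclang
$ strip bin\libclang.dll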

Compiling with CMake without any patches took 21m:36s. The stripped libclang.dll was 16.9 MB in size.

While the compilation time was pretty good, the binary size was not. But how does it perform?

The mean value for registerTranslationUnitsForEditor for the Visual C++ kit was 8314.9 ms, and for the MinGW kit was 7335.9 ms.

It’s faster than Clang 3.7.0 64 bit! We have a new winner 🎉

Mingw-w64 GCC 5.3.0 64 bit

Download and install Mingw-w64 GCC 5.3.0 64 bit (threads: posix, exceptions: seh).

Compiling with CMake without any patches took 23m:16s. The stripped libclang.dll was 15.6 MB in size.

The 64 bit compilation was slower than the 32 bit, like for the other compilers, but the 64 bit binary size was smaller!

The mean value for registerTranslationUnitsForEditor for the Visual C++ kit was 10509.3 ms, and for the MinGW kit was 7637.5 ms.

The 64 bit binary was slower than the 32 bit binary. For the Visual C++ kit it was the slowest of them all 😧

I double checked the MinGW 5.3.0 64 bit performance with another distribution – Nuwen. There was some improvement, but the behavior was the same: worse than 32 bit, and slow with the Visual C++ kit.

The mean value for registerTranslationUnitsForEditor for the Visual C++ kit was 9939.4 ms, and for the MinGW kit was 7410.2 ms.

Profile-guided optimization (PGO)

Next I’m going to build libclang optimized to compile Text3.cpp. I will use Profile Guided Optimization for this.

To do a PGO build one needs to:

  • set some special flags for compiler and linker to do an instrumented build
  • train the build with the use cases – in my case open Text3.cpp
  • set other special flags for compiler and linker and do the final PGO build

I will do Visual C++ 2015 64 bit and MinGW 5.3.0 32 and 64 bit PGO builds. I left out Clang 3.7.0 because its “cl” driver doesn’t support the PGO flags.

Visual C++ 2015 64 bit PGO

To enable PGO one needs to edit llvm-3.6.2.src\CMakeLists.txt and add the following lines:

if( MSVC )
  set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} /GL")
  set(CMAKE_SHARED_LINKER_FLAGS "${CMAKE_SHARED_LINKER_FLAGS} /LTCG:PGINSTRUMENT")
endif()

Then do the regular CMake build. The 64 bit PGO instrumented build took 16m:32s. That is less than the regular build; I suspect it is because of the /GL flag, which enables link-time code generation and thus moves some of the work from compile time to link time. The binary size grew to 25.3 MB, and next to it was an 84.7 MB libclang.pgd file.

That was the first part.

Then I did the training separately for each kit: Visual C++ and MinGW.

For the Visual C++ kit, registerTranslationUnitsForEditor reported a whopping 226615 ms, that is just 24.5 times slower 🙂

For the MinGW kit, registerTranslationUnitsForEditor reported 148566 ms, that is just 17.9 times slower.

This is another indication that Visual C++ system headers require more computation power than MinGW’s.

The training step produced two files (because I opened Text3.cpp twice): libclang!1.pgc and libclang!2.pgc. For the Visual C++ kit they were 12.0 MB in size, for the MinGW kit 12.8 MB. It recorded more information for MinGW in less time. Curious.

The final step is to copy the .pgc files into the build directory, next to libclang.pgd, and perform the final optimization.
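
The copy itself is just two commands; both directories below are hypothetical placeholders, one for wherever the .pgc files were produced, one for the PGO build tree:

$ copy "c:\qt-creator-build\bin\libclang!1.pgc" c:\llvm-3.6.2-build\bin\
$ copy "c:\qt-creator-build\bin\libclang!2.pgc" c:\llvm-3.6.2-build\bin\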

Unfortunately my CMake-fu is poor, and when I swapped /LTCG:PGINSTRUMENT for /LTCG:PGOPTIMIZE in CMakeLists.txt, CMake didn’t do the expected thing, so I had to delete libclang.dll and manually edit build.ninja to replace the values.
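
One way to do the same manual edit from the Cygwin shell used earlier, run inside the build directory (this just mirrors my hack, it is not the proper CMake way):

$ sed -i 's|/LTCG:PGINSTRUMENT|/LTCG:PGOPTIMIZE|g' build.ninja
$ rm -f bin/libclang.dll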

After that, cmake -E time ninja libclang took 6m:39s for the Visual C++ training and 7m:14s for the MinGW training.

Visual C++ prints some nice statistics when it does the PGO linking.

Here’s the Visual C++ version:

$ cmake -E time ninja libclang
[1/1] Linking CXX shared library bin\libclang.dll
Merging bin\libclang!1.pgc
bin\libclang!1.pgc: Used  5.0% (12657888 / 255700992) of total space reserved.  0.0% of the counts were dropped due to overflow.
Merging bin\libclang!2.pgc
bin\libclang!2.pgc: Used  5.0% (12667264 / 255700992) of total space reserved.  0.0% of the counts were dropped due to overflow.
  Reading PGD file 1: bin\libclang.pgd
   Creating library lib\libclang.lib and object lib\libclang.exp
Generating code

0 of 0 ( 0.0%) original invalid call sites were matched.
0 new call sites were added.
64 of 190350 (  0.03%) profiled functions will be compiled for speed, and the rest of the functions will be compiled for size
1123298 of 2227108 inline instances were from dead/cold paths
190341 of 190350 functions (100.0%) were optimized using profile data, and the rest of the functions were optimized without using profile data
276441555770 of 276441555770 instructions (100.0%) were optimized using profile data
Finished generating code

And the MinGW version:

$ cmake -E time ninja libclang
[1/1] Linking CXX shared library bin\libclang.dll
Merging bin\libclang!1.pgc
bin\libclang!1.pgc: Used  5.3% (13502496 / 255700992) of total space reserved.  0.0% of the counts were dropped due to overflow.
Merging bin\libclang!2.pgc
bin\libclang!2.pgc: Used  5.3% (13574856 / 255700992) of total space reserved.  0.0% of the counts were dropped due to overflow.
  Reading PGD file 1: bin\libclang.pgd
   Creating library lib\libclang.lib and object lib\libclang.exp
Generating code

0 of 0 ( 0.0%) original invalid call sites were matched.
0 new call sites were added.
223 of 190350 (  0.12%) profiled functions will be compiled for speed, and the rest of the functions will be compiled for size
1120831 of 2256102 inline instances were from dead/cold paths
190341 of 190350 functions (100.0%) were optimized using profile data, and the rest of the functions were optimized without using profile data
99269434860 of 99269434860 instructions (100.0%) were optimized using profile data
Finished generating code

The huge instruction counts at the end seem erroneous, most likely a bug 🙂

The PGO optimized libclang.dll was 7.84 MB in size for the Visual C++ training, and 8.01 MB for the MinGW training.

The Visual C++ PGO mean value for registerTranslationUnitsForEditor for the Visual C++ kit was 8039.2 ms, and for the MinGW kit was 6705.2 ms.

The MinGW PGO mean value for registerTranslationUnitsForEditor for the Visual C++ kit was 7913.7 ms, and for the MinGW kit was 6289.6 ms.

It seems the MinGW training data was beneficial for the Visual C++ kit as well: a 14% speed increase for Visual C++ and 24% for MinGW.

One last thing to mention is the size of the whole libclang build directory. The normal build directory was 650 MB in size, but the PGO build directory was 9 GB!!!

Right now the libclang built with Visual C++ 2015 64 bit and optimized with PGO is the fastest binary. The approximately 6 second target was reached!

Mingw-w64 GCC 5.3.0 32 bit PGO

MinGW also requires editing of llvm-3.6.2.src\CMakeLists.txt to enable PGO:

if( MINGW )
  set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} -fprofile-generate")
  set(CMAKE_SHARED_LINKER_FLAGS "${CMAKE_SHARED_LINKER_FLAGS} -fprofile-generate")
endif()

Then do a regular CMake build. The build took 26m:24s, a bit more than the normal build. The stripped libclang.dll was 41.8 MB in size.

GCC’s PGO is different from Visual C++’s: there are no .pgd-like files. During training, .gcda files are generated right next to the object files in the build directory. You can change the directory where they are written with a compiler switch (-fprofile-dir), but the default is just fine.

I have also done separate Visual C++ and MinGW trainings.

For the Visual C++ kit, registerTranslationUnitsForEditor reported 27789 ms, that is just 3.3 times slower.

For the MinGW kit, registerTranslationUnitsForEditor reported 18388 ms, that is just 2.5 times slower.

That is way better than the Visual C++ PGO penalty!

For the final step I hacked build.ninja again and replaced -fprofile-generate with -fprofile-use. The build times were 21m:12s for the Visual C++ training and 20m:56s for the MinGW training.
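
The equivalent Cygwin one-liner for the MinGW case, again run inside the build directory:

$ sed -i 's|-fprofile-generate|-fprofile-use|g' build.ninja
$ cmake -E time ninja libclang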

Unfortunately MinGW GCC doesn’t print any PGO statistics.

The PGO optimized libclang.dll was 14.3 MB in size for the Visual C++ training, and 14.5 MB for the MinGW training.

The Visual C++ PGO mean value for registerTranslationUnitsForEditor for the Visual C++ kit was 6980.5 ms, and for the MinGW kit was 6276.2 ms.

The MinGW PGO mean value for registerTranslationUnitsForEditor for the Visual C++ kit was 7420.5 ms, and for the MinGW kit was 6141.8 ms.

For MinGW 5.3.0 32 bit the PGO builds produced the fastest times yet: a 16% speed increase for Visual C++ and 16.2% for MinGW.

Mingw-w64 GCC 5.3.0 64 bit PGO

The 64 bit MinGW PGO procedure is the same as for 32 bit. Instrumented build took 30m:10s, binary size was 36.4 MB.

For the Visual C++ kit, registerTranslationUnitsForEditor reported 27751 ms, that is just 2.6 times slower.

For the MinGW kit, registerTranslationUnitsForEditor reported 16766 ms, that is just 2.2 times slower.

The optimized build took 23m:41s for the Visual C++ training and 26m:48s for the MinGW training. For MinGW I had to restart the procedure, because the first optimized build failed due to some bad instrumentation data.

The PGO optimized libclang.dll was 13.2 MB in size for the Visual C++ training, and 13.4 MB for the MinGW training.

The Visual C++ PGO mean value for registerTranslationUnitsForEditor for the Visual C++ kit was 8620.9 ms, and for the MinGW kit was 6516.8 ms.

The MinGW PGO mean value for registerTranslationUnitsForEditor for the Visual C++ kit was 9567.5 ms, and for the MinGW kit was 6545.9 ms.

For MinGW 5.3.0 64 bit, PGO gave an 18% speed increase for Visual C++ and 14.2% for MinGW.

The 32 bit MinGW 5.3.0 version produced faster binaries than the 64 bit version.

Summary

I’ve gathered all the numbers in one table, for easier comparison:

Compiler                                | Time to compile | Binary size | Visual C++ kit | MinGW kit
Visual C++ 2013 32                      | -               | 7.65 MB     | 9533.1 ms      | 8248.3 ms
Visual C++ 2013 64                      | 24m:26s         | 10.1 MB     | 9371.5 ms      | 8434.6 ms
Visual C++ 2015 32                      | 16m:27s         | 7.60 MB     | 9541.9 ms      | 8238.3 ms
Visual C++ 2015 64                      | 19m:10s         | 10.2 MB     | 9213.1 ms      | 8266.4 ms
Clang 3.7.0 32                          | 37m:29s         | 14.8 MB     | 9286.1 ms      | 7692.4 ms
Clang 3.7.0 64                          | 39m:12s         | 15.3 MB     | 8820.6 ms      | 7581.5 ms
MinGW 5.3.0 32                          | 21m:36s         | 16.9 MB     | 8314.9 ms      | 7335.9 ms
MinGW 5.3.0 64                          | 23m:16s         | 15.6 MB     | 10509.3 ms     | 7637.5 ms
MinGW 5.3.0 Nuwen                       | 24m:31s         | 16.7 MB     | 9939.4 ms      | 7410.2 ms
Visual C++ 2015 64, Visual C++ PGO      | 25m:11s+        | 7.84 MB     | 8039.2 ms      | 6705.2 ms
Visual C++ 2015 64, MinGW PGO           | 25m:46s+        | 8.01 MB     | 7913.7 ms      | 6289.6 ms
MinGW 5.3.0 32, Visual C++ PGO          | 47m:36s+        | 14.3 MB     | 7420.5 ms      | 6141.8 ms
MinGW 5.3.0 32, MinGW PGO               | 47m:20s+        | 14.5 MB     | 6980.5 ms      | 6276.2 ms
MinGW 5.3.0 64, Visual C++ PGO          | 53m:51s+        | 13.2 MB     | 8620.9 ms      | 6516.8 ms
MinGW 5.3.0 64, MinGW PGO               | 56m:58s+        | 13.4 MB     | 9567.5 ms      | 6545.9 ms

MinGW 5.3.0 32 bit is the winner in both normal and PGO mode.

In normal mode, the Visual C++ kit is 12.7% faster and the MinGW kit 11.0% faster than with the provided Visual C++ 2013 32 bit libclang.dll.

With the PGO libclang.dll, the Visual C++ kit is 26.7% faster and the MinGW kit 25.5% faster than with the libclang.dll that comes with Qt Creator 3.6.0.

By choosing the MinGW kit instead of the Visual C++ kit one benefits from a 23% speed increase in normal mode and a 12.0% speed increase in PGO mode.

So next time code completion is slow in Qt Creator, do something about it! 😎

Downloads

I have 7zipped all the libclang.dll versions in an archive.

To use the 64 bit versions I have also 7zipped my Visual C++ 2013 64 bit build of Qt Creator 3.6.0.

The above links are self-extracting 7zip archives.

Which libclang.dll performed better on your project? Comment below. Thanks!
