Introduction
There is a common stereotype that C++ exception handling code significantly increases binary image size ("up to 100 kB"). In this article, I want to check these claims and find out how much image size the C++ exception handling code uses on its own.
To obtain verifiable results, I will use the following test method:
- Build the toolchain (binutils, gcc, newlib), because third-party toolchains may interfere with the results;
- Create and compile a small program that uses exceptions;
- Execute it on a simulator to verify that the program actually works.
Test program
The code repo is here. There are two versions of the program for comparison: one that uses exceptions and one that doesn't.
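For readers without access to the repo, here is a minimal sketch of what the exception-using variant of such a test program could look like. It is not the author's code: the type TestError and the functions might_throw and run_case are hypothetical names chosen for illustration.

```cpp
// Hypothetical exception type; the name is illustrative only.
struct TestError {
    int code;
};

// Throws on negative input, so the compiler has to keep the unwinding path.
int might_throw(int value) {
    if (value < 0) {
        throw TestError{value};
    }
    return value * 2;
}

// Returns the doubled value, or the error code if an exception was caught.
int run_case(int value) {
    try {
        return might_throw(value);
    } catch (const TestError& e) {
        return e.code;
    }
}
```

The no-exceptions variant would presumably report the error through a return value instead, which is what makes the two image sizes comparable.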
Setting up exception handling
It's rather simple to set up exception handling on ARM. The exception handling ABI defines two tables: the index table in .ARM.exidx and the unwind table in .ARM.extab. The linker script should simply combine the multiple pieces of .ARM.exidx* together and define two symbols, __exidx_start and __exidx_end, around the index.
Bare metal environment
The test program runs in a bare metal environment, beginning from the reset interrupt vector. Therefore, it must set up the stack (the linker script defines the sstack symbol as the initial stack value) and zero out the .bss area (the area between the szero and ezero symbols).
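The .bss clearing step can be sketched as follows. This is an illustration under assumptions, not the startup code from the repo: the helper zero_region is a hypothetical name, and it assumes both boundaries are word-aligned.

```cpp
#include <cstdint>

// Fills the half-open range [begin, end) with zeros, one 32-bit word at a time.
// Assumption: both pointers are word-aligned and begin <= end.
inline void zero_region(std::uint32_t* begin, std::uint32_t* end) {
    for (std::uint32_t* p = begin; p != end; ++p) {
        *p = 0u;
    }
}

// On the target, the linker script symbols would be used as the boundaries.
// Left as a comment so that the sketch also builds on a host machine:
//   extern "C" std::uint32_t szero, ezero;
//   zero_region(&szero, &ezero);
```

The function must run before any code that relies on zero-initialized globals, which is why it belongs in the reset handler.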
The test program uses newlib as the standard library and defines "standard" do-nothing syscall stubs. The only meaningful syscall here is _sbrk, which is used to obtain more heap memory. Heap space begins at the sheap symbol.
Also, the ARM architecture offers a choice between wider (4 bytes) and more complex ARM mode instructions and smaller (2 bytes) and simpler Thumb mode instructions. I'll investigate both options in this article.
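As an illustration of the _sbrk stub mentioned above, here is a sketch of a bump allocator that follows the usual _sbrk contract. The names BumpHeap and bump_sbrk are hypothetical, and the upper heap limit is an assumption: the article only states that the heap begins at sheap.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical heap descriptor. On the target, begin would be the address of
// the sheap linker symbol; end (an assumption) would be the top of usable RAM.
struct BumpHeap {
    char* begin;
    char* current;
    char* end;
};

// Follows the _sbrk contract: returns the previous break and moves it by
// `increment` bytes, or returns (void*)-1 if the move would leave the region.
inline void* bump_sbrk(BumpHeap& heap, std::ptrdiff_t increment) {
    if (increment > heap.end - heap.current ||
        increment < heap.begin - heap.current) {
        return reinterpret_cast<void*>(static_cast<std::intptr_t>(-1));
    }
    char* previous = heap.current;
    heap.current += increment;
    return previous;
}
```

A real stub would be exported as an extern "C" function named _sbrk with a single increment parameter, keeping the state in a static variable.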
Simulator
The easiest way (that I have found) to verify program operation is to use QEMU with the versatilepb target. Its UART hardware API is simple enough to print a few strings onto the terminal.
To start the simulation, use this command:
qemu-system-arm -M versatilepb -m 128M -nographic -kernel image file
To end the simulation, press CTRL+a, c to open the qemu monitor and type quit to terminate the simulation.
Build toolchain
Straightforward and naive
OK, let's build the toolchain. This script downloads tarballs and builds the toolchain into the $(pwd)/toolchain directory. It's simple and straightforward. The only thing to note is that gcc requires the gmp, mpfr and mpc libraries to be either preinstalled or unpacked into the gcc source tree. In the latter case, the gcc build machinery takes care of compiling the libraries, so I've chosen this way.
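And the benchmark results are:
| | ARM mode (bytes) | Thumb mode (bytes) |
|---|---|---|
| No exceptions | 429 | 321 |
| With exceptions | 111408 | 84232 |
| Difference | 110979 | 83911 |

Wow. Those are really crazy results: enabling C++ exceptions may increase image size by 110 kB!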
Optimizations?
Let's think this through again: enabling exceptions increases image size, because many bytes of code and data are added. So where does that code come from? It comes from gcc's libsupc++ support library. But when was it built and, more importantly, which CFLAGS were used for it?
Well, there are the obscure environment variables CFLAGS_FOR_TARGET / CXXFLAGS_FOR_TARGET, which provide optimization and other compiler flags for the libsupc++ compilation that happens as part of the gcc build process.
Obviously, the toolchain needs to be rebuilt with reasonable optimization flags. The build results are the following:
| | ARM mode (bytes) | Thumb mode (bytes) |
|---|---|---|
| No exceptions | 209 | 189 |
| With exceptions | 64556 | 45484 |
| Difference | 64347 | 45295 |
The results are twice as good now. Also notice that even the code without exceptions benefited from CFLAGS_FOR_TARGET.
However, 64 kB is still a big chunk of code. Is it possible to reduce it even more? Let's review the exceptions.syms file, which contains all symbols of the resulting binary. Here it's easy to notice that *printf stuff is pulled into the binary. Function dependency analysis shows the root of it: the std::terminate handler is set to a "verbose terminate handler".
The correct C++ way to change the terminate handler is to use std::set_terminate. Yet this would not work for bare metal targets, because the initial terminate handler would still be pulled in. The only correct method to replace the terminate handler is to overwrite the function pointer at build time. The specific variable to look for is called __cxxabiv1::__terminate_handler.
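One possible way to do this is sketched below; it is not necessarily the author's method. It assumes a statically linked libsupc++ in which the variable lives in its own object file, so that defining the same symbol in the program keeps the linker from pulling in the library's copy together with the verbose handler. The handler name simple_terminate is hypothetical, and the variable's type is assumed to be std::terminate_handler.

```cpp
#include <cstdlib>
#include <exception>

// Hypothetical minimal handler: stop immediately, with no printf involved.
[[noreturn]] void simple_terminate() {
    std::abort();
}

namespace __cxxabiv1 {
// Assumption: same name and type as the variable inside libsupc++, so this
// definition takes the place of the library's default at link time.
std::terminate_handler __terminate_handler = simple_terminate;
}
```

Whether the replacement took effect can be checked in the symbol file: the *printf family should no longer appear there.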
| | ARM mode (bytes) | Thumb mode (bytes) |
|---|---|---|
| No exceptions | 209 | 189 |
| With exceptions | 64556 | 45484 |
| With exceptions (simple terminate) | 17152 | 12856 |
| Difference | 16943 | 12667 |

Much, much better!
Nevertheless, let's review the symbol file again. The C++ exception handling code has also pulled in heap management functions: symbols like malloc and free are present in the binary. But to measure the impact of C++ exceptions on binary size alone, heap allocation for C++ exceptions needs to be cut away (do not try this at home!).
Exception allocation and deallocation is controlled by two functions, __cxa_allocate_exception and __cxa_free_exception. Furthermore, __cxa_allocate_exception has somewhat odd semantics: it must allocate more than the size requested via its parameter and must return a pointer into the middle of the allocated block.
Also, operator delete has to be redefined because of function dependencies. There is no heap anyway, so it isn't a problem. Simulation shows that the program is still operational.
So, the results are:
| | ARM mode (bytes) | Thumb mode (bytes) |
|---|---|---|
| No exceptions | 209 | 189 |
| With exceptions | 64556 | 45484 |
| With exceptions (simple terminate) | 17152 | 12856 |
| With exceptions (simple terminate and no heap) | 12032 | 8816 |
| Difference | 11823 | 8627 |

Perfect! That's nearly a 10x reduction from the first test.
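The "allocate more than requested and return a pointer into the middle" semantics of __cxa_allocate_exception can be illustrated with a heap-free sketch. The function names, the single static slot and both size constants are assumptions made for illustration: the real header is an internal libsupc++ structure whose size depends on the target and the gcc version, and the real function terminates the program on failure instead of returning a null pointer.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical sizes, chosen only to make the sketch concrete.
constexpr std::size_t kHeaderSize = 128;    // room for the runtime's header
constexpr std::size_t kMaxThrownSize = 64;  // largest supported exception object

// One statically reserved slot replaces malloc().
alignas(std::max_align_t) static unsigned char g_slot[kHeaderSize + kMaxThrownSize];
static bool g_slot_busy = false;

// Reserves header + object, clears the header and returns a pointer to the
// object area, which lies in the middle of the reserved block.
void* static_allocate_exception(std::size_t thrown_size) {
    if (g_slot_busy || thrown_size > kMaxThrownSize) {
        return nullptr;  // sketch only: the real function would terminate
    }
    g_slot_busy = true;
    std::memset(g_slot, 0, kHeaderSize);
    return g_slot + kHeaderSize;
}

// Releases the slot when given the pointer handed out above.
void static_free_exception(void* thrown_object) {
    if (thrown_object == g_slot + kHeaderSize) {
        g_slot_busy = false;
    }
}
```

A single slot cannot serve nested or concurrently live exceptions, which is one more reason for the "do not try this at home" warning.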
Bugs, cheats and hacks
In this section, I will try unorthodox means to improve the results. They may or may not work in your environment. I will refer to the results of the previous section as the "baseline".
First of all, for some reason, the function __cxxabiv1::__is_gxx_exception_class(char*) is duplicated three times in the image file. This (inline) function simply checks the value of 8 consecutive bytes. So I made a patch that replaces the function (and a similar one) with memcmp:
| | ARM mode (bytes) | Thumb mode (bytes) |
|---|---|---|
| No exceptions | 209 | 189 |
| With exceptions (baseline) | 12032 | 8816 |
| With exceptions (is_gxx_exception patch) | 10588 | 7828 |
| Difference | 10379 | 7639 |
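The idea behind the memcmp replacement can be sketched as below. This is not the author's patch. The function name is hypothetical, and the byte values are an assumption based on my reading of libsupc++: the class is the 7 bytes "GNUCC++" followed by a final byte of 0 for a primary exception or 1 for a dependent one.

```cpp
#include <cstring>

// Assumed common prefix of the GNU C++ exception class.
static const char kGxxClassPrefix[7] = {'G', 'N', 'U', 'C', 'C', '+', '+'};

// Checks an 8-byte exception class with one memcmp call plus a test of the
// last byte, instead of eight separate inlined byte comparisons.
inline bool looks_like_gxx_exception_class(const char* exception_class) {
    return std::memcmp(exception_class, kGxxClassPrefix, sizeof kGxxClassPrefix) == 0 &&
           (exception_class[7] == '\0' || exception_class[7] == '\x01');
}
```

The size win comes from replacing several inlined copies of the comparison chain with calls to one shared routine.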
Secondly, there is absolutely no need to support VFP and iWMMXt registers in the unwind code on a softfloat target. Cutting away dead code is really beneficial:
| | ARM mode (bytes) | Thumb mode (bytes) |
|---|---|---|
| No exceptions | 209 | 189 |
| With exceptions (baseline) | 12032 | 8816 |
| With exceptions (is_gxx_exception patch) | 10588 | 7828 |
| With exceptions (is_gxx_exception patch, no VFP/iWMMXt) | 9408 | 6864 |
| Difference | 9199 | 6675 |
Lastly, I want to remove support for dynamic exception specifications and std::unexpected, because (a) they are deprecated, (b) they worsen runtime performance and (c) they occupy memory. Yet they are part of the published standard, and therefore their removal may break things.
Also, gcc has the compiler argument -fno-enforce-eh-specs for use in such a scenario.
| | ARM mode (bytes) | Thumb mode (bytes) |
|---|---|---|
| No exceptions | 209 | 189 |
| With exceptions (baseline) | 12032 | 8816 |
| With exceptions (is_gxx_exception patch) | 10588 | 7828 |
| With exceptions (is_gxx_exception patch, no VFP/iWMMXt) | 9408 | 6864 |
| With exceptions (is_gxx_exception patch, no VFP/iWMMXt, no unexpected) | 9308 | 6760 |
| Difference | 9099 | 6571 |
Conclusions
- It is confirmed that the use of C++ exceptions may add an extra 100 kB on embedded targets. It also means that the toolchain was built badly;
- It is very easy to reduce the extra footprint down to 17 kB/13 kB (heap management included) by setting up correct CFLAGS_FOR_TARGET / CXXFLAGS_FOR_TARGET during the build and replacing the terminate handler afterwards;
- There is much room for improvement in gcc: at least 3 kB/2 kB of image size could be removed for some targets.