grack.com

Thanks to Raph Levien and Electronic Arts for inspiring this post!

As a thin layer on top of assembly language, C’s integer arithmetic APIs have always been minimal, effectively just mapping the underlying assembly opcodes to the C arithmetic operators. In addition, while unsigned arithmetic can safely wrap around in C, signed arithmetic overflow is undefined behaviour, and UB can end in heartache for C developers.

More modern languages like Rust have a much richer integer API, however. By default, the standard addition and subtraction operators panic! on overflow or underflow in debug builds. In release builds the same operators wrap in two’s complement, and that wrapping behaviour is defined rather than undefined. If the developer wants to be explicit about carrying, wrapping, checked, or saturating semantics, dedicated APIs are available for each of these modes.

We don’t have these convenient APIs available in C yet (see the epilogue for some nuance), but it would be great to have them. Given that signed arithmetic overflow is undefined behaviour, can we build a function with the following C signature that works?

bool will_add_overflow(int32_t a, int32_t b);

All modern machines use two’s complement representation for negative integers, but C was developed when computing was still experimenting with other ways of representing signed integers. In this post, we will safely assume that none of the processors we’re targeting use one of those alternative forms. If you find yourself programming for esoteric machines, you may wish to consult your local manual or guru.

A valid, quick-and-dirty solution would be to widen: convert the operands to 64-bit signed integers, add those instead, and check whether the result is within the range of an int32_t:

bool will_add_overflow_64bit(int32_t a, int32_t b) {
    // a is converted to int64_t; the usual arithmetic conversions widen b to match
    int64_t result = (int64_t)a + b;
    if (result < INT32_MIN || result > INT32_MAX) {
        return true;
    }
    return false;
}

No undefined behaviour! It also has the advantage of being easy to read and obviously correct. But this requires sign-extending two 32-bit numbers, performing a 64-bit addition, and then a sign-extend-and-compare to implement the range check:

will_add_overflow_64bit:
        movsxd  rax, edi
        movsxd  rcx, esi
        add     rcx, rax
        movsxd  rax, ecx
        cmp     rax, rcx
        setne   al
        ret

We can do better by taking advantage of the fact that on two’s-complement machines, addition is bitwise-identical between signed and unsigned numbers so long as you ignore the carry, overflow, and other flags. In addition, the C specification guarantees (C99 6.3.1.3 ¶2) that conversion from signed to unsigned is defined modulo 2^32, which preserves the bit pattern on a two’s-complement system. (The conversion back to signed is implementation-defined in C99, but every mainstream two’s-complement compiler preserves the bit pattern there as well.)

We know that unsigned overflow is not UB, and we know that we can only overflow if a > 0 and b > 0, and we can only underflow if a < 0 and b < 0. If either a or b is zero, we’re safe. We also know that adding two positive integers must result in a positive result if no overflow occurred. For two negative integers, the result must also be negative. If we find that the sign of the sum does not match the sign expected, we’ve wrapped around!

bool will_add_overflow_if(int32_t a, int32_t b) {
    // Explicitly convert to uint32_t and then back
    int32_t c = (int32_t)((uint32_t)a + (uint32_t)b);
    if (a > 0 && b > 0 && c < 0) {
        return true;
    }
    if (a < 0 && b < 0 && c >= 0) {
        return true;
    }
    return false;
}

And we get a fairly hefty assembly representation:

will_add_overflow_if:
        lea     ecx, [rsi + rdi]
        test    edi, edi
        jle     .LBB2_3
        test    esi, esi
        jle     .LBB2_3
        mov     al, 1
        test    ecx, ecx
        jns     .LBB2_3
        ret
.LBB2_3:
        test    esi, edi
        sets    dl
        test    ecx, ecx
        setns   al
        and     al, dl
        ret

This is arguably a bit worse, as now we have a branch in the mix. But we can start to see a pattern here:

a      b      c       result
> 0    > 0    < 0     true
< 0    < 0    >= 0    true

In two’s-complement, the expression x < 0 is equivalent to the expression (x & 0x80000000) == 0x80000000. Similarly, x >= 0 is equivalent to (x & 0x80000000) == 0.

Let’s create a NEG macro with the above expression and reproduce our pseudo-truth table in code. Note that we’ll also collapse the if statements into a single boolean expression so we can eliminate those branches:

bool will_add_overflow_expression(int32_t a_, int32_t b_) {
    // Explicitly work with uint32_t in this function
    uint32_t a = (uint32_t)a_, b = (uint32_t)b_;
    uint32_t c = (uint32_t)a + (uint32_t)b;
    #define NEG(x) (((uint32_t)(x) & 0x80000000) == 0x80000000)
    return ((!NEG(a) && !NEG(b) && NEG(c)) 
        || (NEG(a) && NEG(b) && !NEG(c)));
    #undef NEG
}

This is looking better, but because we’re using short-circuiting logic, those branches are still there: we still have a jump!

will_add_overflow_expression:
        mov     eax, esi
        or      eax, edi
        setns   dl
        mov     ecx, esi
        add     ecx, edi
        sets    al
        and     al, dl
        test    edi, edi
        jns     .LBB3_3
        test    al, al
        jne     .LBB3_3
        test    esi, esi
        sets    dl
        test    ecx, ecx
        setns   al
        and     al, dl
.LBB3_3:
        ret

We can get rid of the branches by using non-short-circuiting bitwise operators:

bool will_add_overflow_bitwise(int32_t a_, int32_t b_) {
    uint32_t a = (uint32_t)a_, b = (uint32_t)b_;
    uint32_t c = (uint32_t)a + (uint32_t)b;
    #define NEG(x) (((uint32_t)(x) & 0x80000000) == 0x80000000)
    return ((!NEG(a) & !NEG(b) & NEG(c)) 
        | (NEG(a) & NEG(b) & !NEG(c)));
    #undef NEG
}

And now it’s starting to look pretty compact (though we can do better):

will_add_overflow_bitwise:
        lea     ecx, [rsi + rdi]
        mov     eax, esi
        or      eax, edi
        and     esi, edi
        xor     eax, esi
        not     eax
        and     eax, ecx
        xor     eax, esi
        shr     eax, 31
        ret

Notice that the assembly gives us a bit of a hint here that repeated use of our macro isn’t actually necessary. The sign bit we’re interested in isn’t tested until the end of the function! Because we’re testing the same bit in every part of the expression, and bits in a given position only interact with other bits in the same position, we can pull that bit test out of the whole expression:

bool will_add_overflow_bitwise_2(int32_t a_, int32_t b_) {
    uint32_t a = (uint32_t)a_, b = (uint32_t)b_;
    uint32_t c = (uint32_t)a + (uint32_t)b;
    #define NEG(x) (((uint32_t)(x) & 0x80000000) == 0x80000000)
    return NEG((~a & ~b & c) | (a & b & ~c));
    #undef NEG
}

We can also make use of the knowledge that testing the sign bit is the same as an unsigned shift right:

bool will_add_overflow_bitwise_3(int32_t a_, int32_t b_) {
    uint32_t a = (uint32_t)a_, b = (uint32_t)b_;
    uint32_t c = (uint32_t)a + (uint32_t)b;
    return ((~a & ~b & c) | (a & b & ~c)) >> 31;
}

Not too bad! But let’s revisit the truth table and instead use the value of the sign bit directly. What we see is that a and b need to be the same value, and c needs to be the opposite value:

a    b    c
1    1    0
0    0    1

This truth table shows that what we ultimately want to test is this:

(a == 1 && b == 1 && c == 0) || (a == 0 && b == 0 && c == 1)

… but with a bit of work, we can simplify this down to two shorter expression candidates:

(a == b) && (a == !c)
(c == !a) && (c == !b)

For bit twiddling like we’re doing here, xor (^) can work like a “not-equals” operator (outputs 1 iff the inputs are 0,1 or 1,0), which means we can re-write our two expressions like so:

~(a ^ b) & (c ^ a)
(c ^ a) & (c ^ b)

By looking at those two options, is there a hint that one might be cheaper to implement? Let’s plug both into the compiler and see what we get!

bool will_add_overflow_optimized_a(int32_t a_, int32_t b_) {
    uint32_t a = (uint32_t)a_, b = (uint32_t)b_;
    uint32_t c = (uint32_t)a + (uint32_t)b;
    return (~(a ^ b) & (c ^ a)) >> 31;
}

bool will_add_overflow_optimized_b(int32_t a_, int32_t b_) {
    uint32_t a = (uint32_t)a_, b = (uint32_t)b_;
    uint32_t c = (uint32_t)a + (uint32_t)b;
    return ((c ^ a) & (c ^ b)) >> 31;
}

And the resulting compiled versions:

will_add_overflow_optimized_a:
        lea     eax, [rsi + rdi]
        xor     eax, edi
        mov     ecx, edi
        xor     ecx, esi
        not     ecx
        and     eax, ecx
        shr     eax, 31
        ret
will_add_overflow_optimized_b:
        lea     eax, [rsi + rdi]
        xor     edi, eax
        xor     eax, esi
        and     eax, edi
        shr     eax, 31
        ret

We have a clear winner here: the compiler can do a much better job with (c ^ a) & (c ^ b). This is most likely because of the common sub-expression and the removal of the bitwise-not operator.

We can also confirm that there’s no known undefined behaviour by compiling and running it under clang’s -fsanitize=undefined feature. No UB warnings are printed at runtime, which means no UB was detected for the inputs exercised!

Epilogue

While this is the fastest we can get with bog-standard C99, this isn’t necessarily the best we can do.

Rust makes use of compiler intrinsics to access the processor’s overflow flag directly:

pub fn add(a: i32, b: i32) -> bool {
    a.checked_add(b).is_none()
}

example::add:
        add     edi, esi
        seto    al
        ret

It turns out that both GCC and LLVM offer C intrinsics you can use for this as well. While they are not portable to every compiler, they drastically simplify the assembly output!

bool will_add_overflow_intrinsic(int32_t a, int32_t b) {
    int32_t result;
    return __builtin_add_overflow(a, b, &result);
}

And, just like with the Rust compiler above, this generates optimal assembly!

will_add_overflow_intrinsic:
        add     edi, esi
        seto    al
        ret

No need to worry about this being so deeply compiler-specific forever, however. C23 standardizes these operations with the addition of the ckd_add, ckd_sub, and ckd_mul functions in the stdckdint.h header.

A full suite of tests to explore the solutions is available on Godbolt or as a Gist.
