Statically typed languages such as C and C++ give the illusion that variables containing primitive types (such as signed and unsigned int) need not have their types checked at run time.
However, as I'll demonstrate, the illusion is very thin and is likely the cause of many security vulnerabilities. In other words: runtime type checking is still needed in certain instances, even in (especially in) C.
If you would like a live demonstration of the program that generated this output, you may download the source code of this blog entry.
To begin, we'll compare two signed integers in C:
( -1 < 42 ) : true
( -1 == 42 ) : false
( -1 > 42 ) : false
OK, everything looks good so far...
Next, let's compare a signed integer to an unsigned integer:
( 42 < 1u ) : false
( 42 == 1u ) : false
( 42 > 1u ) : true
That's all well and good too...
...but, watch what happens if our signed integer is negative:
( -1 < 1u ) : false
( -1 == 1u ) : false
( -1 > 1u ) : true
( -1 < 42u ) : false
( -1 == 42u ) : false
( -1 > 42u ) : true
Do these statements seem correct? Certainly not! These problematic results are due to the implicit type conversions that occur when operations are performed on integers that are differently signed.
Programmers are expected to know about the subtle type conversions that occur frequently in their programs. Practically speaking, however, many do not and incorrectly trust that their language will "do the right thing" when presented with a simple comparison such as:
if ( a < b ) { doSomething(); }
Technically, the compiler does perform the correct conversion as per the C99 standard; however, that standard seems to be particularly at odds with another, far more broadly adopted standard: algebra. Clearly, doing "the right thing" is ambiguous given this sad situation...
From the following comparison we can see why we are given the problematic results above:
( (-1 cast to unsigned is 4294967295) < 1 ) : false
( (-1 cast to unsigned is 4294967295) == 1 ) : false
( (-1 cast to unsigned is 4294967295) > 1 ) : true
( (-1 cast to unsigned is 4294967295) < 42 ) : false
( (-1 cast to unsigned is 4294967295) == 42 ) : false
( (-1 cast to unsigned is 4294967295) > 42 ) : true
As is evident from this example, the signed integer has been "promoted" to an unsigned integer before the comparison takes place.
One of the strengths of a statically typed language, like C, is that one knows the type and sign of their variables and can avoid making many run-time checks. Unfortunately, we are only able to take advantage of this benefit if we only perform operations on variables of the exact same type.
It would be nice if our C compiler warned us when problematic type conversions and promotions occur in our sources... (GCC can, via -Wsign-compare, but -Wall only enables it for C++; for C you must ask for it with -Wextra or -Wsign-compare.) Clearly, one who writes (a < b) wants to know whether the value of 'a' is less than the value of 'b'. It makes no logical sense for that statement to mean: is 'a', reinterpreted as an unsigned integer, less than 'b'? If that behavior were desired, one would simply write:
( (unsigned)a < b )
However, all is not lost, we can explicitly cast the unsigned integer to a signed integer to achieve the desired (algebraically correct) results:
( -1 < (signed)1u ) : true
( -1 == (signed)1u ) : false
( -1 > (signed)1u ) : false
Well, that's settled then. A simple rule of thumb can be followed: When comparing signed and unsigned integers, we must cast the unsigned integer into a signed integer...
...this rule will yield correct results, mostly. That is, of course, until our unsigned value has its uppermost bit set:
( -1 < (signed)2150000000u ) : false
( -1 == (signed)2150000000u ) : false
( -1 > (signed)2150000000u ) : true
In the words of Homer:
"It is not good to have a rule of many.", and also, "Doh!"
Unfortunately, we're not going to get away with a simple reinterpretation cast that would most likely have zero overhead in the machine code; instead, more logic is required. If we first compare the signed integer to 0, we can determine whether it is safe to compare it to an unsigned integer at all.
// Given two variables (a and b)
// and their signednesses (signed and unsigned):
signed int a = -1;
unsigned int b = 1;

// Less-than comparison: if ( a < b ) ...
if ( (a < 0) || (unsigned)a < b ) {
    /* a really is less-than b */
}

// Equal-to comparison: if ( a == b ) ...
if ( (a >= 0) && (unsigned)a == b ) {
    /* a really is equal-to b */
}

// Greater-than comparison: if ( a > b ) ...
if ( (a > 0) && (unsigned)a > b ) {
    /* a really is greater-than b */
}
Thus yielding:
( -1 < 42 ) : true
( -1 == 42 ) : false
( -1 > 42 ) : false
( -1 < 1 ) : true
( -1 == 1 ) : false
( -1 > 1 ) : false
( -1 < 2150000000 ) : true
( -1 == 2150000000 ) : false
( -1 > 2150000000 ) : false
( 42 < 42 ) : false
( 42 == 42 ) : true
( 42 > 42 ) : false
( 42 < 1 ) : false
( 42 == 1 ) : false
( 42 > 1 ) : true
( 42 < 2150000000 ) : true
( 42 == 2150000000 ) : false
( 42 > 2150000000 ) : false
( 0 < 42 ) : true
( 0 == 42 ) : false
( 0 > 42 ) : false
( 0 < 1 ) : true
( 0 == 1 ) : false
( 0 > 1 ) : false
( 0 < 2150000000 ) : true
( 0 == 2150000000 ) : false
( 0 > 2150000000 ) : false
( -1 < 0 ) : true
( -1 == 0 ) : false
( -1 > 0 ) : false
( 42 < 0 ) : false
( 42 == 0 ) : false
( 42 > 0 ) : true
( 0 < 0 ) : false
( 0 == 0 ) : true
( 0 > 0 ) : false
Considering that one of the strong points of a typed language is that you can avoid superfluous double-checking of variables, it boggles the mind that the C99 standard mandates promotion rules which undermine this very benefit.
I suppose a C purist may oppose automatically inserting extra code for the sake of sanity if it means sacrificing speed or compatibility with older software.
However, the glaring fact that core C language operators such as <, ==, and > behave in wildly unexpected ways for anyone familiar with elementary mathematics (such as basic algebra) is, in my opinion, a far greater issue to address. The strange behavior exists for no good reason, considering that the additional logic is minimal and provides mathematically correct results.
Even the most intimately knowledgeable C programmers must agree that the current behavior of <, ==, and > is unintuitive -- it's a stumbling block for beginners, and a security risk for anyone who uses software written by someone less careful than a code-surgeon.
(To say nothing of those who maintain a body of source code and need to change the signedness of a variable's type...)
Do you not agree that it would make far more logical sense to automatically perform the above logic when dealing with differently signed types?
The current illogical, yet marginally faster, promotions could still be achieved via manually type casting the values to the same signs before comparing. Operations performed on two variables of the same type need not be affected at all. Compatibility with legacy software could be provided the same way it has always been provided -- configurable compiler settings.
Note that although C++ has powerful operator overloading mechanisms, they are neutered when it comes to built-in types, and thus useless for creating our own solution to this issue. One cannot overload operators for integers:
bool operator<( const int &a, const int &b ){
    /* Error: an overloaded operator must have at
       least one parameter of class or enum type. */
}
It's no wonder many of my fellow C & C++ coders are balding -- Can you blame us for scratching our skulls so frequently?