Saturday, April 16, 2011

Illogical Promotions -- C & C++ type conversion logic isn't logical.

When writing code that deals with exact numbers in dynamic languages such as JavaScript or Perl it can be a nuisance to constantly check the types of variables, but at least one knows that it's necessary to do so.

Statically typed languages such as C and C++ give the illusion that variables containing primitive types (such as signed and unsigned int) need not have their types checked at run time.

However, as I'll demonstrate, the illusion is very thin and is likely the cause of many security vulnerabilities.  In other words: runtime type checking is still needed in certain instances, even in (especially in) C.

If you would like a live demonstration of the program that generated this output, you may download the source code of this blog entry.


To begin, we'll compare two signed integers in C:

   ( -1 <  42 ) : true
   ( -1 == 42 ) : false
   ( -1 >  42 ) : false


OK, everything looks good so far...

Next, let's compare a signed integer to an unsigned integer:

   ( 42 <  1u ) : false
   ( 42 == 1u ) : false
   ( 42 >  1u ) : true


That's all well and good too...

...but, watch what happens if our signed integer is negative:

   ( -1 <  1u ) : false
   ( -1 == 1u ) : false
   ( -1 >  1u ) : true
   ( -1 <  42u ) : false
   ( -1 == 42u ) : false
   ( -1 >  42u ) : true

Do these statements seem correct?  Certainly not!  These problematic results are due to the implicit type conversions that occur when operations are performed on integers that are differently signed.
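A minimal sketch of how results like those above can be reproduced. The helper names (naive_less, naive_equal, naive_greater) are hypothetical and are not taken from the original program; each one simply applies the built-in operator to an int and an unsigned int.

```c
/* Hypothetical helpers: plain built-in comparisons between a signed
   and an unsigned int, exactly as one might naively write them.
   Some compilers can warn about these lines (e.g. -Wsign-compare). */
int naive_less(int a, unsigned int b)    { return a <  b; }
int naive_equal(int a, unsigned int b)   { return a == b; }
int naive_greater(int a, unsigned int b) { return a >  b; }
```

Calling naive_less(-1, 1u) yields 0 (false) and naive_greater(-1, 1u) yields 1 (true), because 'a' is converted to unsigned int before the comparison takes place.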

Programmers are expected to know about the subtle type conversions that occur frequently in their programs.  Practically speaking, however, many do not and incorrectly trust that their language will "do the right thing" when presented with a simple comparison such as:

if ( a < b ) { doSomething(); }

Technically, the compiler does perform the correct conversion as per the C99 standard; however, that standard seems to be particularly at odds with another, more broadly adopted standard: algebra.  Clearly, doing "the right thing" is ambiguous given this sad situation...

From the following comparison we can see why we are given the problematic results above:

   ( (-1 cast to unsigned is 4294967295) <  1 ) : false
   ( (-1 cast to unsigned is 4294967295) == 1 ) : false
   ( (-1 cast to unsigned is 4294967295) >  1 ) : true
   ( (-1 cast to unsigned is 4294967295) <  42 ) : false
   ( (-1 cast to unsigned is 4294967295) == 42 ) : false
   ( (-1 cast to unsigned is 4294967295) >  42 ) : true

As is evident from this example, the signed integer has been "promoted" to an unsigned integer before the two are compared (the C standard calls this step the usual arithmetic conversions).
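The number 4294967295 is what -1 becomes when converted to a 32-bit unsigned int: the conversion adds UINT_MAX + 1 (here 2^32) to the negative value. A small sketch of that arithmetic, using the hypothetical helper names to_unsigned and minus_one_is_uint_max:

```c
#include <limits.h>

/* Hypothetical helper: the same conversion the compiler applies
   implicitly to the signed operand of a mixed comparison. */
unsigned int to_unsigned(int a) { return (unsigned int)a; }

/* Returns 1: -1 always converts to the largest unsigned int value,
   which is 4294967295 when unsigned int is 32 bits wide. */
int minus_one_is_uint_max(void) { return to_unsigned(-1) == UINT_MAX; }
```

Non-negative values pass through unchanged, which is why the comparisons involving 42 gave the expected answers.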

One of the strengths of a statically typed language, like C, is that one knows the type and sign of their variables and can avoid making many run-time checks. Unfortunately, we are only able to take advantage of this benefit if we only perform operations on variables of the exact same type.

It would be nice if our C compiler warned us when problematic type conversions and promotions occur in our sources... (GCC can: its -Wsign-compare warning is enabled by -Wall when compiling C++, but when compiling C it needs -Wextra or the flag itself.)  Clearly, if one were to write (a < b), they want to check whether the value of 'a' is less than the value of 'b'.  It makes no logical sense for that statement to mean: is the value of 'a' less than 'b' if 'a' is cast as an unsigned integer?  If that behavior is desired, one would just write:

( (unsigned)a < b )

However, all is not lost: we can explicitly cast the unsigned integer to a signed integer to achieve the desired (algebraically correct) results:

   ( -1 <  (signed)1 ) : true
   ( -1 == (signed)1 ) : false
   ( -1 >  (signed)1 ) : false

Well, that's settled then.  A simple rule of thumb can be followed: When comparing signed and unsigned integers, we must cast the unsigned integer into a signed integer...

...this rule will yield correct results, mostly.  That is, of course, until our unsigned value has its uppermost bit set:

   ( -1 <  2150000000 ) : false
   ( -1 == 2150000000 ) : false
   ( -1 >  2150000000 ) : true

In the words of Homer:
    "It is not good to have a rule of many.", and also, "Doh!"
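The failing case can be reproduced with a sketch like the following. The name cast_less is hypothetical, and the code assumes a 32-bit int; converting an out-of-range unsigned value to int is implementation-defined in C, though common compilers wrap it around to a negative number.

```c
/* Hypothetical helper implementing the tempting rule of thumb:
   cast the unsigned operand to signed, then compare. */
int cast_less(int a, unsigned int b) { return a < (int)b; }
```

cast_less(-1, 1u) gives the expected 1, but with a 32-bit int on a typical compiler cast_less(-1, 2150000000u) gives 0, since 2150000000 wraps around to -2144967296.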

Unfortunately, we're not going to get away with just a simple reinterpretation cast that would most likely have zero overhead in the machine code; instead, more logic is required.  If we first compare the signed integer to 0, we can determine whether it is comparable to an unsigned integer at all.

    // Given two variables (a and b),
    // and their signs (signed & unsigned).
    signed int a = -1;
    unsigned int b = 1;

    // Less-than comparison: if ( a < b ) ...
    if ( (a < 0) || (unsigned)a < b ) {
        /* a is really less-than b */
    }

    // Equal-to comparison: if ( a == b ) ...
    if ( (a >= 0) && (unsigned)a == b ) {
        /* a really is equal-to b */
    }

    // Greater-than comparison: if ( a > b ) ...
    if ( (a > 0) && (unsigned)a > b ) {
        /* a really is greater-than b */
    }

Thus yielding:

   ( -1 <  42 ) : true
   ( -1 == 42 ) : false
   ( -1 >  42 ) : false
   ( -1 <  1 ) : true
   ( -1 == 1 ) : false
   ( -1 >  1 ) : false
   ( -1 <  2150000000 ) : true
   ( -1 == 2150000000 ) : false
   ( -1 >  2150000000 ) : false
   ( 42 <  42 ) : false
   ( 42 == 42 ) : true
   ( 42 >  42 ) : false
   ( 42 <  1 ) : false
   ( 42 == 1 ) : false
   ( 42 >  1 ) : true
   ( 42 <  2150000000 ) : true
   ( 42 == 2150000000 ) : false
   ( 42 >  2150000000 ) : false
   ( 0 <  42 ) : true
   ( 0 == 42 ) : false
   ( 0 >  42 ) : false
   ( 0 <  1 ) : true
   ( 0 == 1 ) : false
   ( 0 >  1 ) : false
   ( 0 <  2150000000 ) : true
   ( 0 == 2150000000 ) : false
   ( 0 >  2150000000 ) : false
   ( -1 <  0 ) : true
   ( -1 == 0 ) : false
   ( -1 >  0 ) : false
   ( 42 <  0 ) : false
   ( 42 == 0 ) : false
   ( 42 >  0 ) : true
   ( 0 <  0 ) : false
   ( 0 == 0 ) : true
   ( 0 >  0 ) : false
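
The three checks above can be wrapped into small helpers so the logic is written once. This is only a sketch: the names safe_less, safe_equal and safe_greater are hypothetical, and bool comes from C99's <stdbool.h>.

```c
#include <stdbool.h>

/* Hypothetical helper names; the logic mirrors the three checks above. */

/* a < b: any negative a is below every unsigned b. */
bool safe_less(int a, unsigned int b)    { return (a < 0) || ((unsigned int)a < b); }

/* a == b: a negative a can never equal an unsigned b. */
bool safe_equal(int a, unsigned int b)   { return (a >= 0) && ((unsigned int)a == b); }

/* a > b: only a positive a can exceed an unsigned b. */
bool safe_greater(int a, unsigned int b) { return (a > 0) && ((unsigned int)a > b); }
```

The cast to unsigned is only reached once 'a' is known to be non-negative, so it never changes the value being compared.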


Considering that one of the strong points of a typed language is that you can avoid superfluous double checking of variables, it boggles the mind as to why the C99 standard requires type promotion rules that undermine this very benefit.

I suppose a C purist may oppose automatically inserting extra code for the sake of sanity if it means sacrificing speed or compatibility with older software.

However, the glaring fact that core C language operators such as <, ==, and > behave in wildly unexpected ways to people who are familiar with elementary mathematics (such as basic algebra) is a far greater issue to address, in my opinion.  The strange behavior exists for no good reason considering that the additional logic is very minimal, and provides mathematically correct results.

Even the most intimately knowledgeable of C programmers must agree that the current behavior of <, == and > is unintuitive -- It's a stumbling block for beginners, and a security risk for any who use software made by anyone less careful than a code-surgeon.

(Mention not those who maintain a body of source code & need to change the signedness of a variable's type...)

Do you not agree that it would make far more logical sense to automatically perform the above logic when dealing with differently signed types?

The current illogical, yet marginally faster, promotions could still be achieved via manually type casting the values to the same signs before comparing.  Operations performed on two variables of the same type need not be affected at all.  Compatibility with legacy software could be provided the same way it has always been provided -- configurable compiler settings.

Note that although C++ has powerful operator overloading mechanisms, they are neutered when it comes to built-in types, and thus useless for creating our own solution to this issue. One cannot overload the operators for built-in integer types:

bool ::operator<( const int &a, const int &b ){

    /* Err: Only enum or class
       operators can be overloaded. */ 
}

It's no wonder many of my fellow C & C++ coders are balding -- Can you blame us for scratching our skulls so frequently?

Wednesday, November 3, 2010

Hypocritical Google Buzz "Privacy" Suit

Google has settled a class-action lawsuit filed by The Garden City Group Inc.

This settlement affects the rights of all users of Gmail within the U.S. even if they do nothing at all.

I was unaware that a class-action lawsuit over Buzz was filed.  Fortunately Google emailed notifications about the settlement to their users.  If not for Google's email the GCG settlement would have vogonized some of my rights.


The following excerpt from BuzzClassAction.com explains how to exclude yourself from the settlement.
How do I get out of the Settlement?

To exclude yourself from the Settlement, you must send a letter or other written document by mail saying that you want to be excluded from In re Google Buzz User Privacy Litigation, No. 5:10-cv-00672-JW. Be sure to include your full name, address, reason why you want out of the Settlement, as well as proof that you used Gmail at some point after February 9, 2010, your signature, and the date. You must mail your request for exclusion so that it is received no later than December 6, 2010 to:

In re Google Buzz User Privacy Litigation
c/o The Garden City Group, Inc.
P.O. Box 91088
Seattle, WA 98111-9188

This just doesn't make sense to me.   Why would I be required to give up my private information to the Garden City Group in order to opt-out of a lawsuit against privacy violations?

_____________________________________________


if (more_privacy == (less_privacy || less_rights)) we_lose();

The alleged privacy violations of Google Buzz are certainly less of a privacy intrusion than requiring me to send a letter to the GCG.

I was not required to give Google my full name, or my mailing address. However, The Garden City Group has coerced me into violating my own privacy.  I am required to divulge my personal information to them or else give up my legal rights.

_____________________________________________


do { as_they_say } while ( NOT as_they_do );


The GCG suit against Google hinges on the "automatic opt-in" that Google used for the initial Buzz deployment.

The GCG's lawsuit complains that Google's Buzz should not have been rolled out as an automatic opt-in; instead, the Buzz service should have defaulted to opted out.

Opt-out defaults sound like a reasonable roll-out plan... Why then does The Garden City Group fail to heed their own advice?  The GCG made their own settlement an automatic opt-in.

That's right: if you are a U.S. Gmail user then you are opted in to the GCG's settlement by default.


Google may have opted its users in to sharing their email contacts, but they also gave them a simple button to opt out of Buzz.

However, without my consent or prior notice, the GCG opted me into a legally binding settlement which removes my legal rights and requires me to divulge my personal information in writing via postal mail in order to opt out!

_____________________________________________


if (is_good(idea) && ignored(idea)) throw(GCG_Error);

Paraphrasing the GCG might make this double standard a bit more clear:
"Google, you shouldn't automatically opt your users into new services."
"We're automatically opting all Gmail users into our class action lawsuit settlement."
"Google, you should notify users before including them in new services, and provide clear and simple information about how they can opt out of new features."
"We did not notify any Gmail users prior to their inclusion in the settlement.  To opt out of our settlement you may not use phone or e-mail.  You must mail a letter to us containing your personal information, and also somehow prove you're a Gmail user."

I can't think of any way to prove that I'm a Gmail user that doesn't involve using email.  Can you?

Why do they need proof anyhow?  If I'm not a Gmail user, the suit doesn't apply and I should be excluded anyway.

Perhaps they just want to make sure they don't accidentally not take away someone's rights...  Wait, tell me again: What are the advantages of remaining opted in to this settlement?  Fewer legal rights.

Of course, if the GCG allowed email correspondence as an opt-out option they would have the proof they desire in the From field...

Truly, my logic lobes are not equipped to handle this amount of hypocrisy.