Monday, 27 April 2015

-fsigned-char

Ancient History

I maintain a proprietary Cobol machine.  It was originally written with DOS and mainframes in mind.  It was ported across to Xenix, Unix, and Linux (in the kernel 1.2 days) and recently to 64-bit Linux.

The code includes an editor, a compiler, and a runtime.  The code is compiled to its own machine code.  The runtime is much like a Java runtime: a VM that executes code (as opposed to a VM that runs another OS).

The code is in K&R C.

Yesterday I ported it to ARM.  To a Raspberry Pi, specifically.

A Raspberry Pi has about 1,000 times more horsepower than the machines this code used to run on, so it's not so far from its origins.

Wild free() chase

The C code would compile and run, but it would also crash soon after starting a VM and executing some code.

This was expected.  When porting the code to 64-bit, I had some similar issues.  And I expect similar issues when porting to big endian machines.

So I set off with gdb and valgrind -- wonderful tools -- and soon found a bunch of use-after-free.

However, these don't happen on Intel.  And a use-after-free was not the kind of crash I expected from a port.

unsigned char

The error was actually triggered by code similar to:

char c = -1; // -1
int i = c;   // -1
call_something(i /* -1 */);
The problem is caused by a peculiar characteristic of C: char is machine specific.  char can be signed or unsigned, depending on the processor.

On x86-32 and x86-64 char is signed.  On ARM char is unsigned.  On ARM, the code becomes:
char c = -1; // 255
int i = c; // 255
call_something(i /* 255 */);
The difference happens before any promotion: storing -1 into an unsigned char wraps it to 255.  C then promotes both signed char and unsigned char to signed int, preserving the value -- -1 on one platform, 255 on the other.

ebcdic

C also doesn't specify whether char is ASCII or EBCDIC, or another character set.  And if it is ASCII, it doesn't specify whether the upper half follows code page 437 or 850.

char-ed heap

The code is actually using a char[] as a heap, and putting and pulling variables on that heap.
It's common for code to use its own stack, but a heap is a bit less common.

This means that char[1234] might have different uses.

(signed char)

Casting is a possibility, but it's very hard to do right in practice.  I'll just give one example:

Implicit Promotion

When porting to 64-bit, I had to take care of variable argument functions.  In modern C this is done with
#include <stdarg.h>

void foo(char *fmt, ...){
    va_list ap;
    va_start(ap, fmt);
    int i = va_arg(ap, int);
    ...
    va_end(ap);
}

This code will behave like:
void foo(char *fmt, int i){ }
However, the compiler auto-promotes char arguments to int, so you cannot use va_arg(ap, char).  And since va_list and friends are macros, the compiler won't always warn you.  So you have to be very careful if you think you'll catch every instance of char.

Are you confident you'll (unsigned char) everywhere?

s/char/signed char/g

The next reaction is to just replace char with signed char in the variable declarations.  This has its own set of problems.

Some calls are defined as taking char.  Feeding them specifically signed or unsigned chars can upset them.

However, this is the long-term solution.  Define the data correctly, and unambiguously.

I have 1304 instances to evaluate.

typedef char

The more correct way, in this case, is to typedef some structures to signed or unsigned chars.  This is indeed what should be done, and has been done, for some of the data to make it portable.

In other words, where we use char as a byte, we want to use sbyte and ubyte, and not char.  char should be used for characters.

-fsigned-char

gcc comes with a flag: -fsigned-char.

This will make the code behave as if char is signed.  It's specifically meant for this.  To make this work, you can try the following in configure.ac:
AC_C_CHAR_UNSIGNED
if test $ac_cv_c_char_unsigned = yes && test "$GCC" = yes; then
    CFLAGS+=" -fsigned-char "
fi

(void *)-1


NULL is a pointer

In C, NULL is usually a special-case pointer.  It can mark the end of a string or list, or it can signal an error.

List Terminator

// list of strings
char * list[] = {
    "apple",
    "pear",
    NULL
};

Then we can use the following code to loop over the list:
char ** s;
for (s = list; *s; ++s) {
    printf("%s\n", *s);
}

Error

if (malloc(-1) == NULL) return (-1); 

Valid Pointer

NULL is also a pointer.  On common platforms NULL is 0x0: it points to the very first block of RAM.  On the 8086 this would be the interrupt vector table.  On the Commodore 64 you might get the processor port data direction register.

Unless you're the kernel and interested in hardware you're unlikely to care.

Today we care.

Today we want to use pointers to mark conditions that C itself doesn't assign a meaning to.  For example, in the list of strings above we might want to warn about an uninitialized string.

In particular, today I had to build a list recursively, and then loop over it and free() the elements.  However, it was possible to have an empty entry.

We can't have:

// list of strings
char * list[] = {
    "apple",
    "pear",
    NULL,
    "banana",
    NULL
};
We'd never get to "banana".  We'd never free it.  There would be a memory leak.

malloc() == NULL => success

First, we'll discuss when NULL is valid.

Under one condition malloc() will return NULL, yet still have succeeded.  If the allocated block legitimately starts at 0x0 -- a pointer to the beginning of RAM -- malloc() returns 0x0, which looks exactly like an error.

This can happen when we do malloc(sizeof(everything)).  Everything might be all RAM, but is more likely to be all map-able memory, which will include swap.  It will be even bigger if overcommit is enabled.

When this happens, the memory allocated must start at 0.  If it starts at 0x1, it misses the first byte, so it hasn't allocated everything.

Therefore malloc(sizeof(everything)) can only ever return 0x0 -- on both success and failure.

If 0x0 points to a hardware-specific area, the beginning of RAM might be remapped away, and malloc(sizeof(everything)) might point to the start of RAM, say 0x100.  However, if this is the case, 0x0 is still valid: it points to a hardware-specific area.

0xdeadbeef

When debugging programs it can help to initialise variables to something easily visible.  The default in C is uninitialized: whatever is in RAM.  The default when using bzero() is 0x0.

The default is 0x0 because it initializes all memory to NULL, which will terminate a string or a list in C.  It also appeals to psychology: unused memory is empty.

When debugging it helps to initialize memory to something easily visible in a debugger.  Unfortunately there are lots of valid zeroes.  So if we initialize to 0x0, we don't know if the variable was never used, or actually is set to zero.

Debuggers also speak hex, so we want a non-zero value that's easy to spot in hex.  Hence: 0xdeadbeef, which is easy to spot and almost English.

(void *)-1

Just as 0x0 is an interesting answer, so is -1, or 0xffffffff (on 32-bit machines).

Is (void *)-1 a valid pointer?


(void *)-1 can only ever point to a one-byte block at the very top of memory.  At best it is the result of malloc(1).  It cannot address more than one byte.

(void *)-1 is not valid on any machine that requires memory to be word or page or int aligned.

(void *)-1 might be valid in a memory allocator that allocates top-down, but even then it's likely to skip, to allow for easy realloc().

In Use


If you need a list of arrays that could be empty, you might use:

char * list[] = {
    "apple",
    "pear",
    (void *)-1,
    "banana",
    NULL
};
Now we can check for (void *)-1 to know that the list isn't finished (not NULL), but that the contents at this index are uninitialized.