Ancient History
I maintain a proprietary Cobol machine. It was originally written with DOS and mainframes in mind. It was ported to Xenix, Unix, and Linux (in the kernel 1.2 days), and recently to 64-bit Linux. The code includes an editor, a compiler, and a runtime. The compiler targets its own machine. The runtime is much like a Java runtime: a VM that executes code (as opposed to a VM that runs another OS).
The code is in K&R C.
Yesterday I ported it to ARM. To a Raspberry Pi, specifically.
A Raspberry Pi has about 1,000 times more horsepower than the machines this code used to run on, so it's not so far from its origins.
Wild free() chase
The C code would compile and run, but it would crash soon after starting a VM and executing some code. This was expected: when porting the code to 64-bit, I had similar issues, and I expect similar issues when porting to big-endian machines.
So I set off with gdb and valgrind (wonderful tools) and soon found a bunch of use-after-free errors.
However, these don't happen on Intel, and a use-after-free is not the kind of crash I expected from a port.
unsigned char
The error was actually triggered by code similar to:
char c = -1;   // -1
int i = c;     // -1
call_something(i /* -1 */);
The problem is caused by a peculiar characteristic of C: char is machine-specific. Plain char can be signed or unsigned, depending on the processor.
On x86-32 and x86-64, char is signed. On ARM, char is unsigned, so on ARM the code becomes:
char c = -1;   // 255
int i = c;     // 255
call_something(i /* 255 */);
The difference arises because C promotes both signed char and unsigned char to int while preserving the value: a signed char holding -1 becomes the int -1, but an unsigned char holding 255 (which is what -1 turns into when stored in an unsigned char) becomes the int 255.
EBCDIC
C also doesn't specify whether char holds ASCII, EBCDIC, or another character set. Even if it is ASCII-based, it doesn't specify whether it's code page 437 or 850.
char-ed heap
The code actually uses a char[] as a heap, putting variables onto that heap and pulling them off again. It's common for programs to manage their own stack, but a home-grown heap is a bit less common.
This means that a single element, say element 1234 of that char[], might be used for different purposes at different times.
(signed char)
Casting is a possibility, but it's very hard to get right in practice. I'll give just one example:
Implicit Promotion
When porting to 64-bit, I had to take care of variadic (variable-argument) functions. In modern C this is done with the stdarg.h macros:
#include <stdarg.h>

void foo(char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int i = va_arg(ap, int);
    /* ... */
    va_end(ap);
}
This code will behave like:
void foo(char *fmt, int i){ }
However, any char passed through the ... is auto-promoted to int, so you cannot use va_arg(ap, char); you have to fetch an int. va_list and friends are macros. So you have to be very careful if you think you'll catch every instance of char. Are you confident you'll put an (unsigned char) cast everywhere it's needed?
s/char/signed char/g
The next reaction is to just replace char with signed char in the variable declarations. This has its own set of problems. Some functions are declared as taking char (or char *), and since char, signed char, and unsigned char are three distinct types in C, feeding them explicitly signed or unsigned chars can upset them.
However, this is the long-term solution. Define the data correctly, and unambiguously.
I have 1304 instances to evaluate.
typedef char
The more correct way, in this case, is to typedef signed and unsigned byte types and use them in the data structures. This is indeed what should be done, and it has already been done for some of the data to make it portable. In other words, where we use char as a byte, we want to use sbyte and ubyte, not char. char should be used for characters.
-fsigned-char
gcc comes with a flag, -fsigned-char, which makes the code behave as if char is signed. It's meant specifically for this situation. To apply it automatically, you can try the following in configure.ac:
AC_C_CHAR_UNSIGNED
if test "$ac_cv_c_char_unsigned" = yes && test "$GCC" = yes; then
  CFLAGS="$CFLAGS -fsigned-char"
fi