Monday 10 April 2023

mv

NAME

    mv - move blog

 

SYNOPSIS

    mv URL


DESCRIPTION

    The blog was moved off Blogger.


SEE ALSO

    https://blog.deschouwer.co.za/


AUTHOR

    Berend De Schouwer

Wednesday 13 May 2015

cat unknown

Postgresql can store timestamps.  It prefers to store timestamps with timezone information.  You can store a timestamp, date, time or interval.

Postgresql also lets you use strings when inserting or selecting the date, which you can then cast.  For example:
select 'Mon May 11 11:21:31 SAST 2015'::date;
This works, but only for some timezones:
select 'Mon May 11 11:21:31 CAT 2015'::date;
ERROR:  invalid input syntax for type date: "Mon May 11 11:21:31 CAT 2015"
LINE 1: select 'Mon May 11 11:21:31 CAT 2015'::date;
CAT is unknown.

SAST is South African Standard Time.  CAT is Central African Time.

africa behind

My first reaction is that the African locales are left behind again.  I've encountered this before.  For example: Botswana's Pula currency, or Zambia's, in Java 1.4 and Java 1.6.

It turns out that this isn't the reason.  It's just part of the reason.

cat isn't always a cat

Postgresql uses two views for timezone lookups: pg_timezone_names and pg_timezone_abbrevs.

pg_timezone_names works with the more complete timezone names, e.g. Asia/Ho_Chi_Minh.  pg_timezone_abbrevs works with the abbreviations.  However, pg_timezone_abbrevs must do forward and reverse lookups, and it turns out that timezone abbreviations aren't unique.

For example, at Wikipedia you'll find that ACT can mean Acre Time, or ASEAN Common Time.  It's simply not possible for Postgresql to know what every abbreviation means.

As a note: there is only one CAT in that table.
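
You can check what an installation actually knows by querying the view directly.  A quick sketch (output from a 9.x install; the contents vary with the timezone database your OS ships):

select * from pg_timezone_abbrevs where abbrev in ('SAST', 'CAT');
 abbrev | utc_offset | is_dst
--------+------------+--------
 SAST   | 02:00:00   | f
(1 row)

SAST is there; CAT simply isn't.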

problems

os, java and postgresql

It's frustrating that the OS, Java and Postgresql do not share locale information.  It would be nice if all three services actually provided exactly the same information, but they don't.

Zambia's currency changed in January 2014.  Was the OS updated?  Java?  Postgresql?  At the same time?

testing

Testing for these strings is hard.  Do you know if all your applications work with the input?

africa really is left behind

There is only one CAT, yet Postgresql does not add it, and will not add it (I filed a bug report, #13267).  Yet Postgresql does add CST, which has five different meanings.  And it adds the North American -- and only the North American -- definition.

solutions

early input sanitation

Input should be sanitized, and converted away from ambiguous strings, as early as possible.  That string should never have reached the database.  This is just good programming practice.
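
For example, the application can strip the ambiguous abbreviation and attach a full zone name before the string ever reaches the database.  A sketch (Africa/Maputo is one of the CAT zones):

select 'Mon May 11 11:21:31 2015'::timestamp at time zone 'Africa/Maputo';

Full zone names are unambiguous, so this works regardless of what pg_timezone_abbrevs contains.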

late output representation

The output should be converted to human readable strings as late as possible.  Again, this is good programming practice.

packaging

There is a very real packaging problem here.  OS, Postgresql and Java definitely should have the same information.  There are likely other packages with their own information.

unconfirmed

I haven't checked which Linux (or Windows, or MacOS X) distributions have different information for these services.  I have checked that some (RedHat Enterprise 5 and 6, Postgresql 9.2 and 9.3, Java 1.4, 1.6 and 1.7) have different information for some of the values.

Monday 27 April 2015

-fsigned-char

Ancient History

I maintain a proprietary Cobol machine.  It was originally written with DOS and mainframes in mind.  It was ported across to Xenix, Unix, and Linux (in the kernel 1.2 days) and recently to 64-bit Linux.

The code includes an editor, a compiler, and a runtime.  The code is compiled to its own machine code.  The runtime is much like a Java runtime: a VM that executes code (as opposed to a VM that runs another OS).

The code is in K&R C.

Yesterday I ported it to ARM.  To a Raspberry Pi, specifically.

A Raspberry Pi has about 1,000 times more horsepower than the machines this code used to run on, so it's not so far from its origins.

Wild free() chase

The C code would compile and run, but it would also crash soon after starting a VM and executing some code.

This was expected.  When porting the code to 64-bit, I had some similar issues.  And I expect similar issues when porting to big endian machines.

So I set off with gdb and valgrind -- wonderful tools -- and soon found a bunch of use-after-free bugs.

However, this doesn't happen on Intel, and use-after-free wasn't the kind of crash I expected from the port.

unsigned char

The error was actually triggered by code similar to:

char c = -1; // -1
int i = c; // -1
call_something(i /* -1 */);
The problem is caused by a peculiar characteristic of C: char is machine specific.  char can be signed or unsigned, depending on the processor.

On x86-32 and x86-64 char is signed.  On ARM char is unsigned.  On ARM, the code becomes:
char c = -1; // 255
int i = c; // 255
call_something(i /* 255 */);
Both versions auto-promote char to signed int; the difference is the value that gets promoted.  A signed char holding -1 promotes to -1, while an unsigned char holding the same bit pattern (0xff) promotes to 255.

ebcdic

C also doesn't specify whether char is ASCII or EBCDIC, or another character set.  If it is ASCII, it doesn't specify whether it's codepage 437 or 850.

char-ed heap

The code is actually using a char[] as a heap, putting variables onto and pulling them off that heap.
It's common for code to use its own stack, but a heap is a bit less common.

This means that char[1234] might have different uses at different times.
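
A minimal sketch of the failure mode, assuming a byte heap kept in a char array (the names are illustrative, not the real code):

#include <stdio.h>

static char heap[1024];  /* the VM keeps its variables in a byte heap */

int main(void)
{
    heap[42] = -1;        /* store a signed byte onto the heap */
    int i = heap[42];     /* pull it back through char */
    printf("%d\n", i);    /* -1 where char is signed (x86); 255 on ARM */
    return 0;
}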

(signed char)

Casting is a possibility, but it's very hard to do right in practice.  I'll just give one example:

Implicit Promotion

When porting to 64-bit, I had to take care of variable argument functions.  In modern C this is done with:
#include <stdarg.h>

void foo(char *fmt, ...){
    va_list ap;
    va_start(ap, fmt);
    int i = va_arg(ap, int);
    ...
    va_end(ap);
}

This code will behave like:
void foo(char *fmt, int i){ }
However, it will auto-promote chars: any char argument has already been promoted to int by the time foo() sees it, so you cannot use va_arg(ap, char).  va_list and friends are macros.  So you have to be very careful if you think you'll grab all instances of char.

Are you confident you'll (unsigned char) everywhere?
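
The safe pattern, inside foo() above, is to read the promoted int and narrow it yourself (a sketch):

/* inside foo(), after va_start(ap, fmt): */
int promoted = va_arg(ap, int);        /* char arguments arrive as int */
signed char c = (signed char)promoted; /* narrow back down explicitly */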

s/char/signed char/g

The next reaction is to just replace char with signed char in the variable declarations.  This has its own set of problems.

Some calls are defined as taking char.  Feeding them specifically signed or unsigned chars can upset them: char *, signed char * and unsigned char * are three distinct pointer types, and compilers warn when you mix them.

However, this is the long-term solution.  Define the data correctly, and unambiguously.

I have 1304 instances to evaluate.

typedef char

The more correct way, in this case, is to typedef some structures to signed or unsigned chars.  This is indeed what should be done, and has been done, for some of the data to make it portable.

In other words, where we use char as a byte, we want to use sbyte and ubyte, and not char.  char should be used for characters.
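
A sketch of the typedefs meant here (sbyte and ubyte follow the naming above; they're illustrative, not the project's actual headers):

typedef signed char sbyte;    /* a raw signed byte, e.g. a VM heap cell */
typedef unsigned char ubyte;  /* a raw unsigned byte */

Plain char then stays reserved for actual text.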

-fsigned-char

gcc comes with a flag: -fsigned-char.

This will make the code behave as if char is signed.  It's specifically meant for this.  To make this work, you can try the following in configure.ac:
AC_C_CHAR_UNSIGNED
if test "$ac_cv_c_char_unsigned" = yes && test "$GCC" = yes; then
    CFLAGS="$CFLAGS -fsigned-char"
fi

(void *)-1


NULL is a pointer

In C, NULL is usually a special-case pointer.  It can mean the end of a list of strings, or it can mean an error.

List Terminator

// list of strings
char * list[] = {
    "apple",
    "pear",
    NULL
};

Then we can use the following code to loop over the list:
char ** s;
for (s = list; *s; ++s) {
    printf("%s\n", *s);
}

Error

if (malloc(-1) == NULL) return (-1); 

Valid Pointer

NULL is also a pointer.  NULL is 0x0.  It points to the very first block of RAM.  On the 8086 this would be the interrupt vector table.  On the Commodore 64 you might get the processor port data direction register.

Unless you're the kernel and interested in hardware you're unlikely to care.

Today we care.

Today we want to use pointer values to mark special conditions that don't have a meaning in standard C.  For example, in the list of strings above we might want to flag an uninitialized string.

In particular, today I had to build a list recursively, and then loop over it and free() the elements.  However, it was possible to have an empty entry.

We can't have:

// list of strings
char * list[] = {
    "apple",
    "pear",
    NULL,
    "banana",
    NULL
};
With the loop above we'd never get to "banana": we'd never free it, and there would be a memory leak.

malloc() == NULL => success

First, we'll discuss when NULL is valid.

Under one condition malloc() will return NULL, but it will also be a success.  If malloc() returns 0x0 -- a genuine pointer to the beginning of RAM -- it has actually succeeded, but it appears to report an error.

This can happen when we do malloc(sizeof(everything)).  Everything might be all RAM, but is more likely to be all map-able memory, which will include swap.  It will be even bigger if overcommit is enabled.

When this happens, the memory allocated must start at 0.  If it starts at 0x1, it misses the first byte, so it hasn't allocated everything.

Therefore malloc(sizeof(everything)) can only ever return 0x0 -- on both success and failure.

If 0x0 points to a hardware-specific area, the beginning of RAM might be remapped away, and malloc(sizeof(everything)) might point to the start of RAM, say 0x100.  However, if this is the case, 0x0 is still valid: it points to a hardware-specific area.

0xdeadbeef

When debugging programs it can help to initialise variables to something easily visible.  The default in C is uninitialized: whatever is in RAM.  The default when using bzero() is 0x0.

The default is 0x0 because it initializes all memory to NULL, which will terminate a string or a list in C.  It also appeals to psychology: unused memory is empty.

When debugging it helps to initialize memory to something easily visible in a debugger.  Unfortunately there are lots of valid zeroes.  So if we initialize to 0x0, we don't know if the variable was never used, or actually is set to zero.

Debuggers also speak hex, so we want a non-zero value that's easy to spot in hex.  Hence: 0xdeadbeef, which is easy to spot and almost English.
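
A sketch of a hypothetical debug allocator that poisons fresh memory with that pattern (illustrative, not a drop-in malloc() replacement):

#include <stdint.h>
#include <stdlib.h>

/* fill new memory with 0xdeadbeef so uninitialized reads stand out */
void *debug_malloc(size_t n)
{
    uint32_t *p = malloc(n);
    if (p == NULL)
        return NULL;
    for (size_t i = 0; i < n / sizeof(uint32_t); ++i)
        p[i] = 0xdeadbeef;   /* any trailing bytes are left as-is */
    return p;
}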

(void *)-1

Just as 0x0 is an interesting answer, so is -1, or 0xffffffff (on 32-bit machines).

Is (void *)-1 a valid pointer?


(void *)-1 can only ever point to a one-byte block of RAM.  It can only ever point to malloc(1).  It cannot contain more than one byte.

(void *)-1 is not valid on any machine that requires memory to be word or page or int aligned.

(void *)-1 might be valid in a memory allocator that allocates top-down, but even then it's likely to skip it, to allow for easy realloc().  There is precedent, too: POSIX mmap() returns MAP_FAILED on error, which is defined as (void *)-1.

In Use


If you need a list of strings where an entry could be empty, you might use:

char * list[] = {
    "apple",
    "pear",
    (void *)-1,
    "banana",
    NULL
};
Now we can check for (void *)-1 to know that the list isn't finished (not NULL), but that the contents at this index are uninitialized.
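
The cleanup loop then needs two checks.  A sketch, assuming the entries were malloc()ed during the recursive build (the string literals above are only for illustration):

char ** s;
for (s = list; *s != NULL; ++s) {
    if (*s == (char *)-1)
        continue;  /* never initialized; nothing to free */
    free(*s);
}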

Wednesday 13 February 2013

UEFI to BIOS

Introduction

For those of you who have followed the Samsung Laptop UEFI brick bug at Matthew Garrett's blog, you may want to change the way your laptop boots from UEFI to BIOS.

If you've been googling this, you'll notice a number of posts for changing BIOS to UEFI, but not the other way round.  This post is about UEFI to BIOS.


Why

UEFI is complicated.  UEFI is buggy.  The BIOS is buggy too, but at least it's not so complicated.

If you've been paying attention, Samsung has a serious UEFI bug.  It was first detected in Linux, but it also happens in Windows.  Basically, valid UEFI calls to Samsung's broken UEFI implementation can brick your laptop.  The only fix is to send it back to the manufacturer.

At the moment, only the Linux samsung_laptop driver triggers the bug in the wild.  It's entirely conceivable that more software will be found that triggers it.

Problem

I've got a Samsung laptop running Fedora 18.  It was installed with UEFI, and Fedora duly installed Grub2 with UEFI options.  I'd like to change this to a BIOS boot.

Solution

Install Grub2 for BIOS.

Requirements

Live CD

For these instructions, you do not need a live CD.  You will always be able to reboot into UEFI mode to fix any problems.

However, if you're paranoid (like me), you'll have a Live CD or USB handy.  I recommend you come prepared.

Grub2

Fedora 18 splits Grub2 into two packages: one for UEFI and one for BIOS.  We need to install the BIOS version.  These can exist side-by-side:
yum install grub2


Partitions

Grub needs space for grub stage 1.5.  Basically the Grub boot binary is too big for the traditional BIOS boot record (the MBR).  That boot record is only 512 bytes.

For traditional partition tables, Grub just used some scratch space, and the user never had to worry.  For UEFI, the partition table is now a GPT partition table, and the user does need to worry.  You need to create a BIOS Boot Partition.  This is GPT's backwards-compatibility partition for BIOS booting.

This partition only needs to be a few KB big.  1 MB is already more than enough.  Depending on how Fedora was installed you might have to re-size your LVM volume.  However, Fedora aligns partitions to MB boundaries for performance reasons, so you can usually squeeze one in.

Use parted to do this.  You want to end up with something like this:

(parted) print
Model: ATA LITEONIT LMT-256 (scsi)
Disk /dev/sda: 256GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number  Start   End     Size    File system  Name                  Flags
 4      17.4kB  1049kB  1031kB               4                     bios_grub
 1      1049kB  211MB   210MB   fat16        EFI System Partition  boot
 2      211MB   735MB   524MB   ext4
 3      735MB   256GB   255GB                                      lvm


Here I created partition 4, and squeezed it in at the beginning.  Toggle the partition flag to bios_grub.

(parted) mkpart 4 0 1MB
(parted) toggle 4 bios_grub
(parted) quit

You do not need to format this partition.  Therefore you also do not need to mount this partition.  Grub will dump a binary to the beginning of this partition.

Bootable Partition

I didn't need to do this, and I haven't done this.  However, some older BIOS-es will require a bootable partition.

You can't use parted for this.  You must use fdisk, and mark the GPT partition as bootable.  I think it's the 'a' option in fdisk (from memory).


Config File

Grub needs a configuration file.  For BIOS this is /boot/grub2/grub.cfg.  For UEFI it was /boot/efi/EFI/fedora/grub.cfg.

Grub2 wants you to use grub2-mkconfig, and not manually edit a config file.  This is what Grub2 wants, and it's wrong.

"grub2-mkconfig" will create a configuration file for the current boot environment.  If you booted UEFI (like you would have), you end up with a UEFI grub.cfg, and not a BIOS grub.cfg.

So first make it: grub2-mkconfig -o /boot/grub2/grub.cfg, then edit it.

  • Replace all instances of linuxefi with linux
  • Replace all instances of initrdefi with initrd
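
A sed one-liner does both edits (a sketch assuming GNU sed; the .bak suffix keeps a backup):

sed -i.bak -e 's/linuxefi/linux/g' -e 's/initrdefi/initrd/g' /boot/grub2/grub.cfg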

Boot Record

Now we're ready to install Grub on the MBR:
grub2-install --target=i386-pc /dev/sda # replace sda with your disk


The --target=i386-pc tells Grub to install BIOS files, not UEFI files.  It has nothing to do with 32-bit vs. 64-bit kernels.


BIOS setting

Reboot.  Go to the boot menu, and find the option for OS type.  This will be set to UEFI OS.  Change it to CSM OS (CSM is the Compatibility Support Module, i.e. BIOS emulation).

Reboot.


Troubleshooting

Grub-install moans about --target

You need the grub2 package.  You only have grub2-efi.


Grub boots to grub>

Grub boots, but doesn't have a configuration file.  Run grub2-mkconfig -o /boot/grub2/grub.cfg.  I often forget the '-o' option, in which case grub2-mkconfig writes the configuration to stdout.


Grub moans about linuxefi

Remember to edit your grub.cfg, and replace linuxefi and initrdefi with the plain versions.

Grub2-install moans about BIOS Boot Partition

You need to create an additional partition in your GPT partition table.  You need to mark that partition as type bios_grub.  Read the parted section again.

Wednesday 12 December 2012

NSS and getspnam()

Implementing central authentication with password expiry on systems that implement NSS is actually broken.

What is NSS?

NSS is the standard library way of interfacing to user information on Unix-like systems. That usually means reading /etc/passwd.
In other words: this is what's used to find out identification and authorisation information about users. It can also be used to configure other files in /etc/ in a central location.
It allows this information to be in local files (/etc/passwd and friends) or on the network (NIS, LDAP, AD, etc.)
Programs have a standard way to ask certain questions, using an API, and don't have to worry about where the information is coming from.
NSS complements PAM. PAM is used to authenticate users. In other words: PAM confirms who you are. PAM needs access to some of the same files as NSS, although for slightly different reasons.
  • PAM asks the question: Are you who you say you are?
  • NSS asks the questions[1]: Who are you?, and Are you allowed to do that?
On Unix-like systems, you can obtain a user's real name using the call getpwnam(), for example. This filters through NSS, and gives a user's name, group ID and some other bits of information. Things like: Where do I store this user's files?
Similarly, there is a call getspnam() which gets the /etc/shadow information through NSS to obtain some authorisation information like: is the user allowed to log in today? For example: the user might have the right password, but the password might be about to expire.
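
A minimal sketch of the two calls (the username is hypothetical, and reading shadow data requires root):

#include <stdio.h>
#include <pwd.h>
#include <shadow.h>

int main(void)
{
    /* identification: name, uid, home directory... */
    struct passwd *pw = getpwnam("alice");
    if (pw != NULL)
        printf("%s lives in %s\n", pw->pw_name, pw->pw_dir);

    /* authorisation: password ageing, from /etc/shadow or the network */
    struct spwd *sp = getspnam("alice");
    if (sp != NULL)
        printf("last change day %ld, max age %ld days\n",
               sp->sp_lstchg, sp->sp_max);
    return 0;
}

It's the second call that hangs when the network directory is unreachable.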

Where is the problem?

NSS isn't allowed to fail. It may not return temporary failures according to the spec. In Unix fashion, a temporary failure would normally be the error EAGAIN.
If a local file is broken, this makes sense. When the password file is broken, it's ok for login to crash.
When NSS goes over a network, however, this is broken. Networks do die. Network packets do go missing. Remote servers do get rebooted.
Now NSS implementations on Linux take care of this with NSCD, the name service caching daemon, which caches previous NSS requests. It will cache the answer to Who are you?
This is done for performance reasons as much as fallible network services, but it nicely does the job.
Except it doesn't. It doesn't cache all the API calls. It doesn't cache password expiry, for one. That means that calls to getspnam() will hang indefinitely when the network is down. Remember that NSS calls may not fail.

Solutions

There are various solutions. None of them are easy.

Fix NSCD

Apparently this won't happen. I've found a couple of comments that NSCD will not cache password expiry information -- normally found in /etc/shadow -- but no confirmation from the glibc people (NSCD is part of glibc.)

Replace NSCD

Two projects are attempting to implement this.

sssd

RedHat stumbled across this problem, and they are replacing NSCD with SSSD. Great. Except it has exactly the same problem, by design.
SSSD is meant to replace NSCD because NSCD is broken, and its stated goal is offline logins, but it then fails on the same API calls (see the sssd getent shadow bug[2]).

nsscache

nsscache is from Google, and the project was started for these exact problems. It does cache /etc/shadow. However, the cache is refreshed from cron, and is not updated as users log in.
Password expiry normally requires the user to immediately change his or her password, or be logged out. When the password is changed, it's the central password that changes. If nsscache doesn't run immediately, the information presented to NSS can be out of date, prompting the user to change his or her password again.
This problem can be worked around by configuring /etc/nsswitch.conf, but ideally nsscache will allow for a user-specific update on login.
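
The nsswitch.conf side looks something like this (a sketch; the 'cache' source name depends on nsscache's NSS module being installed, and 'ldap' stands in for whatever directory you use):

# try nsscache's local maps first, then the network directory
passwd: cache ldap
shadow: cache ldap
group:  cache ldap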

Future

I'll post about configuring nsscache.

Footnotes

  • [1] NSS is an API to allow other applications to ask those questions.
  • [2] Blogger lost some links...

Friday 6 May 2011

Gnome 3

I decided to try Gnome 3 today.

Here are the problems I've found:

gpg-agent isn't loaded correctly

Gnome3's session does load gpg-agent, but it does not export the settings to gnome-terminal. Hence gnome-terminal does not use gpg-agent when you run, for example, gpg --decrypt or -- if it's set up -- ssh.

The fix is fairly simple: find the saved settings file (usually $HOME/.gnupg/gpg-agent-info-hostname), and export it in $HOME/.bashrc. For example:

if [ -r "$HOME/.gnupg/gpg-agent-info-hostname" ]; then
    . "$HOME/.gnupg/gpg-agent-info-hostname"
    export GPG_AGENT_INFO SSH_AUTH_SOCK SSH_AGENT_PID
fi

Upgrading requires rebooting

Upgrading can require rebooting. If a Gnome2 gnome-settings-daemon is running when it's upgraded to a Gnome3 gnome-settings-daemon, it can sit at 100% CPU until you reboot.

gnome-shell does not always start

I haven't found a permanent solution for this yet. The process gnome-shell doesn't always start, or maybe crashes.

The workaround is to open a terminal, and run
export DISPLAY=:0.0
gnome-shell --replace &

It usually starts fine the second time.