Wednesday, 13 May 2015

cat unknown

cat unknown

Postgresql can store timestamps.  It prefers to store timestamps with timezone information.  You can store a timestamp, date, time or interval.

Postgresql also lets you use strings when inserting or selecting the date, which you can then cast.  For example:
select 'Mon May 11 11:21:31 SAST 2015'::date;
This works, but only for some timezones:
select 'Mon May 11 11:21:31 CAT 2015'::date; ERROR: invalid input syntax for type date: "Mon May 11 11:21:31 CAT 2015" LINE 1: select 'Mon May 11 11:21:31 CAT 2015'::date ;
CAT is unknown.

SAST is South African Standard Time.  CAT is Central African Time.

africa behind

My first reaction is that the African locales are left behind again.  I've encountered this before.  For example: Botswana's Pula currency, or Zambia, in Java 1.4 and Java 1.6.

It turns out that this isn't the reason.  It's just part of the reason.

cat isn't always a cat

Postgresql uses two tables for timezone lookups.  pg_timezone_names and pg_timezone_abbrevs.

pg_timezone_names works with more complete timezones, eg. Asia/Ho_Chi_Minh. pg_timezone_abbrevs works with the abbreviations.  However, pg_timezone_abbrevs must do forward and reverse lookups, and it turns out that timezone names aren't unique.

For example, at Wikipedia you'll find that ACT can mean Acre Time, or ASEAN Common Time.  It's simply not possible for Postgresql to know what all timezones mean.

As a note: there is only one CAT in that table.

problems

os, java and postgresql

It's frustrating that the OS, Java and Postgresql do not share locale information.  It would be nice if all three services actually provided exactly the same information, but they don't.

Zambia's currency changed in January 2014.  Was the OS updated?  Java?  Postgresql?  At the same time?

testing

Testing for these strings is hard.  Do you know if all your applications work with the input?

africa really is left behind

There is only one CAT, yet Postgresql does not add it, and will not add it (I filed a bug report, #13267)  Yet Postgresql does add CST which has five different meanings.  And it adds the North American -- and only the North American -- definition.

solutions

early input sanitation

Input should be sanitized.  That string should never have reached the database.  Input should be sanitized and converted away from ambiguous strings as early as possible.  This is just good programming practice.

late output representation

The output should be converted to human readable strings as late as possible.  Again, this is good programming practice.

packaging

There is a very real packaging problem here.  OS, Postgresql and Java definitely should have the same information.  There are likely other packages with their own information.

unconfirmed

I haven't checked which Linux (or Windows, or MacOS X) distributions have different information for these services.  I have checked that some (RedHat Enterprise 5 and 6, Postgresql 9.2 and 9.3, Java 1.4, 1.6 and 1.7) have different information for some of the values.