22

I am trying to find the max length of a timezone identifier. This is the string used as the timezone name (e.g. "America/New_York"). The tz database is not helpful; I could not find implementation details.

Microsoft (.NET Framework 4.5) suggests a max length of 32, but this seems to be a limitation of their registry.

libc points to a limit called "_POSIX_TZNAME_MAX", which is 3 characters long, but this is an absolute minimum requirement for POSIX compliance. Typically, I guess an implementation will use more.

So the real question is: What's an acceptable string length to safely store timezone "tzname" / identifier name?

sleblanc
  • 3,821
  • 1
  • 34
  • 42
  • The Microsoft .NET Framework `TimeZone.Id` property is for Windows' time-zone identifiers which are very different from IANA tzdb identifiers. – Dai Jan 27 '20 at 06:44

1 Answers1

30

Why not use a container that doesn't care what the length is -- as e.g. std::string ?

Now, it so happens that I was working recently with the TZ db as supplied in the common csv format (eg here in a file from Boost) but the same format is also used in the Boost sources.

With that data, I am seeing a maximum length of 28:

R> library(RcppBDT)                      # R package interfacing Boost Date_Time
Loading required package: Rcpp
R> tz <- new(bdtTz, "America/Chicago")   # init. an object, using my default TZ
R> tznames <- tz$getAllRegions()         # retrieve list of all TZ names
R>
R> length(tznames)                       # total number of TZ identifiers
[1] 381
R>
R> head(tznames)                         # look at first six
[1] "Africa/Abidjan"     "Africa/Accra"       "Africa/Addis_Ababa" 
[4] "Africa/Algiers"     "Africa/Asmera"      "Africa/Bamako"     
R>
R> summary(sapply(tznames, nchar))       # numerical summary of length of each
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      9      13      15      15      17      28 
R>
R> tznames[ nchar(tznames) >= 26 ]       # looking at length 26 and above
[1] "America/Indiana/Indianapolis" "America/Kentucky/Louisville"  
[3] "America/Kentucky/Monticello"  "America/North_Dakota/Center" 
R> 

We can also look at a histogram:

R> library(MASS)
R> truehist(sapply(tznames, nchar), 
+           main="Distribution of TZ identifier length", col="darkgrey")
R>

enter image description here

This uses code which I have in my RcppBDT package's SVN repo on R-Forge but not yet in the CRAN version of the package.

Dirk Eddelbuettel
  • 360,940
  • 56
  • 644
  • 725
  • 4
    Amazing answer! Actually, I cannot use variable length strings because I want to store those strings in a database. At first, I did not want to use an arbitrarily-long char field, but I think I am set on a 40-char long string[1]. I will put these in a separate table and refer to them using foreign keys. 1: There is a timezone ("America/Argentina/ComodRivadavia") that is actually 32-char long. In the future there might be other equally-long names. – sleblanc Sep 22 '12 at 19:30
  • Can't you use varchar in SQL? – Dirk Eddelbuettel Sep 22 '12 at 19:32
  • 3
    Don't varchars actually require a specified length? – sleblanc Sep 22 '12 at 19:39
  • 1
    See http://stackoverflow.com/questions/59667/why-would-i-ever-pick-char-over-varchar-in-sql -- you want varchar just as I said right at the beginning :) – Dirk Eddelbuettel Sep 22 '12 at 19:49
  • You're absolutely right, thank you. Do you think a VARCHAR size 40 leaves enough space for future timezone names? – sleblanc Sep 22 '12 at 20:34
  • I'd say so. I am not sure whether varchar(N) (for some value of N) really is preferable to just varchar. – Dirk Eddelbuettel Sep 22 '12 at 21:04
  • 4
    The longest name I can find in my local zoneinfo directory is `right/America/Argentina/ComodRivadavia` (38 chars). Besides all `right/...` there are also `posix/...` names. Those names are explained [here](http://askubuntu.com/a/34957/6969). – Lekensteyn Oct 31 '15 at 21:05
  • 1
    Is just VARCHAR even valid SQL? I thought a size was mandatory. If it isn't, maybe that's database-specific, in which case that's a good reason to specify the number - if you specify a value too high for the database you're using, you will find out now, instead of when someone tries to insert it. :) – Hakanai Dec 27 '17 at 03:09
  • varchar by itself in some databases, like SQL Server, implies the varchar(max) or a max length varchar (not good). I just do a healthy round up to an even, recognizable, power of 2 in a bid to be safe and at least have the semblance of reason. I'm going with 64 in this case. – Anonymike Jun 11 '23 at 15:39