
MediaWiki (the free software behind Wikipedia) stores timestamps in some database fields in a distinctive binary(14) format. This is described further in its timestamp documentation.

The format of timestamps used in MediaWiki URLs and in some of the MediaWiki database fields is yyyymmddhhmmss. For example, the timestamp for 2023-01-20 17:12:22 (UTC) is 20230120171222. The timezone for these timestamps is UTC.

I have also seen a similar timestamp format in other places, such as URLs for the Internet Archive. I regularly need to compare these timestamps against timestamps stored in the standard Unix format (seconds since the Unix epoch). I believe this should be a common format, so it surprises me that I can't find a ready-made solution for converting from the MediaWiki format to a Unix timestamp.

What I'm most interested in is the best way to do this conversion. That is:

  • Relatively short/simple to understand code.
  • Most efficient algorithm.
  • Detects errors in the original format.

MediaWiki apparently includes a conversion function named "wfTimestamp", but I haven't been able to locate the function or its source code online, and I understand it has many features beyond this simple conversion. One option would be to strip out the other parts of that function, but I still wouldn't know whether it is the optimal solution or whether there's a better way. There are lots of questions about converting to timestamps in general, but I'm hoping for something specific to this format. I've thought of several approaches, such as a regular expression, mktime after splitting the string, or strtotime, but I'm not sure which would be fastest for this particular format if it had to run many times. Since this format exists in at least two places, an optimal solution for this specific conversion could be useful to others as well. Thanks.

azoundria

3 Answers

2

I think this is probably what you're looking for:

$timestamp = strtotime("20230120171222");
// 1674234742 (when the default time zone is UTC)

The Unix timestamp that this function returns does not contain information about time zones. In order to do calculations with date/time information, you should use the more capable DateTimeImmutable.

Please see here: https://www.php.net/manual/en/function.strtotime.php
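The DateTimeImmutable route mentioned above could look roughly like the sketch below. The helper name parseMediaWikiTimestamp is hypothetical, and the round-trip comparison is just one assumed way to detect malformed input; it is not something the PHP manual prescribes for this format.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: parse a yyyymmddhhmmss string as UTC, or return null.
function parseMediaWikiTimestamp(string $value): ?DateTimeImmutable
{
    $utc = new DateTimeZone('UTC');
    // The leading "!" resets every field that the format does not parse.
    $date = DateTimeImmutable::createFromFormat('!YmdHis', $value, $utc);
    // createFromFormat rolls impossible dates over (Feb 30 becomes Mar 2),
    // so the formatted result is compared with the input to reject them.
    if ($date === false || $date->format('YmdHis') !== $value) {
        return null;
    }
    return $date;
}

$date = parseMediaWikiTimestamp('20230120171222');
echo $date->getTimestamp(), "\n";                          // 1674234742
echo $date->modify('+1 day')->format('Y-m-d H:i:s'), "\n"; // 2023-01-21 17:12:22
var_dump(parseMediaWikiTimestamp('20230230000000'));       // NULL
```

Passing the DateTimeZone explicitly keeps the result independent of whatever date_default_timezone_set was called with earlier.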

  • 1
    Thanks. It does look like it works and I just never imagined that strtotime would automatically work with this format. Note that your comment about the Unix timestamp is true about Unix timestamps in general. They always refer to a specific point in time. (One reason I like working with them. I only have to deal with timezone/daylight savings time complexity when displaying the information to the user.) This is certainly the easiest to understand implementation though still not sure if this is the most efficient implementation. – azoundria Feb 03 '23 at 19:44
  • 2
    I have now done metric testing against your solution and DateTime::createFromFormat. This showed that strtotime was consistently faster in fact. Test 1: strtotime took 0.01284s for 10,000 conversions. DateTime::createFromFormat took 0.02822s for 10,000 conversions. Test 2: strtotime took 0.064132s for 10,000 conversions. DateTime::createFromFormat took 0.113503s for 10,000 conversions. Test 3: strtotime took 0.014117s for 10,000 conversions. DateTime::createFromFormat took 0.051584s for 10,000 conversions. I tried swapping the order of tests and results were consistent. – azoundria Feb 03 '23 at 19:54
  • Note that I found the solution fails if date_default_timezone_set has previously been called with a timezone other than UTC. I couldn't find a way to pass a timezone parameter to strtotime, so I save the current default with date_default_timezone_get, call date_default_timezone_set for UTC before the conversion, and restore the saved value afterwards. In my testing, this didn't add anything significant to the runtime of the function. – azoundria Feb 03 '23 at 21:06
  • 1
    It is better to give the correct time zone right away, as in Andrey Makarov's solution or mine. – jspit Feb 04 '23 at 17:06
  • Thanks jspit. I have done metric testing on your solution and it is also faster in addition to being shorter so I have selected to use it instead. Time Passed: 0.1924s for 100,000 operations with using date_default_timezone_get and date_default_timezone_set, and 0.1801s for 100,000 operations with the " UTC" concatenated. By comparison, Andrey's solution takes 0.4825s for 100,000 operations. I continue to be impressed by the versatility of the strtotime function. – azoundria Feb 24 '23 at 20:34
1

You can use the DateTime::createFromFormat function with an explicit format.

$date = DateTime::createFromFormat("YmdHis", "20230120171222", new \DateTimeZone('UTC'));
$timestamp = $date->getTimestamp();

I'm not sure you can find a more optimised way: even if you parse the string manually, you have to account for leap years and for the fact that not every day has exactly 24 hours. PHP does that for you.
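For comparison, a manual parse does not have to reimplement the calendar, because checkdate and gmmktime already handle month lengths and leap years. The following is only a sketch: the function name mediaWikiToUnix is hypothetical, and rejecting years below 1000 is an assumption made here (partly because gmmktime remaps two-digit years).

```php
<?php
declare(strict_types=1);

// Hypothetical helper: returns the Unix timestamp, or null on malformed input.
function mediaWikiToUnix(string $value): ?int
{
    // Exactly 14 ASCII digits; anything else is rejected up front.
    if (strlen($value) !== 14 || !ctype_digit($value)) {
        return null;
    }
    $year   = (int) substr($value, 0, 4);
    $month  = (int) substr($value, 4, 2);
    $day    = (int) substr($value, 6, 2);
    $hour   = (int) substr($value, 8, 2);
    $minute = (int) substr($value, 10, 2);
    $second = (int) substr($value, 12, 2);

    // Assumption: years below 1000 never occur in this data and are invalid.
    // checkdate() validates the day against the month, including leap years.
    if ($year < 1000 || !checkdate($month, $day, $year)
        || $hour > 23 || $minute > 59 || $second > 59) {
        return null;
    }
    // gmmktime() treats its arguments as UTC, whatever the default time zone is.
    $timestamp = gmmktime($hour, $minute, $second, $month, $day, $year);
    return $timestamp === false ? null : $timestamp;
}

var_dump(mediaWikiToUnix('20230120171222')); // int(1674234742)
var_dump(mediaWikiToUnix('20230230000000')); // NULL
```

Whether this is faster than strtotime or createFromFormat would need to be measured; it is shown for its explicit validation, not as a performance claim.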

Michael M.
  • 1
    Thanks so much! I have now done metric testing against your solution and strtotime. This showed that strtotime was consistently faster in fact. Test 1: strtotime took 0.01284s for 10,000 conversions. DateTime::createFromFormat took 0.02822s for 10,000 conversions. Test 2: strtotime took 0.064132s for 10,000 conversions. DateTime::createFromFormat took 0.113503s for 10,000 conversions. Test 3: strtotime took 0.014117s for 10,000 conversions. DateTime::createFromFormat took 0.051584s for 10,000 conversions. I tried swapping the order of tests and results were consistent. – azoundria Feb 03 '23 at 19:55
1

To interpret the string "20230120171222" as UTC, either the time zone must be included in the string passed to strtotime, or the default time zone must be set to UTC.

$dateStr = "20230120171222"; 
$timestamp = strtotime($dateStr.' UTC');
var_dump($timestamp); //int(1674234742)

See this example for comparison.
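The second option, switching the default time zone, could be wrapped so that the previous setting is restored afterwards. This is a minimal sketch; the helper name strtotimeUtc is hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: run strtotime() with UTC as the default time zone,
 * then restore whatever default was active before the call.
 *
 * @return int|false Unix timestamp, or false if strtotime() cannot parse the input.
 */
function strtotimeUtc(string $value)
{
    $previous = date_default_timezone_get();
    date_default_timezone_set('UTC');
    try {
        return strtotime($value);
    } finally {
        // Runs even if strtotime() throws, so the caller's setting survives.
        date_default_timezone_set($previous);
    }
}

date_default_timezone_set('America/Vancouver');
var_dump(strtotimeUtc('20230120171222')); // int(1674234742)
echo date_default_timezone_get(), "\n";   // America/Vancouver
```

Appending ' UTC' to the string, as above, avoids touching global state at all, which is one reason to prefer it.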

jspit