I was fearing this step for a couple of months now, but it is inevitable that I make sense of the time-mess we're currently in. I know that there's a lot of information available, but I still have problems applying the concept of local time to my problem:
Basically, we have a Postgres Database where we store different Social Media Posts (Foursquare, Flickr etc.) for analysis. The API's usually return Unix timestamps, which are always UTC as far as I know.
The problem is: When we analyze this data, we don't care about UTC, we need to know the local time of the user when he/she posted on Social media. Assume, for example, someone took a picture of the Opera House in Sydney at 4PM local Sydney time (GMT+10). How do we store this in Postgres - timestamp with timezone
or timestamp without timezone
? Is the local time of the user available at all - or do we have to take the Geocoordinates into account (if available) to calculate local time of the user?
Summarized:
- we don't care about the time the analyst/investigator is in when viewing the data in Postgres
- we don't care about the offset to UTC/GMT, it would be nice to have it (to know the general location), but it is not really important
- we only care about the subjective, personal time the user perceived when posting/taking pictures
What would be the best approach to store metadata given these circumstances?
[Edit] I made a query to see where users mention "o'clock" in their posts, to compare this to the post_publish_date (timestamp without timezone), as returned from the API). Surprisingly, What I see suggests that all these timestamps are local time, not UTC:
"post_publish_date","post_body"
"2016-12-06 07:27:07","[...] at 8 o'clock a.m. [...]"
"2018-02-22 05:21:53","[...] main 6 o'clock road to [...]"
"2018-01-27 06:13:04","[...] get up early otherwise you miss [...] 6 o'clock [...]"
"2018-02-09 16:21:37","It's Friday [...] its gotta be 5 o'clock [...]"
"2018-02-02 15:44:21","It's Friday, [...] it's always 5 o'clock [...]"
"2015-11-21 02:37:53","[...] until 4 o'clock in the morning. [...]"
"2017-09-15 07:51:53","[...] 9 o'clock at night[...]"
"2017-12-18 19:52:40","[...]Date: ♨18.12.2017 [...] 20o'clock [...] Location: New York[...]"
This is a great post that explains the differences, but it is said that a Unix timestamp can be in any timezone - UTC or local time and one never knows unless the timezone is given. Now the question is: can I store posts with timestamps without timezone in the same column (formatted as timestamp with timezone
) as those posts that have a timezone specified?
Here is another post that described how to retrieve ocal time for tweets from the user's respective coordinates when tweeting. Therefore, the time that at leats Twitter returns is UTC time, not local time.