
I am trying to compute time intervals per day from a list of Unix timestamps in Python. I have searched for similar questions on Stack Overflow but mostly found examples of computing deltas or SQL solutions.

I have a list of the sort:

timestamps = [1176239419.0, 1176334733.0, 1176445137.0, 1177619954.0, 1177620812.0, 1177621082.0, 1177838576.0, 1178349385.0, 1178401697.0, 1178437886.0, 1178926650.0, 1178982127.0, 1179130340.0, 1179263733.0, 1179264930.0, 1179574273.0, 1179671730.0, 1180549056.0, 1180763342.0, 1181386289.0, 1181990860.0, 1182979573.0, 1183326862.0]

I can easily turn this list of timestamps into datetime objects using:

[dt.datetime.fromtimestamp(int(i)) for i in timestamps]

From there I could probably write fairly lengthy code that keeps the first day/month and checks whether the next item in the list falls on the same day/month. If it does, I look at the times, take the first and last from that day, and store the interval plus the day/month in a dictionary.

As I am fairly new to Python, I was wondering what the best way to do this is in this language without excessive use of if/else statements.

Thank you in advance

Georgi Nikolov

3 Answers


You can use collections.defaultdict. It is very handy when you are building a collection without knowing its size or members in advance.

import datetime as dt
from collections import defaultdict

# Initialize the defaultdict with list as the default factory.
# Accessing a key that doesn't exist creates that entry with the default value for the type;
# here, a missing key gets an empty list.
intervalsByDate = defaultdict(list)

for t in timestamps:
    # d is the local datetime for this timestamp (dt stays the module alias)
    d = dt.datetime.fromtimestamp(t)
    myDateKey = (d.day, d.month, d.year)
    # If the key doesn't exist, a new empty list is added
    intervalsByDate[myDateKey].append(t)

From this, intervalsByDate is a dict that maps each calendar date to the list of timestamps that fall on it. For each date you can sort the timestamps and take the difference between the last and the first. Iterating over a defaultdict works exactly as for a dict (it is a subclass of dict).

output = {}
# stamps is the list of timestamps collected for one date
for date, stamps in intervalsByDate.items():
    sortedStamps = sorted(stamps)
    output[date] = sortedStamps[-1] - sortedStamps[0]

Now output maps each date to its interval in seconds (Unix timestamps count seconds, not milliseconds). Do with it as you will!
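Since the goal stated in the comments is a [first, last] pair per day rather than a single difference, the two loops above can be folded into one helper. What follows is a minimal sketch, not part of the original answer: the name `first_last_per_day` is made up for illustration, and it groups by UTC date (an assumption) so the result does not depend on the machine's time zone.

```python
import datetime as dt
from collections import defaultdict


def first_last_per_day(timestamps):
    """Map each UTC calendar date to the (first, last) timestamp seen on it."""
    by_date = defaultdict(list)
    for t in timestamps:
        # Assumption: group by UTC date; drop tz= to group by local time instead.
        day = dt.datetime.fromtimestamp(t, tz=dt.timezone.utc).date()
        by_date[day].append(t)
    # min/max avoid sorting each day's list and also work on unsorted input.
    return {day: (min(ts), max(ts)) for day, ts in by_date.items()}


sample = [1177619954.0, 1177620812.0, 1177621082.0, 1177838576.0]
spans = first_last_per_day(sample)
```

Using `datetime.date` objects as keys, instead of `(day, month, year)` tuples, also means the keys sort chronologically.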


EDIT

Is it normal that the keys are not ordered? I have keys from different months mixed together.

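Yes, because (hash)maps and dicts are essentially unordered. (Since Python 3.7 a dict preserves insertion order, but that is still not the same as being sorted by date.)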

How would I be able to, for example, select the first 24 days from a month and then the last week from a month so I could compare the intervals?

If I were very adamant about my answer, I'd maybe look at this, which is an ordered default dict. However, you could change the data type of output to something other than a dict to fit your needs, for example a list ordered by date.
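One way to act on that suggestion is to turn the dict into a list of (date, interval) pairs sorted by date and then filter on the day of the month. This is only a sketch: it assumes the keys are `datetime.date` objects rather than `(day, month, year)` tuples (which would sort by day first), `split_month` is a hypothetical helper name, and the sample dict is made up for illustration.

```python
import calendar
import datetime as dt


def split_month(output, year, month, first_days=24, last_days=7):
    """Return (first_part, last_part) lists of (date, interval) for one month."""
    # date objects compare chronologically, so sorted() yields date order.
    entries = sorted((d, v) for d, v in output.items()
                     if d.year == year and d.month == month)
    days_in_month = calendar.monthrange(year, month)[1]
    first_part = [(d, v) for d, v in entries if d.day <= first_days]
    last_part = [(d, v) for d, v in entries if d.day > days_in_month - last_days]
    return first_part, last_part


# Illustrative intervals in seconds, keyed by date in arbitrary order.
output = {
    dt.date(2007, 5, 30): 0.0,
    dt.date(2007, 5, 5): 52312.0,
    dt.date(2007, 4, 26): 1128.0,
    dt.date(2007, 5, 15): 1197.0,
}
first_part, last_part = split_month(output, 2007, 5)
```

Note that for months shorter than 31 days the default "first 24 days" and "last 7 days" windows overlap, so the boundaries may need adjusting.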

Tejas Pendse
  • Pretty neat answer, thank you for the fast response. Though I have a couple of remarks: 1) Is it normal that the keys are not ordered? I have keys from different months mixed together. 2) How would I be able to, for example, select the first 24 days from a month and then the last week from a month so I could compare the intervals? – Georgi Nikolov Feb 19 '15 at 13:18

Just subtract the 2 dates from each other. This will result in a timedelta instance. See datetime.timedelta: https://docs.python.org/2/library/datetime.html#timedelta-objects

from datetime import datetime
delta = datetime.today() - datetime(year=2015, month=1, day=1)
# Actual printed values depend on when you execute this :-)
print(delta.days, delta.seconds, delta.microseconds)  # prints 49 50817 381000
print(delta.total_seconds())  # prints 4284417.381 which is 49*24*3600 + 50817 + 381000/1000000

Combine this with list slicing and zip to get your solution. An example solution would be:

timestamps = [1176239419.0, 1176334733.0, 1176445137.0, 1177619954.0, 1177620812.0, 1177621082.0, 1177838576.0, 1178349385.0, 1178401697.0, 1178437886.0, 1178926650.0, 1178982127.0, 1179130340.0, 1179263733.0, 1179264930.0, 1179574273.0, 1179671730.0, 1180549056.0, 1180763342.0, 1181386289.0, 1181990860.0, 1182979573.0, 1183326862.0]
timestamps_as_dates = [datetime.fromtimestamp(int(i)) for i in timestamps]
# Make couples of each timestamp with the next one
# timestamps_as_dates[:-1] -> all your timestamps but the last one
# timestamps_as_dates[1:]  -> all your timestamps but the first one
# zip them together so that first and second are one couple, then second and third, ...
intervals = zip(timestamps_as_dates[:-1],timestamps_as_dates[1:])
interval_timedeltas = [(interval[1]-interval[0]).total_seconds() for interval in intervals]
# result = [95314.0, 110404.0, 1174817.0, 858.0, 270.0, 217494.0, 510809.0, 52312.0, 36189.0, 488764.0, 55477.0, 148213.0, 133393.0, 1197.0, 309343.0, 97457.0, 877326.0, 214286.0, 622947.0, 604571.0, 988713.0, 347289.0]

This also works for adding or subtracting a certain period from a date:

from datetime import datetime, timedelta
tomorrow = datetime.today() + timedelta(days=1)

I don't have an easy solution for adding or subtracting months or years.
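For whole months or years, one standard-library workaround is to do the arithmetic on the year and month numbers and clamp the day to the length of the target month. The sketch below rests on an assumption about what "adding a month" should mean (January 31 plus one month becomes the last day of February); `add_months` is a hypothetical helper, not a function of the `datetime` module.

```python
import calendar
from datetime import datetime


def add_months(d, months):
    """Shift d by a (possibly negative) number of months, clamping the day."""
    # A zero-based month index lets divmod handle the year rollover.
    year, month_index = divmod(d.year * 12 + (d.month - 1) + months, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return d.replace(year=year, month=month, day=min(d.day, last_day))


print(add_months(datetime(2015, 1, 31), 1))    # 2015-02-28 00:00:00
print(add_months(datetime(2015, 1, 31), -2))   # 2014-11-30 00:00:00
print(add_months(datetime(2016, 2, 29), 12))   # 2017-02-28 00:00:00
```

Adding years is then `add_months(d, 12 * years)`.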

HSquirrel
  • Not bad, but the problem is that I don't really need to work with time deltas. To do further research with the times later I need to construct a dictionary of the sort: [ "day/month":[starting time, end time], ....] so that I can compare one day's interval to another and see if there are outliers or drastic changes in intervals between some days. It can probably work with deltas too, but then I would need extra computation to get the exact time from the resulting delta, which seems like a bit of a hassle. Thank you for the suggestion though, I'll definitely experiment further – Georgi Nikolov Feb 19 '15 at 13:35
  • Okay, I think I might have misunderstood your question then. With your comments and after rereading your question: suppose you have 4 timestamps from day 1 and 3 timestamps from day 2, you'd want to calculate all the possible intervals within day 1 (10 intervals total) and all the possible intervals in day 2 (6 intervals in total). Is this correct? – HSquirrel Feb 19 '15 at 13:43
  • Actually it's more like this: if I have 4 timestamps from day 1, I would search for the interval [first timestamp, last timestamp]. The idea is that I can analyse (for example) when a user uses his PC per day, and then I can compare two days together or compare the last week of a month to the rest of the month. To have that I first need to be able to sort through the timestamps and get the worked-in interval per day. I don't need all intervals per day, just one interval per day which shows the first and last hour:minute when the user used the PC (for example) – Georgi Nikolov Feb 19 '15 at 14:02
  • Okay, I get it! Thanks for the clarification (and for taking the time to explain). In that case I totally got it wrong and I fear I don't have anything to add to the answers already given (I'd do it pretty much the same). I'll give it a few hours to give you time to read this and then I'll delete this answer as it doesn't answer the question. – HSquirrel Feb 19 '15 at 14:06

If the list is sorted, as in your case, then you could use itertools.groupby() to group the timestamps into days:

#!/usr/bin/env python
from datetime import date, timedelta
from itertools import groupby

epoch = date(1970, 1, 1)

result = {}
assert timestamps == sorted(timestamps)
# ts // 86400 is the number of whole days since the epoch, so days are split at midnight UTC
for day, group in groupby(timestamps, key=lambda ts: ts // 86400):
    # store the interval + day/month in a dictionary.
    same_day = list(group)
    assert max(same_day) == same_day[-1] and min(same_day) == same_day[0]
    result[epoch + timedelta(day)] = same_day[0], same_day[-1] 
print(result)

Output

{datetime.date(2007, 4, 10): (1176239419.0, 1176239419.0),
 datetime.date(2007, 4, 11): (1176334733.0, 1176334733.0),
 datetime.date(2007, 4, 13): (1176445137.0, 1176445137.0),
 datetime.date(2007, 4, 26): (1177619954.0, 1177621082.0),
 datetime.date(2007, 4, 29): (1177838576.0, 1177838576.0),
 datetime.date(2007, 5, 5): (1178349385.0, 1178401697.0),
 datetime.date(2007, 5, 6): (1178437886.0, 1178437886.0),
 datetime.date(2007, 5, 11): (1178926650.0, 1178926650.0),
 datetime.date(2007, 5, 12): (1178982127.0, 1178982127.0),
 datetime.date(2007, 5, 14): (1179130340.0, 1179130340.0),
 datetime.date(2007, 5, 15): (1179263733.0, 1179264930.0),
 datetime.date(2007, 5, 19): (1179574273.0, 1179574273.0),
 datetime.date(2007, 5, 20): (1179671730.0, 1179671730.0),
 datetime.date(2007, 5, 30): (1180549056.0, 1180549056.0),
 datetime.date(2007, 6, 2): (1180763342.0, 1180763342.0),
 datetime.date(2007, 6, 9): (1181386289.0, 1181386289.0),
 datetime.date(2007, 6, 16): (1181990860.0, 1181990860.0),
 datetime.date(2007, 6, 27): (1182979573.0, 1182979573.0),
 datetime.date(2007, 7, 1): (1183326862.0, 1183326862.0)}

If there is only one timestamp in a day, then it appears as both the start and the end of that day's interval.
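If the goal is to compare the time of day of the first and last event, as discussed in the comments, the same grouping can store clock times instead of raw timestamps. This is a minimal sketch, assuming UTC is the time zone of interest and that the input is already sorted; `daily_windows` is an illustrative name.

```python
from datetime import datetime, timezone
from itertools import groupby


def daily_windows(timestamps):
    """Map each UTC date to the (first, last) time of day as 'HH:MM' strings."""
    moments = [datetime.fromtimestamp(ts, tz=timezone.utc) for ts in timestamps]
    windows = {}
    # groupby only merges adjacent items, so the input must already be sorted.
    for day, group in groupby(moments, key=lambda m: m.date()):
        same_day = list(group)
        windows[day] = (same_day[0].strftime("%H:%M"),
                        same_day[-1].strftime("%H:%M"))
    return windows


windows = daily_windows([1177619954.0, 1177620812.0, 1177621082.0])
```

Zero-padded 'HH:MM' strings compare in chronological order, so start or end times from different days can be compared directly.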

How would you afterwards test whether the last (for example) 5 entries in the result have a larger interval than the previous 14?

entries = sorted(result.items())
intervals = [(end - start) for _, (start, end) in entries]
print(max(intervals[-5:]) > max(intervals[-5-14:-5]))
# -> False
jfs
  • Nice example. Though I prefer to have an interval [first timestamp, last timestamp] instead of the deltas, that is easy to change by replacing {same_day[-1] - same_day[0]} with a list of the two values. On a different note, how would you afterwards test whether the last (for example) 5 entries in the result have a larger interval than the previous 14? – Georgi Nikolov Feb 19 '15 at 14:07
  • What do you want to get if there is only one timestamp in that day: `same_day[0]` is `same_day[-1]`? What is "entry"? `max(entries[-5:], key=get_interval)` > `max(entries[-5-14:-5], key=get_interval)`. – jfs Feb 19 '15 at 14:16
  • Also I just noticed that the data inside the result dictionary is not sorted per date, I have data from the 6th month in between data from the 4th month. It's quite strange as I have the same problem with @mogambo answer :X – Georgi Nikolov Feb 19 '15 at 14:17
  • I haven't really thought about how to handle days with only one timestamp, good point. I will probably just make a tuple of two identical values or add one second to the timestamp and use [timestamp, timestamp + 1sec]. By "entry" I mean one day in the dictionary, so here I would be comparing the last 5 days to the other 14 days to see if the interval is bigger. I am going to try your proposed max(...) solution. Thanks for all the help :D – Georgi Nikolov Feb 19 '15 at 14:22
  • @GeorgiNikolov: dictionaries are unordered in Python (the output `dict` appears to be sorted because `print` is `pprint.pprint` in my REPL). You could use `collections.OrderedDict` or sort the keys explicitly when it is necessary. – jfs Feb 19 '15 at 14:23
  • @Sebastian: yeah I actually figured that out :D. I just use the sorted() function for now. If it doesn't work well I will switch to OrderedDict. Thanks for all the help, mate – Georgi Nikolov Feb 19 '15 at 14:30
  • @GeorgiNikolov: I've implemented `max(entries[-5:], key=get_interval) > max(entries[-5-14:-5], key=get_interval)` (note: the pseudo-code in the comment is incorrect because it compares two entries at the end). The code in the answer fixes it. If you set `result = collections.OrderedDict()` instead of `result = {}` then you don't need to call `sorted()`. – jfs Feb 19 '15 at 14:43
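
The `OrderedDict` variant mentioned in the last comment could look roughly like this. It is a sketch under the same assumptions as the answer (sorted input, days split at midnight UTC) and uses a shortened timestamp list for illustration. Because `groupby` walks the timestamps in order, days are inserted chronologically and no `sorted()` call is needed afterwards.

```python
from collections import OrderedDict
from datetime import date, timedelta
from itertools import groupby

# Shortened, already sorted sample taken from the question's list.
timestamps = [1178349385.0, 1178401697.0, 1178437886.0, 1179263733.0, 1179264930.0]
epoch = date(1970, 1, 1)

result = OrderedDict()
for day, group in groupby(timestamps, key=lambda ts: ts // 86400):
    same_day = list(group)
    # Keys are inserted in chronological order, which OrderedDict remembers.
    result[epoch + timedelta(days=day)] = (same_day[0], same_day[-1])

# Per-day interval lengths in seconds, already in date order.
intervals = [end - start for start, end in result.values()]
```

On Python 3.7 and later a plain `dict` keeps insertion order as well, so `OrderedDict` mainly matters for older interpreters.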