
I need to make a histogram of events over a period of time. My dataset gives me the time of each event in a format like 2013-09-03 17:34:04. How do I convert this into something I can plot in a histogram in Python? I know how to do it the other way around with the datetime and time modules.

By the way, my dataset contains more than 1,500,000 data points, so please suggest only solutions that can be automated with loops or something like that ;)

Lauge

2 Answers


Use time.strptime() to convert the local time string to a time.struct_time, then time.mktime() to convert that time.struct_time to the number of seconds since 1970-01-01 00:00:00 UTC.

#! /usr/bin/env python

import time

def timestr_to_secs(timestr):
    fmt = '%Y-%m-%d %H:%M:%S'
    time_struct = time.strptime(timestr, fmt)
    secs = time.mktime(time_struct)
    return int(secs)

timestrs = [
    '2013-09-03 17:34:04',
    '2013-09-03 17:34:05',
    '2013-09-03 17:35:04',
    '1970-01-01 00:00:00'
]

for ts in timestrs:
    print(ts, timestr_to_secs(ts))

I'm in timezone +10, and the output the above code gives me is:

2013-09-03 17:34:04 1378193644
2013-09-03 17:34:05 1378193645
2013-09-03 17:35:04 1378193704
1970-01-01 00:00:00 -36000

Of course, for histogram-making purposes you may wish to subtract a convenient base time from these numbers.
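For instance, a minimal sketch of that idea (assuming matplotlib is available and `timestrs` holds the full dataset; the bin count is arbitrary):

import matplotlib.pyplot as plt

# Convert every time string to seconds, then shift so the earliest
# event sits at zero.
secs = [timestr_to_secs(ts) for ts in timestrs]
base = min(secs)
offsets = [s - base for s in secs]

# 50 bins is arbitrary; pick a bin width that suits the data.
plt.hist(offsets, bins=50)
plt.xlabel('seconds since first event')
plt.ylabel('number of events')
plt.show()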


Here's a better version, inspired by a comment by J. F. Sebastian.

#! /usr/bin/env python

import time
import calendar

def timestr_to_secs(timestr):
    fmt = '%Y-%m-%d %H:%M:%S'
    time_struct = time.strptime(timestr, fmt)
    secs = calendar.timegm(time_struct)
    return secs

timestrs = [
    '2013-09-03 17:34:04',
    '2013-09-03 17:34:05',
    '2013-09-03 17:35:04',
    '1970-01-01 00:00:00'
]

for ts in timestrs:
    print(ts, timestr_to_secs(ts))

Output:

2013-09-03 17:34:04 1378229644
2013-09-03 17:34:05 1378229645
2013-09-03 17:35:04 1378229704
1970-01-01 00:00:00 0

Whenever I think about the problems that can arise from using localtime() I'm reminded of this classic example that happened to a friend of mine many years ago.

A programmer who was a regular contributor to the FidoNet C_ECHO had written process control code for a brewery. Unfortunately, his code used localtime() instead of gmtime(), which had unintended consequences when the brewery computer automatically adjusted its clock at the end of daylight saving. On that morning, localtime 2:00 AM happened twice. So his program repeated the process that it had already performed the first time 2:00 AM rolled around, which was to initiate the filling of a rather large vat with beer ingredients. As you can imagine, the brewery floor was a mess. :)

PM 2Ring
  • `mktime()` may fail for ambiguous times (during end-of-DST transitions) and for past dates if the UTC offset for the local timezone was different at the time and a historical timezone database is not used (Windows). – jfs Oct 13 '14 at 00:54
  • Very good point, @J.F.Sebastian. When I wrote the above code I couldn't find an inverse to `gmtime()` in the time module docs, but on closer reading I've just discovered that there's one stashed away in the calendar module, of all places. – PM 2Ring Oct 13 '14 at 04:36
  • It is not what I meant. `timegm()` expects UTC time. The input time might be in the local timezone. – jfs Oct 13 '14 at 04:53
  • @J.F.Sebastian : Sure. I'm assuming that the time strings in Lauge's dataset have a constant timezone offset, preferably UTC times. True, my conversion routine _could_ incorporate a timezone offset and a DST flag, but I figured there was no point, since the dataset doesn't supply that information, and Lauge doesn't care about the exact epoch of the values returned by timestr_to_secs; they just want scalars that can be used to make a histogram. But hopefully those time strings are UTC times, since using local time for this kind of thing leads to the kinds of problems we're now discussing. – PM 2Ring Oct 13 '14 at 05:15
  • @J.F.Sebastian : But if you have any concrete suggestions (apart from what you've put in your pandas-based answer), I'm more than happy to hear them. – PM 2Ring Oct 13 '14 at 05:29
  • Yes. In an ideal world, the OP would use UTC time. But since our world is not perfect, here's how to [parse an increasing sequence of local times](http://stackoverflow.com/a/26221183/4279). I expect `pandas` does something similar. – jfs Oct 13 '14 at 05:34
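To make the distinction discussed in these comments concrete, here is a minimal sketch; the mktime() result depends on your local timezone and DST rules, so the exact number it prints is not shown:

import time
import calendar

fmt = '%Y-%m-%d %H:%M:%S'
t = time.strptime('2013-09-03 17:34:04', fmt)

# mktime() interprets the struct_time as local time,
# timegm() interprets the very same struct_time as UTC.
print(time.mktime(t))      # depends on the local UTC offset (and DST)
print(calendar.timegm(t))  # 1378229644, regardless of timezone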

To handle time series with millions of points, you could try pandas:

#!/usr/bin/env python
from io import StringIO
import matplotlib.pyplot as plt # $ pip install matplotlib
import pandas as pd # $ pip install pandas

csv_file = StringIO(u"""time,A,B
2013-09-03 17:34:04,1,2
2013-09-03 17:34:05,3,4
2013-09-03 17:34:10,4,5
""")
# parse the 'time' column as datetimes and use it as the DataFrame index
df = pd.read_csv(csv_file, parse_dates=True, index_col='time')
df = df.cumsum()  # running totals of columns A and B over time
df.plot()         # line plot against the datetime index
plt.show()
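If you want the histogram of event counts itself rather than cumulative sums, a rough sketch (assuming a recent pandas where resample(...).size() is available; the 1-minute bin width is arbitrary) could be:

# count how many events fall into each 1-minute bin and draw bars
counts = df.resample('1min').size()
counts.plot(kind='bar')
plt.show()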
jfs