
I use Spyder's profiler to run a Python script that handles 700,000 lines of data, and the time.strptime calls take more than 60 s (the built-in sort takes only 11 s).

How can I improve its efficiency? Is there an efficient module for time manipulation?

The core code snippet is here:

import time

data = []
fr = open('big_data_out.txt')
for line in fr.readlines():
    curLine = line.strip().split(',')
    curLine[2] = time.strptime( curLine[2], '%Y-%m-%d-%H:%M:%S')
    curLine[5] = time.strptime( curLine[5], '%Y-%m-%d-%H:%M:%S')
#    print curLine
    data.append(curLine)

data.sort(key = lambda l:( l[2], l[5], l[7]) )
#print data

result = []
for itm in data:
    if itm[2] >= start_time and itm[5] <= end_time and itm[1] == cameraID1 and itm[4] == cameraID2:
        result.append(itm)
  • Are there many similar times? Or are most of the times unique? – lsowen Apr 09 '15 at 17:01
  • Are you interested in `data`, or just `result`? You might be able to skip some of the calls to `strptime()` if you move the if statements inside your `for line` loop and skip lines that don’t match the camera ID, or where you find an out-of-bounds date in the first data. That’s probably more memory efficient as well. – alexwlchan Apr 09 '15 at 17:13
  • You may also want to look at whether `datetime.datetime.strptime()` is any better. I believe it does something very similar, but it might have a performance edge. I don’t know. – alexwlchan Apr 09 '15 at 17:15
  • There is no need to use `.readlines()`. You are building a list of 700000 lines for no reason. You should also use `with` to open your files or at least close them. You can also use the csv module which will create the rows for you splitting on `,`. – Padraic Cunningham Apr 09 '15 at 17:49
  • I just checked datetime.strptime(), and performance is basically the same. – user3757614 Apr 09 '15 at 17:56
  • @alexwlchan I just moved the if statement into the first loop, and it did a great job improving performance. Thanks for your ardent advice. – kigawas Apr 10 '15 at 00:20
  • @PadraicCunningham Thanks for your advice. I merged the two loops into one, and the time cost dropped to under 2 s. – kigawas Apr 10 '15 at 00:23
  • If you were able to get it faster, post your improved code as an answer, with some comments about what you changed – it will help other people who come across this question. (Self-answering is totally okay here, and encouraged.) – alexwlchan Apr 10 '15 at 05:54
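The comments above can be combined into one sketch: filter on the cheap string fields first, so `strptime` runs only on rows that can match, and let the `csv` module do the splitting. The filter values (`cameraID1`, `start_time`, etc.) and the sample file contents are hypothetical, since the question does not show how they are set:

```python
import csv
import time

FMT = '%Y-%m-%d-%H:%M:%S'

# Hypothetical sample file standing in for big_data_out.txt.
with open('big_data_out.txt', 'w') as f:
    f.write('a,cam1,2015-03-01-10:00:00,x,cam2,2015-03-01-10:05:00,y,2\n')
    f.write('b,cam3,2015-03-01-09:00:00,x,cam2,2015-03-01-09:05:00,y,1\n')
    f.write('c,cam1,2015-02-01-08:00:00,x,cam2,2015-02-01-08:05:00,y,3\n')

# Assumed filter values; adjust to match your data.
cameraID1, cameraID2 = 'cam1', 'cam2'
start_time = time.strptime('2015-01-01-00:00:00', FMT)
end_time = time.strptime('2015-12-31-23:59:59', FMT)

result = []
with open('big_data_out.txt') as fr:
    for curLine in csv.reader(fr):
        # Filter on the cheap string fields first, so the expensive
        # strptime calls only run on rows that can actually match.
        if curLine[1] != cameraID1 or curLine[4] != cameraID2:
            continue
        curLine[2] = time.strptime(curLine[2], FMT)
        curLine[5] = time.strptime(curLine[5], FMT)
        if start_time <= curLine[2] and curLine[5] <= end_time:
            result.append(curLine)

# struct_time values compare like tuples, so sorting works as before.
result.sort(key=lambda l: (l[2], l[5], l[7]))
print([row[0] for row in result])
```

With the sample data above, only rows 'a' and 'c' survive the filter, and the sort orders them by start time.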

1 Answer


From the answer given here: A faster strptime?

>>> timeit.timeit("time.strptime(\"2015-02-04 04:05:12\", \"%Y-%m-%d %H:%M:%S\")", setup="import time")
17.206257617290248
>>> timeit.timeit("datetime.datetime(*map(int, \"2015-02-04 04:05:12\".replace(\":\", \"-\").replace(\" \", \"-\").split(\"-\")))", setup="import datetime")
4.687687893159023
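Applied to the question's '%Y-%m-%d-%H:%M:%S' format, where every field is numeric and the separators are already `-` or `:`, the same trick might look like this (a sketch, not the original poster's code; it produces `datetime` objects rather than `struct_time`, but those also compare and sort chronologically):

```python
import datetime

def fast_parse(s):
    # '2015-02-04-04:05:12' -> datetime(2015, 2, 4, 4, 5, 12).
    # Normalizing ':' to '-' and splitting skips strptime's
    # per-call format-string handling entirely.
    return datetime.datetime(*map(int, s.replace(':', '-').split('-')))

print(fast_parse('2015-02-04-04:05:12'))  # 2015-02-04 04:05:12
```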
    Well, I think the key point is that you use `map` with plain integer conversion instead of a format string. The `strptime` function probably spends most of its time handling the format string. – kigawas Apr 10 '15 at 00:25