
I use Spyder's profiler to run a Python script that handles 700,000 lines of data, and the time.strptime calls take more than 60 s (the built-in sort takes only 11 s).

How can I improve its efficiency? Is there an efficient module for time manipulation?

The core code snippet is here:

import time

data = []
fr = open('big_data_out.txt')
for line in fr.readlines():
    curLine = line.strip().split(',')
    curLine[2] = time.strptime( curLine[2], '%Y-%m-%d-%H:%M:%S')
    curLine[5] = time.strptime( curLine[5], '%Y-%m-%d-%H:%M:%S')
#    print curLine
    data.append(curLine)

data.sort(key = lambda l:( l[2], l[5], l[7]) )
#print data

result = []
for itm in data:
    if itm[2] >= start_time and itm[5] <= end_time and itm[1] == cameraID1 and itm[4] == cameraID2:
        result.append(itm)
  • Are there many similar times? Or are most of the times unique? – lsowen Apr 09 '15 at 17:01
  • Are you interested in `data`, or just `result`? You might be able to skip some of the calls to `strptime()` if you move the if statements inside your `for line` loop and skip lines that don’t match the camera ID, or where you find an out-of-bounds date in the first data. That’s probably more memory efficient as well. – alexwlchan Apr 09 '15 at 17:13
  • You may also want to look at whether `datetime.datetime.strptime()` is any better. I believe it does something very similar, but it might have a performance edge. I don’t know. – alexwlchan Apr 09 '15 at 17:15
  • There is no need to use `.readlines()`. You are building a list of 700000 lines for no reason. You should also use `with` to open your files or at least close them. You can also use the csv module which will create the rows for you splitting on `,`. – Padraic Cunningham Apr 09 '15 at 17:49
  • I just checked datetime.strptime(), and performance is basically the same. – user3757614 Apr 09 '15 at 17:56
  • @alexwlchan I just moved the if statement into the first loop, and it did a great job improving performance. Thanks for your ardent advice. – kigawas Apr 10 '15 at 00:20
  • @PadraicCunningham Thanks for your advice. I merged the two loops into one, and the time cost dropped to under 2 s. – kigawas Apr 10 '15 at 00:23
  • If you were able to get it faster, post your improved code as an answer, with some comments about what you changed – it will help other people who come across this question. (Self-answering is totally okay here, and encouraged.) – alexwlchan Apr 10 '15 at 05:54
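The comments above can be combined into one sketch: filter on the cheap string fields first, so `strptime` runs only on rows that can match, and let the `csv` module do the splitting. The filter values (`cameraID1`, `start_time`, etc.) and the sample file contents are hypothetical, since the question does not show how they are set:

```python
import csv
import time

FMT = '%Y-%m-%d-%H:%M:%S'

# Hypothetical sample file standing in for big_data_out.txt.
with open('big_data_out.txt', 'w') as f:
    f.write('a,cam1,2015-03-01-10:00:00,x,cam2,2015-03-01-10:05:00,y,2\n')
    f.write('b,cam3,2015-03-01-09:00:00,x,cam2,2015-03-01-09:05:00,y,1\n')
    f.write('c,cam1,2015-02-01-08:00:00,x,cam2,2015-02-01-08:05:00,y,3\n')

# Assumed filter values; adjust to match your data.
cameraID1, cameraID2 = 'cam1', 'cam2'
start_time = time.strptime('2015-01-01-00:00:00', FMT)
end_time = time.strptime('2015-12-31-23:59:59', FMT)

result = []
with open('big_data_out.txt') as fr:
    for curLine in csv.reader(fr):
        # Filter on the cheap string fields first, so the expensive
        # strptime calls only run on rows that can actually match.
        if curLine[1] != cameraID1 or curLine[4] != cameraID2:
            continue
        curLine[2] = time.strptime(curLine[2], FMT)
        curLine[5] = time.strptime(curLine[5], FMT)
        if start_time <= curLine[2] and curLine[5] <= end_time:
            result.append(curLine)

# struct_time values compare like tuples, so sorting works as before.
result.sort(key=lambda l: (l[2], l[5], l[7]))
print([row[0] for row in result])
```

With the sample data above, only rows 'a' and 'c' survive the filter, and the sort orders them by start time.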

1 Answer


From the answer given here: A faster strptime?

>>> timeit.timeit("time.strptime(\"2015-02-04 04:05:12\", \"%Y-%m-%d %H:%M:%S\")", setup="import time")
17.206257617290248
>>> timeit.timeit("datetime.datetime(*map(int, \"2015-02-04 04:05:12\".replace(\":\", \"-\").replace(\" \", \"-\").split(\"-\")))", setup="import datetime")
4.687687893159023
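Applied to the question's '%Y-%m-%d-%H:%M:%S' format, where every field is numeric and the separators are already `-` or `:`, the same trick might look like this (a sketch, not the original poster's code; it produces `datetime` objects rather than `struct_time`, but those also compare and sort chronologically):

```python
import datetime

def fast_parse(s):
    # '2015-02-04-04:05:12' -> datetime(2015, 2, 4, 4, 5, 12).
    # Normalizing ':' to '-' and splitting skips strptime's
    # per-call format-string handling entirely.
    return datetime.datetime(*map(int, s.replace(':', '-').split('-')))

print(fast_parse('2015-02-04-04:05:12'))  # 2015-02-04 04:05:12
```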
    Well, I think the key point is that you use `map` with plain integer conversion instead of a format string. The `strptime` function probably spends most of its time handling the format string. – kigawas Apr 10 '15 at 00:25