
For my script I need to use Python 2.6 with only the standard library. I am trying to write a script that walks a logs directory and uses a condition to select only the logs with the appropriate timestamp. The timestamp I am using is derived from the filename. I do not want to use the OS timestamps because the files are sometimes copied to a different directory to prevent them from being overwritten, and this changes the file modification time.

A new file gets created every 200MB. The timestamp on the filename is the time the file was created and represents the oldest log entry in the file.
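The question does not show the actual naming convention, so the following is only a minimal sketch, assuming a hypothetical pattern `<prefix>_YYYYmmdd_HHMMSS.log` (for example `server_20180515_051914.log`). It walks the directory with `os.walk`, ignores OS modification times entirely, and returns the filename-derived timestamps in sorted order. The names `LOG_NAME_RE`, `parse_log_time` and `collect_log_times` are illustrative, not from the original.

```python
import datetime
import os
import re

# HYPOTHETICAL naming convention: <prefix>_YYYYmmdd_HHMMSS.log
# Replace this pattern with whatever your real log names look like.
LOG_NAME_RE = re.compile(r'_(\d{8}_\d{6})\.log$')


def parse_log_time(filename):
    """Return the creation time encoded in filename, or None if it doesn't match."""
    match = LOG_NAME_RE.search(filename)
    if match is None:
        return None
    return datetime.datetime.strptime(match.group(1), '%Y%m%d_%H%M%S')


def collect_log_times(root):
    """Walk root (including copied-off subdirectories) and return
    (timestamp, path) pairs sorted by the filename timestamp."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            stamp = parse_log_time(name)
            if stamp is not None:
                found.append((stamp, os.path.join(dirpath, name)))
    found.sort()
    return found
```

Because the sort key is the parsed name rather than `os.path.getmtime`, copying a file into another directory does not change where it lands in the list.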

```python
import datetime

# One event might span multiple log files.
call_start = datetime.datetime(2018, 5, 15, 5, 25, 9)
call_stop = datetime.datetime(2018, 5, 15, 5, 37, 38)

# Timestamp values of file generated from file's naming convention
t1 = datetime.datetime(2018, 5, 15, 4, 48, 16)
t2 = datetime.datetime(2018, 5, 15, 5, 3, 53)
t3 = datetime.datetime(2018, 5, 15, 5, 19, 14)
t4 = datetime.datetime(2018, 5, 15, 5, 35)
t5 = datetime.datetime(2018, 5, 15, 5, 49, 19)

file_times = [t1, t2, t3, t4, t5]

matching_times = []
for ftime in file_times:
    # Logic I can't figure out
    if scratches_head:
        matching_times.append(ftime)

# I would expect the matching_times list to contain t3 and t4
```

Edit

Clarification from the comments:

t3 is a file that was created at 5:19:14am. The call_start is the first entry I would see in the log; it begins at 5:25:09am. Since t4 wasn't created until 5:35:00am, call_start has to be in t3. The call_stop is the last log entry I want to find. It would be in t4 because t5 was created at 5:49:19am.
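The clarified rule ("the file that holds a timestamp is the newest file created at or before it") maps directly onto a binary search over the sorted creation times. This is a sketch using the standard `bisect` module, which is an alternative to the approach in the answer rather than what the answer does; the helper name `files_for_event` is hypothetical.

```python
import bisect
import datetime


def files_for_event(file_times, call_start, call_stop):
    """Return the creation times of the files holding entries from
    call_start through call_stop.

    Assumes a file created at times[i] holds entries up to (but not
    including) times[i + 1], and the newest file is open-ended.
    """
    times = sorted(file_times)
    # Newest file created at or before call_start; if the event predates
    # every file, fall back to the first file.
    first = max(bisect.bisect_right(times, call_start) - 1, 0)
    # Newest file created at or before call_stop (-1 if none, giving []).
    last = bisect.bisect_right(times, call_stop) - 1
    return times[first:last + 1]


call_start = datetime.datetime(2018, 5, 15, 5, 25, 9)
call_stop = datetime.datetime(2018, 5, 15, 5, 37, 38)
t1 = datetime.datetime(2018, 5, 15, 4, 48, 16)
t2 = datetime.datetime(2018, 5, 15, 5, 3, 53)
t3 = datetime.datetime(2018, 5, 15, 5, 19, 14)
t4 = datetime.datetime(2018, 5, 15, 5, 35)
t5 = datetime.datetime(2018, 5, 15, 5, 49, 19)

matching_times = files_for_event([t1, t2, t3, t4, t5], call_start, call_stop)
# matching_times == [t3, t4]
```

Sorting inside the function means the caller does not have to guarantee the order in which the directory walk returned the files.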

Asked by Jeff A; edited by pault.
  • Why would t3 and t4 match? What are you trying to match on? – JHS May 17 '18 at 20:56
  • Sorry, I don't understand the logic. Do you want to find timestamps that happen in between `call_start` and `call_stop`? If so, I don't understand why `t3` here qualifies. See: [How to determine if a timestamp is within a specific range](https://stackoverflow.com/questions/10048249/how-do-i-determine-if-current-time-is-within-a-specified-range-using-pythons-da/10048290) – pault May 17 '18 at 20:56
  • Whatever your rule is, just write the comparison and you're done. For example, `if call_start <= ftime < call_stop:` will tell you whether `ftime` is in between `call_start` and `call_stop`, just like `2 <= n < 5` will tell you whether `n` is between 2 and 5. And if you want a closed range instead of a half-open range, change the `<` to `<=`. And so on. (Of course all reasonable rules I can think of are only going to pass `t4`, but pault already covered that.) – abarnert May 17 '18 at 21:01
  • t3 is a file that was created at 5:19:14am. The call_start is the first entry I would see in the log; it begins at 5:25:09am. Since t4 wasn't created until 5:35:00am, call_start has to be in t3. The call_stop is the last log entry I want to find. It would be in t4 because t5 was created at 5:49:19am. – Jeff A May 17 '18 at 21:02
  • OK, so turn that logic into code: whenever `file_times[i] > call_start` but `file_times[i-1] <= call_start`, you want `file_times[i-1]`? – abarnert May 17 '18 at 21:07
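To see why a plain range check on creation times is not enough, here is a small sketch contrasting it with the pairwise rule from the comments, written as `file_times[i-1] <= call_start < file_times[i]`. The sample values are the ones from the question; the loop is illustrative and only locates the file holding `call_start`.

```python
import datetime

call_start = datetime.datetime(2018, 5, 15, 5, 25, 9)
call_stop = datetime.datetime(2018, 5, 15, 5, 37, 38)
file_times = [
    datetime.datetime(2018, 5, 15, 4, 48, 16),
    datetime.datetime(2018, 5, 15, 5, 3, 53),
    datetime.datetime(2018, 5, 15, 5, 19, 14),
    datetime.datetime(2018, 5, 15, 5, 35),
    datetime.datetime(2018, 5, 15, 5, 49, 19),
]

# Naive half-open check: only keeps files *created* during the call,
# so it misses the file that was already open when the call began.
naive = [t for t in file_times if call_start <= t < call_stop]
# naive == [file_times[3]]  (05:35:00 only)

# Pairwise rule: file i-1 holds call_start when it was created at or
# before call_start and the next file was created after call_start.
start_file = None
for i in range(1, len(file_times)):
    if file_times[i - 1] <= call_start < file_times[i]:
        start_file = file_times[i - 1]
        break
# start_file == file_times[2]  (05:19:14)
```

Note that this loop never selects the newest file, which has no successor; if `call_start` could fall after the last creation time, that case needs a separate check.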

1 Answer


One way would be to `enumerate()` over the items in your list and create ranges from each consecutive pair of times. Then check whether any of these ranges overlaps with `(call_start, call_stop)`. If a range overlaps, append the start of the range to your list. You'd also have to include a special check for the last time in the list, since it has no following time to close its range.

For example:

```python
matching_times = []
for i, ftime in enumerate(file_times):
    if i + 1 >= len(file_times):
        # Last item in list: it has no successor, so add it if it
        # was created before call_stop
        scratches_head = ftime < call_stop
    else:
        # This file covers [ftime, file_times[i+1]); check whether
        # that range overlaps [call_start, call_stop]
        fstart = ftime
        fend = file_times[i + 1]
        scratches_head = (fstart <= call_stop) and (fend >= call_start)

    if scratches_head:
        matching_times.append(ftime)

print([x.strftime("%Y-%m-%d %H:%M:%S") for x in matching_times])
#['2018-05-15 05:19:14', '2018-05-15 05:35:00']
```
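As discussed in the comments on this answer, the consecutive-pair ranges are only meaningful if the times are sorted. A possible variant, sketched here as an assumption rather than the answerer's code, sorts first and pairs each time with its successor via `zip`, using `datetime.datetime.max` as a sentinel end for the newest file so the special case disappears. The function name `overlapping_files` is hypothetical.

```python
import datetime


def overlapping_files(file_times, call_start, call_stop):
    """Return creation times of files whose span overlaps [call_start, call_stop].

    Each file is assumed to cover [its creation time, next file's creation
    time); the newest file is treated as open-ended via datetime.max.
    """
    times = sorted(file_times)  # the pairing below is only valid on sorted input
    ends = times[1:] + [datetime.datetime.max]
    matching = []
    for fstart, fend in zip(times, ends):
        # Same overlap test as the answer's loop for paired files.
        if fstart <= call_stop and fend >= call_start:
            matching.append(fstart)
    return matching
```

One small behavioral difference: with the sentinel, the newest file matches when it was created at or before `call_stop` (`<=`), whereas the answer's special case uses a strict `<`.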
Answered by pault.
  • I tried implementing this. On the example it works, but when I ran it against a big list of files it matched incorrectly. I can also get its matching behavior to change if I do sorted(file_times). – Jeff A May 17 '18 at 23:52
  • @JeffA `file_times` in this example is already sorted; in fact, that is a requirement for this method to work. It assumes that consecutive entries form a range, which only works if the times are sorted. Can you edit your question and add an example where this doesn't work? – pault May 18 '18 at 13:54
  • Everything seems to be working fine. The sort was the key to making it all work. When I first tested sorting I must not have done something correctly, or I misinterpreted the results. Thanks for the help. – Jeff A May 18 '18 at 20:15