Assuming the timestamps are sorted, what about something like this:
    import datetime

    def merged_ts(timestamps):
        merged_strings = []
        counts = []
        for ts in timestamps:
            dt = datetime.datetime.strptime(ts, '%Y-%m-%d %H:%M:%S.%f')
            if not merged_strings:  # first-time switch
                merged_strings.append(ts)
                counts.append(1)
                oldt = dt
                continue
            dif = dt - oldt
            if dif.total_seconds() < 300:  # 5 minutes
                counts[-1] += 1
                continue
            # too far from the bucket's first timestamp: start a new bucket
            merged_strings.append(ts)
            counts.append(1)
            oldt = dt
        return merged_strings, counts
Added: the OP has since specified that the timestamps are not originally sorted (but may be sorted for this purpose), and that if the timestamps were T, T+4 minutes, T+8 minutes, T+12 minutes, etc., they would all have to merge into a single slot (with the appropriate count). Only a small change is needed for this version:
    import datetime

    def merged_ts(timestamps):
        merged_strings = []
        counts = []
        for ts in sorted(timestamps):
            dt = datetime.datetime.strptime(ts, '%Y-%m-%d %H:%M:%S.%f')
            if not merged_strings:  # first-time switch
                merged_strings.append(ts)
                counts.append(1)
                oldt = dt
                continue
            dif = dt - oldt
            oldt = dt  # always compare against the newest timestamp seen
            if dif.total_seconds() < 300:  # 5 minutes
                counts[-1] += 1
            else:
                merged_strings.append(ts)
                counts.append(1)
        return merged_strings, counts
I just added a sorted call, and moved the oldt = dt up to where it happens on every leg of the loop (except on the first-time switch), so that each new incoming ts is checked against the "closest" (newest) datestamp in the current bucket, rather than against the "first" (oldest) one as before. (Purely as a matter of style, I changed the conditional at the end to if/else rather than using a continue there, as the two legs of the conditional are now well-balanced.)
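To make the difference between the two comparison strategies concrete, here is a minimal sketch that only tracks the counts. The helper `bucket_counts`, its `chain` flag and `window` parameter, and the sample timestamps are all hypothetical, introduced just for illustration; they are not part of the functions above.

```python
import datetime

FMT = '%Y-%m-%d %H:%M:%S.%f'

def bucket_counts(timestamps, chain, window=300):
    """Count timestamps per bucket (hypothetical helper for illustration).

    chain=False compares each timestamp with the first one in the bucket;
    chain=True compares it with the newest one in the bucket.
    """
    counts = []
    ref = None  # datetime the next timestamp is compared against
    for ts in sorted(timestamps):
        dt = datetime.datetime.strptime(ts, FMT)
        if ref is not None and (dt - ref).total_seconds() < window:
            counts[-1] += 1
            if chain:
                ref = dt  # slide the reference to the newest timestamp
        else:
            counts.append(1)
            ref = dt  # a new bucket starts here

    return counts

# Made-up sample: T, T+4 minutes, T+8 minutes, T+12 minutes
sample = ['2010-01-01 10:%02d:00.000000' % m for m in (0, 4, 8, 12)]
print(bucket_counts(sample, chain=False))  # [2, 2]
print(bucket_counts(sample, chain=True))   # [4]
```

With the reference pinned to the first timestamp of a bucket, T+8 minutes is already more than five minutes away and opens a second bucket; with the sliding reference, every gap is four minutes, so everything lands in one slot.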
First-time switches are goofy, but removing this one (without repeating the strptime call) takes slightly subtler code, such as:
    import datetime

    def merged_ts(timestamps):
        if not timestamps:
            return [], []
        # pair each string with its parsed datetime, so strptime runs once per item
        it = iter(sorted(
            (ts, datetime.datetime.strptime(ts, '%Y-%m-%d %H:%M:%S.%f'))
            for ts in timestamps))
        first = next(it)
        merged_strings = [first[0]]
        oldt = first[1]
        counts = [1]
        for ts, dt in it:
            dif = dt - oldt
            oldt = dt
            if dif.total_seconds() < 300:  # 5 minutes
                counts[-1] += 1
            else:
                merged_strings.append(ts)
                counts.append(1)
        return merged_strings, counts
The version with the first-time switch seems preferable to me, in this case, purely on stylistic grounds.