
I'm trying to group messages that are sent out into 1-second time intervals. I'm currently calculating time latency with this:

def time_deltas(infile):
    entries = (line.split() for line in open(infile, "r"))
    ts = {}
    for e in entries:
        if " ".join(e[2:5]) == "T out: [O]":
            ts[e[8]] = e[0]
        elif " ".join(e[2:5]) == "T in: [A]":
            in_ts, ref_id = e[0], e[7]
            out_ts = ts.pop(ref_id, None)
            yield (float(out_ts), ref_id[1:-1], (float(in_ts)*1000 - float(out_ts)*1000))

INFILE = 'C:/Users/klee/Documents/test.txt'
import csv 

with open('test.csv', 'w') as f: 
    csv.writer(f).writerows(time_deltas(INFILE))

HOWEVER, I want to calculate the number of "T in: [A]" messages that are sent out per second, and have been trying to work with this to do so:

import datetime
import bisect
import collections

data = [ (datetime.datetime(2010, 2, 26, 12, 8, 17), 5594813),
         (datetime.datetime(2010, 2, 26, 12, 7, 31), 5594810),
         (datetime.datetime(2010, 2, 26, 12, 6, 4),  5594807),
]
interval = datetime.timedelta(seconds=50)
start = datetime.datetime(2010, 2, 26, 12, 6, 4)
grid = [start + n*interval for n in range(10)]
bins = collections.defaultdict(list)
for date, num in data:
    idx = bisect.bisect(grid, date)
    bins[idx].append(num)
for idx, nums in bins.items():
    print('{0} --- {1}'.format(grid[idx], len(nums)))

which can be found here: Python: group results by time intervals

(I realize the units would be off for what I want, but I'm just looking into the general idea...)

I've been mostly unsuccessful thus far and would appreciate any help.

Also, the data appears as:

082438.577652 - T in: [A] accepted. ordID [F25Q6] timestamp [082438.575880] RefNumber [6018786] State [L]
eunhealee

2 Answers


Assuming you want to group your data into 1-second intervals on the second, we can make use of the fact that your data is ordered and that int(out_ts) truncates the timestamp to the second, which we can use as a grouping key.

The simplest way to do the grouping would be to use itertools.groupby:

from itertools import groupby

data = time_deltas(INFILE)
get_key = lambda x: int(x[0])  # function to get group key from data
bins = [(k, list(g)) for k, g in groupby(data, get_key)]

bins will be a list of tuples where the first value in the tuple is the key (an integer, e.g. 82438) and the second value is a list of data entries that were issued on that second (with timestamp = 082438.*).

Example usage:

# print out the number of messages for each second
for sec, data in bins:
    print('{0} --- {1}'.format(sec, len(data)))

# write (sec, msg_per_sec) out to CSV file
import csv
with open("test.csv", "w") as f:
    csv.writer(f).writerows((s, len(d)) for s, d in bins)

# get average message per second
message_counts = [len(d) for s, d in bins]
avg_msg_per_second = float(sum(message_counts)) / len(message_counts)

P.S. In this example, a list was used for bins so that the order of data is maintained. If you need random access to the data, consider using an OrderedDict instead.
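
A minimal sketch of that OrderedDict variant, assuming the same time_deltas generator and get_key function as above (the lookup value is only illustrative):

from collections import OrderedDict
from itertools import groupby

# Same grouping as before, but stored as an ordered mapping: second -> entries.
bins = OrderedDict((k, list(g)) for k, g in groupby(time_deltas(INFILE), get_key))

# Iteration order is preserved, and individual seconds can be looked up directly.
print(len(bins.get(82438, [])))  # number of messages seen during second 082438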


Note that it is relatively straightforward to adapt the solution to group by multiples of seconds. For example, to group messages per minute (60 seconds), change the get_key function to:

get_key = lambda x: int(x[0] / 60)  # truncate timestamp to the minute
Shawn Chin

This is easier if you don't build a grid of time intervals and bisect into it.

Instead, do this. Transform each timestamp into a single interval number.

def map_time_to_interval_number(epoch, times):
    for t in times:
        delta = t - epoch
        delta_t = delta.days*60*60*24 + delta.seconds + delta.microseconds/1000000.0
        interval = int(delta_t / 50)
        yield interval, t

from collections import defaultdict

times = [d for d, num in data]   # just the datetime part of each (datetime, num) pair
counts = defaultdict(int)
epoch = min(times)
for interval, t in map_time_to_interval_number(epoch, times):
    counts[interval] += 1

The interval will be an integer. 0 is the first 50-second interval. 1 is the second 50-second interval. etc.

You can reconstruct the timestamp from the interval number, knowing that each interval is 50 seconds wide and begins at epoch.
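
A rough sketch of that reconstruction, assuming the counts and epoch built above (variable names are only illustrative):

import datetime

# Recover the start time of each 50-second bucket and print its message count.
for interval, count in sorted(counts.items()):
    bucket_start = epoch + datetime.timedelta(seconds=50 * interval)
    print('{0} --- {1}'.format(bucket_start, count))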

S.Lott
  • I'm having difficulty with this: NameError: name 'defaultdict' is not defined. I apologize if I'm just not familiar with this. – eunhealee Jan 11 '12 at 21:39
  • You're having difficulty because Google is broken. Here's the first hit on a Google search for "python defaultdict". http://docs.python.org/library/collections.html It's important that you read and understand this library. – S.Lott Jan 11 '12 at 21:40