How to calculate average of seconds & milliseconds in Python?

Question

My data is in the format HH:MM:SS.milliseconds. How can I calculate the average of such data in Python? I need the average in milliseconds. I looked at several similar posts, but they do not answer my question.

My data =  0:00:00.618000
           0:00:00.593000
           0:00:00.569000
           0:00:00.572000
           0:00:00.636000
           0:00:01
           0:00:01
           0:00:00.546000
           0:00:00.400000

First, your format has microseconds, not milliseconds (even though the last three digits are always 0, so they might as well be millis). – abarnert Jun 01 '18 at 21:32
Anyway: Convert them into `datetime.timedelta` objects, or integer microseconds or milliseconds, or float seconds, or any other type that can do basic arithmetic, and then you can average them the same way you'd average anything else: either with `statistics.mean`, or just `sum(times) / len(times)`. – abarnert Jun 01 '18 at 21:33
What have you tried? Hint: convert everything to a pure millisecond representation. – Ofer Sadan Jun 01 '18 at 21:33
Also, why have you tagged this with both -3.x and -2.7? Do you actually need code that works in both? – abarnert Jun 01 '18 at 21:34
@abarnert: I will try what you have suggested. I am a beginner in Python. No, I don't want code in 3.x, just in 2.7. I mistakenly tagged 3.x. Thank you for your response. – A.S Jun 01 '18 at 21:36
@OferSadan: I tried searching for the conversion. I am a beginner in Python. Thank you for your response. – A.S Jun 01 '18 at 21:37

score 4 · Answer 1 · answered Jun 01 '18 at 21:38

I don't know your data size or whether you require a built-in solution.

However, one simple solution involves using pandas. If you have

mydata = ["0:00:00.618000",
          "0:00:00.593000",
          "0:00:00.569000",
          "0:00:00.572000",
          "0:00:00.636000",
          "0:00:01",
          "0:00:01",
          "0:00:00.546000",
          "0:00:00.400000"]

You can use pd.to_timedelta and mean and just do

import pandas as pd

pd.Series(pd.to_timedelta(mydata)).mean()

This may or may not be overkill, but it's really readable and straightforward.

rafaelc

RafaelC: Thank you. Can we do something similar in list? – A.S Jun 01 '18 at 21:41
If you want to implement it yourself, I'd recommend @abarnert 's answer :) – rafaelc Jun 01 '18 at 21:44
@A.S This code already takes a list, and if you need the result to be a list instead of a `Series`, you can just call `list` on it. – abarnert Jun 01 '18 at 21:45

score 2 · Answer 2 · answered Jun 01 '18 at 21:41

The first step is to parse all those timestamps into something that you can perform arithmetic on. This can be timedelta objects, or integer microseconds (or milliseconds, since your times all have 0 micros), or float seconds, or any other reasonable type.

For example, assuming that input is one big string:

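import datetime
import re

# bigstring holds all the timestamps as one big string
ts = []
for h, m, s, u in re.findall(r'(\d+):(\d+):(\d+)(?:\.(\d+))?', bigstring):
    h, m, s = int(h), int(m), int(s)
    u = int(u) if u else 0  # the fractional part is optional
    ts.append(datetime.timedelta(hours=h, minutes=m, seconds=s, microseconds=u))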

If it's a list of strings, a file object, etc., just iterate over that and call re.search on each item, instead of iterating over re.findall.
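
For a list of strings, that variation might look like the minimal sketch below. The names `PATTERN` and `parse_timestamp` are illustrative and not part of the answer, and the sketch assumes that a fractional part, when present, always has six digits, as in the question's data.

```python
import datetime
import re

# Same pattern as above: hours, minutes, seconds, optional fractional digits.
PATTERN = re.compile(r'(\d+):(\d+):(\d+)(?:\.(\d+))?')

def parse_timestamp(text):
    """Parse one 'H:MM:SS[.ffffff]' string into a timedelta (hypothetical helper)."""
    match = PATTERN.search(text)
    if match is None:
        raise ValueError('no timestamp found in %r' % text)
    h, m, s, u = match.groups()
    # Assumption: the fraction is given in microseconds (six digits).
    micros = int(u) if u else 0
    return datetime.timedelta(hours=int(h), minutes=int(m),
                              seconds=int(s), microseconds=micros)

# Three of the question's values, as a list of strings.
mydata = ["0:00:00.618000", "0:00:01", "0:00:00.400000"]
ts = [parse_timestamp(t) for t in mydata]
```

The resulting `ts` list can then be averaged in the same way as the list built with `re.findall`.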

Then we can average them the same as any other values:

sum(ts, datetime.timedelta()) / len(ts)

Since I used timedelta values, that's what the result will be:

datetime.timedelta(0, 0, 659333)

… or, if you print it:

0:00:00.659333

… or, if you want it as, say, a number of seconds, just call its total_seconds() method:

0.659333
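
Since the question asks for milliseconds, the `total_seconds()` result can be scaled by 1000. The following is a small sketch rather than part of the answer's code; for brevity it builds the `timedelta` list directly from the question's values written as whole milliseconds (`samples_ms` is an illustrative name).

```python
import datetime

# The question's nine samples, written as whole milliseconds for brevity.
samples_ms = [618, 593, 569, 572, 636, 1000, 1000, 546, 400]
ts = [datetime.timedelta(milliseconds=ms) for ms in samples_ms]

# Floor division yields a timedelta on both Python 2.7 and Python 3.
average = sum(ts, datetime.timedelta()) // len(ts)

# total_seconds() returns float seconds; scale it to milliseconds.
average_ms = average.total_seconds() * 1000.0
print(average_ms)
```

For this data the result is roughly 659.333 milliseconds, matching the 0.659333 seconds shown above.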

lakshayg · Answer 3 · 2018-06-01T21:46:02.983

I'm assuming that each of these is a string. You can do the following without any libraries, in both Python 2 and 3:

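def mean(numbers):
    # max(..., 1) avoids dividing by zero for an empty list
    return float(sum(numbers)) / max(len(numbers), 1)

def timestamp_to_millis(timestamp):
    # float() handles both "01" and "00.618000" in the seconds field
    hours, minutes, seconds = map(float, timestamp.split(':'))
    millis = ((hours * 60 + minutes) * 60 + seconds) * 1000
    return millis


my_data = []  # [ timestamps ... ]
# build a list (not a bare map) so that len() also works on Python 3
my_mean = mean([timestamp_to_millis(t) for t in my_data])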

score 1 · Answer 4 · answered Jun 01 '18 at 21:45

It looks like the input consists of strings. Those should be converted to datetime.datetime objects. Use datetime.datetime.strptime to do that.

After that, the average of anything is calculated as sum(values) / len(values), but unfortunately you cannot sum dates. What you can sum is date differences (timedelta objects), so you'll have to make some conversions.

For example:

import datetime

dates = [datetime.datetime(1951, 1, 5),
         datetime.datetime(1951, 1, 7),
         datetime.datetime(1951, 1, 7)]

base_datetime = datetime.datetime.now()  # really, anything

relative_dates = [d-base_datetime for d in dates]

average_relative_datetime = sum(relative_dates, datetime.timedelta()) / len(relative_dates)

result = base_datetime + average_relative_datetime  # datetime.datetime(1951, 1, 6, 8, 0)
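
Applied to timestamps like those in the question, the same idea could be sketched as follows. This is an illustration under stated assumptions, not the answer's own code: `to_timedelta` is a hypothetical helper, the format string is chosen per value because some entries have no fractional part, and `%H` limits the input to less than 24 hours.

```python
import datetime

def to_timedelta(text):
    """Parse 'H:MM:SS' or 'H:MM:SS.ffffff' via strptime (hypothetical helper)."""
    fmt = '%H:%M:%S.%f' if '.' in text else '%H:%M:%S'
    parsed = datetime.datetime.strptime(text, fmt)
    # strptime fills in a default date; subtracting midnight of that same
    # date leaves only the time-of-day part, as a timedelta.
    midnight = parsed.replace(hour=0, minute=0, second=0, microsecond=0)
    return parsed - midnight

# Three of the question's values.
data = ['0:00:00.618000', '0:00:01', '0:00:00.400000']
deltas = [to_timedelta(t) for t in data]

# Floor division yields a timedelta on both Python 2.7 and Python 3.
average = sum(deltas, datetime.timedelta()) // len(deltas)
```

Here midnight plays the role of `base_datetime`, so the averaged difference is already the average duration and nothing needs to be added back.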

score 1 · Answer 5 · answered Jun 01 '18 at 21:57

Here's one approach using datetime.timedelta. The tricky part is converting the strings into timedelta objects. Sequence unpacking makes this easier and more efficient to implement.

from datetime import timedelta

data = ['0:00:00.618000', '0:00:00.593000', '0:00:00.569000',
        '0:00:00.572000', '0:00:00.636000', '0:00:01',
        '0:00:01', '0:00:00.546000', '0:00:00.400000']

def converter(x):
    if '.' not in x:
        x += '.000000'
    hrs, mins, secs, millis = map(int, x[:-3].replace('.', ':').split(':'))
    return timedelta(hours=hrs, minutes=mins, seconds=secs, milliseconds=millis)

res = sum(map(converter, data), timedelta(0)) / len(data)

print(res)

0:00:00.659333

Note that sum only works with timedelta objects when timedelta(0) is passed as its start value. This trick is courtesy of @JochenRitzel.
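
The reason is that `sum` starts from the integer 0 by default, and adding an `int` to a `timedelta` is not defined. A short sketch of the difference, using two illustrative values:

```python
from datetime import timedelta

deltas = [timedelta(milliseconds=618), timedelta(seconds=1)]

try:
    sum(deltas)                    # implicit start value is the int 0
except TypeError as exc:
    error = exc                    # 0 + timedelta is not supported
else:
    error = None

# Starting from a zero timedelta keeps every addition timedelta + timedelta.
total = sum(deltas, timedelta(0))
```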
