-2

My data is in the format HH:MM:SS.milliseconds. How can I calculate the average of such data in Python? I need the average in milliseconds. I looked at several other similar posts, but they do not answer my question.

My data =  0:00:00.618000
           0:00:00.593000
           0:00:00.569000
           0:00:00.572000
           0:00:00.636000
           0:00:01
           0:00:01
           0:00:00.546000
           0:00:00.400000
jpp
A.S
  • 2
    First, your format has microseconds, not milliseconds (even though the last three digits are always 0, so they might as well be millis). – abarnert Jun 01 '18 at 21:32
  • 1
    Anyway: Convert them into `datetime.timedelta` objects, or integer microseconds or milliseconds, or float seconds, or any other type that can do basic arithmetic, and then you can average them the same way you'd average anything else: either with `statistics.mean`, or just `sum(times) / len(times)`. – abarnert Jun 01 '18 at 21:33
  • 2
    What have you tried? Hint: convert everything to pure millisecond representation – Ofer Sadan Jun 01 '18 at 21:33
  • 1
    Also, why have you tagged this with both -3.x and -2.7? Do you actually need code that works in both? – abarnert Jun 01 '18 at 21:34
  • @abarnert: I will try what you have suggested. I am a beginner in Python. No, I don't want code for 3.x, just for 2.7. I mistakenly tagged 3.x. Thank you for your response. – A.S Jun 01 '18 at 21:36
  • @OferSadan: I tried searching for conversion; I am a beginner in Python. Thank you for your response. – A.S Jun 01 '18 at 21:37

5 Answers

4

I don't know your data size or whether you require a built-in solution.

However, one simple solution involves using pandas. If you have

mydata = ["0:00:00.618000",
          "0:00:00.593000",
          "0:00:00.569000",
          "0:00:00.572000",
          "0:00:00.636000",
          "0:00:01",
          "0:00:01",
          "0:00:00.546000",
          "0:00:00.400000"]

You can use pd.to_timedelta and mean (with pandas imported as pd) and just do

pd.Series(pd.to_timedelta(mydata)).mean()

This might or might not be overkill, but it's really readable and straightforward.

rafaelc
2

The first step is to parse all those timestamps into something that you can perform arithmetic on. This can be timedelta objects, or integer microseconds (or milliseconds, since your times all have 0 micros), or float seconds, or any other reasonable type.

For example, assuming that input is one big string:

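import datetime
import re

ts = []
for h, m, s, u in re.findall(r'(\d+):(\d+):(\d+)(?:\.(\d+))?', bigstring):
    h, m, s = int(h), int(m), int(s)
    # Pad the fraction to six digits so it is always read as microseconds
    u = int(u.ljust(6, '0')[:6]) if u else 0
    ts.append(datetime.timedelta(hours=h, minutes=m, seconds=s, microseconds=u))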

If it's a list of strings, or a file object, etc. just change it to iterate that and do re.search on each one, instead of iterating re.findall.
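For a list of strings, a per-item version might look like the sketch below. The helper name `parse_duration` and the sample list `lines` are assumptions made for illustration, not part of the original code. The fraction is padded to six digits so that a shorter value such as `.5` is read as 500000 microseconds.

```python
import datetime
import re

# Hypothetical helper: parse "H:MM:SS" or "H:MM:SS.ffffff" into a timedelta.
PATTERN = re.compile(r'(\d+):(\d+):(\d+)(?:\.(\d+))?')

def parse_duration(text):
    match = PATTERN.search(text)
    if match is None:
        raise ValueError('not a duration: %r' % text)
    h, m, s, frac = match.groups()
    # Pad or truncate the fraction to exactly six digits (microseconds).
    micros = int(frac.ljust(6, '0')[:6]) if frac else 0
    return datetime.timedelta(hours=int(h), minutes=int(m),
                              seconds=int(s), microseconds=micros)

lines = ['0:00:00.618000', '0:00:01']  # sample values taken from the question
ts = [parse_duration(line) for line in lines]
```

The resulting `ts` list can then be averaged exactly like the one built with re.findall.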

Then we can average them the same as any other values:

sum(ts, datetime.timedelta()) / len(ts)

Since I used timedelta values, that's what the result will be:

datetime.timedelta(0, 0, 659333)

… or, if you print it:

0:00:00.659333

… or, if you want it as, say, a number of seconds, just call its total_seconds() method:

0.659333
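
Since the question asks for milliseconds, the same value can be scaled by 1000. This is a minimal sketch; the names `avg` and `avg_ms` are assumptions, and the average is written out as a literal so that the snippet runs on its own.

```python
import datetime

# The average from the question's data, written out as a literal.
avg = datetime.timedelta(0, 0, 659333)

# total_seconds() returns a float number of seconds; scale it to milliseconds.
avg_ms = avg.total_seconds() * 1000
print(round(avg_ms, 3))
```

total_seconds() exists in Python 2.7 as well as 3.x, so this should work for either tag.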
abarnert
1

I'm assuming that each of these is a string. You can do the following using no libraries, in both Python 2 and 3:

def mean(numbers):
    # max(..., 1) avoids division by zero for an empty list
    return float(sum(numbers)) / max(len(numbers), 1)

def timestamp_to_millis(timestamp):
    # float() handles both "01" and "00.618000"
    hours, minutes, seconds = map(float, timestamp.split(':'))
    return ((hours * 60 + minutes) * 60 + seconds) * 1000


my_data = []  # [ timestamps ... ]
# Build a list, because len() does not work on the map object Python 3 returns
my_mean = mean([timestamp_to_millis(t) for t in my_data])
lakshayg
1

It looks like the input consists of strings. Those should be converted to datetime.datetime objects. Use datetime.datetime.strptime to do that.

After that, the average of anything is calculated as sum(values) / len(values), but unfortunately you cannot sum dates. What you can do is sum date differences, so you'll have to make some conversions.

For example:

import datetime

dates = [datetime.datetime(1951, 1, 5),
         datetime.datetime(1951, 1, 7),
         datetime.datetime(1951, 1, 7)]

base_datetime = datetime.datetime.now()  # really, anything

relative_dates = [d-base_datetime for d in dates]

average_relative_datetime = sum(relative_dates, datetime.timedelta()) / len(relative_dates)

result = base_datetime + average_relative_datetime  # datetime.datetime(1951, 1, 6, 8, 0)
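
Applied to strings like those in the question, the strptime step might look like the following sketch. The helper name `to_timedelta` and the two-item `values` list are assumptions for illustration. Two formats are tried because some values have no fractional part, and the default 1900-01-01 date that strptime fills in is subtracted to leave a timedelta.

```python
import datetime

def to_timedelta(text):
    # Values such as "0:00:01" have no fractional part, so try both formats.
    for fmt in ('%H:%M:%S.%f', '%H:%M:%S'):
        try:
            parsed = datetime.datetime.strptime(text, fmt)
        except ValueError:
            continue
        # strptime fills in 1900-01-01 as the date; subtract midnight of that day.
        return parsed - datetime.datetime(1900, 1, 1)
    raise ValueError('unrecognised time: %r' % text)

values = ['0:00:00.618000', '0:00:01']  # sample values from the question
deltas = [to_timedelta(v) for v in values]
average = sum(deltas, datetime.timedelta()) / len(deltas)
```

Note that %H only accepts hours from 0 to 23, so this approach would not fit durations of a day or longer.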
zvone
1

Here's one approach using datetime.timedelta. The tricky part is converting strings into timedelta objects. Sequence unpacking makes this easy and efficient to implement.

from datetime import timedelta

data = ['0:00:00.618000', '0:00:00.593000', '0:00:00.569000',
        '0:00:00.572000', '0:00:00.636000', '0:00:01',
        '0:00:01', '0:00:00.546000', '0:00:00.400000']

def converter(x):
    if '.' not in x:
        x += '.000000'
    hrs, mins, secs, millis = map(int, x[:-3].replace('.', ':').split(':'))
    return timedelta(hours=hrs, minutes=mins, seconds=secs, milliseconds=millis)

res = sum(map(converter, data), timedelta(0)) / len(data)

print(res)

0:00:00.659333

Note that sum only works with timedelta objects when given the added timedelta(0) argument, because its default start value is the integer 0, which cannot be added to a timedelta. This trick is courtesy of @JochenRitzel.
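
A small sketch of the start-value behaviour, using two made-up durations (the names `deltas`, `failed`, and `total` are assumptions for illustration):

```python
from datetime import timedelta

deltas = [timedelta(seconds=1), timedelta(milliseconds=500)]

try:
    sum(deltas)  # the implicit start value is the integer 0
    failed = False
except TypeError:
    failed = True  # int + timedelta is not defined

# Passing a timedelta as the start value keeps every addition timedelta + timedelta.
total = sum(deltas, timedelta(0))
print(total)
```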

jpp