I am trying to convert timing data (lap times and sector times in motor racing), given in the format `ss.000` or `m:ss.000` (see below), to seconds or datetime so that I can analyse it.

    df = pd.DataFrame([['48.004', '1:13.564'], ['38.965', '58.223'], ['45.630', '1:10.084']], columns=['S1', 'S2'])

What I would like to get is:

       S1      S2
0  48.004  73.564
1  38.965  58.223
2  45.630  70.084

The only way I was able to convert it is to split the data first by '.' with str.split and then by ':'. Afterwards, I convert the minutes to seconds, add the seconds, and append the milliseconds.
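For reference, the split-based approach described above might look roughly like this with pandas string methods. This is a reconstruction for illustration, not the asker's actual code; the helper name `split_to_seconds` is hypothetical, and it assumes the values are strings with exactly three digits after the dot:

```python
import pandas as pd


def split_to_seconds(s: pd.Series) -> pd.Series:
    # Step 1: split on '.' into the whole part ('1:13') and the milliseconds ('564')
    whole_ms = s.str.split('.', expand=True)
    whole, ms = whole_ms[0], whole_ms[1]

    # Step 2: split the whole part on ':'; values without ':' have no minutes
    has_min = whole.str.contains(':')
    parts = whole.str.split(':')
    minutes = parts.str[0].where(has_min, '0').astype(int)
    seconds = parts.str[-1].astype(int)

    # Step 3: convert minutes to seconds, add seconds, append milliseconds
    # (assumes exactly three fractional digits, so '564' means 0.564 s)
    return minutes * 60 + seconds + ms.astype(int) / 1000


laps = pd.Series(['48.004', '1:13.564', '58.223'])
print(split_to_seconds(laps))
```

This works, but it needs three separate steps per column, which is the clumsiness the question is about.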

Is there any other more elegant way to convert to seconds?

  • Can you also mention the sample output of the above input you have given... – naveen p Jul 09 '22 at 17:29
  • What does this have to do with Pandas? Is this data stored in a DataFrame or Series? From a glance, it looks like the parsing will be done the same regardless, but to provide a full answer, we might need to `explode` first, or something like that. Please provide a minimal reproducible example. For specifics, see [How to make good reproducible pandas examples](/q/20109391/4518341) – wjandrea Jul 09 '22 at 17:39
  • I import the data from a csv into a DataFrame, as there are more columns - 3 sector times, total lap time, driver, etc. – Andrei Filep Jul 09 '22 at 18:01
  • @Andrei `1:13` is a syntax error. They're supposed to be strings, aren't they? – wjandrea Jul 09 '22 at 19:24

2 Answers

I would suggest converting the strings into the proper time format (hh:mm:ss.fff), then using pd.to_timedelta to parse them into timedeltas:

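    import pandas as pd
    # Pad to hh:mm:ss.fff: 'ss.fff' -> '00:00:ss.fff', 'm:ss.fff' -> '00:m:ss.fff'
    d = {r'^(\d+\.\d+)$': r'00:00:\1', r'^(\d+:\d+\.\d+)$': r'00:\1'}
    df.replace(d, regex=True).apply(pd.to_timedelta)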

                      S1                     S2
0 0 days 00:00:48.004000 0 days 00:01:13.564000
1 0 days 00:00:38.965000 0 days 00:00:58.223000
2 0 days 00:00:45.630000 0 days 00:01:10.084000
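If plain seconds are wanted, as in the expected output in the question, each timedelta column can be converted with `.dt.total_seconds()`. A minimal end-to-end sketch, assuming the sector times are stored as strings in columns `S1` and `S2`:

```python
import pandas as pd

# Sample data from the question, stored as strings (1:13.564 is not a valid Python literal)
df = pd.DataFrame([['48.004', '1:13.564'], ['38.965', '58.223'], ['45.630', '1:10.084']],
                  columns=['S1', 'S2'])

# Pad each value to an hh:mm:ss.fff-style string: 'ss.fff' gets '00:00:', 'm:ss.fff' gets '00:'
d = {r'^(\d+\.\d+)$': r'00:00:\1', r'^(\d+:\d+\.\d+)$': r'00:\1'}

# Parse every column to timedelta, then convert each column to float seconds
seconds = (df.replace(d, regex=True)
             .apply(pd.to_timedelta)
             .apply(lambda col: col.dt.total_seconds()))
print(seconds)
```

Values that match neither pattern (for example, numbers that were already parsed as floats when reading the CSV) are left unchanged by `replace`, so this relies on the columns being read as strings.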
Shubham Sharma
  • Thank you for your help! I was able to get exactly what I wanted with a small adjustment: `df['S1'] = df['Sector1Time'].replace(d, regex=True).apply(pd.to_timedelta).dt.total_seconds()` Could you please give me some information on what your first line of code is doing? `d = {'^(\d+\.\d+)$': r'00:00:\1', '^(\d+:\d+\.\d+)$': r'00:\1'}` – Andrei Filep Jul 27 '22 at 18:14
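To answer the question in the comment: in `d`, each key is a regular expression and each value is its replacement. `^` and `$` anchor the match to the whole string, `\d+` matches one or more digits, and the parentheses capture the matched text so that `\1` in the replacement can reuse it. The first key matches `ss.fff` values and prefixes `00:00:`; the second matches `m:ss.fff` values and prefixes `00:`. Either way the result is an `hh:mm:ss.fff`-style string that `pd.to_timedelta` can parse. The per-value effect can be reproduced with the standard `re` module; `pad_time` is a hypothetical helper for illustration, not what pandas calls internally:

```python
import re

# Same mapping as in the answer, written as raw strings so '\d' is not treated as an escape
d = {r'^(\d+\.\d+)$': r'00:00:\1',     # 'ss.fff'   -> '00:00:ss.fff'
     r'^(\d+:\d+\.\d+)$': r'00:\1'}    # 'm:ss.fff' -> '00:m:ss.fff'


def pad_time(value: str) -> str:
    # Because of the ^...$ anchors, at most one pattern can match a given value
    for pattern, repl in d.items():
        if re.fullmatch(pattern, value):
            return re.sub(pattern, repl, value)
    return value  # unchanged if neither shape matches


print(pad_time('48.004'))    # 00:00:48.004
print(pad_time('1:13.564'))  # 00:1:13.564
```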

Use the code below for your test data:

    from datetime import timedelta
    import pandas as pd


    def comma_fun(x):
        # '1:13.564' -> ['1', '13', '564']
        return x.replace(':', ',').replace('.', ',').split(',')


    def min_sec_mill_fun(elem):
        # Prefix '0:' when there is no minutes part, so every value splits into [min, sec, ms]
        return comma_fun(elem) if ':' in elem else comma_fun('0:' + elem)


    def tot_sec_fun(input_list):
        # Pad the fractional part to three digits so that e.g. '.63' means 630 ms
        prs_time = timedelta(minutes=int(input_list[0]),
                             seconds=int(input_list[1]),
                             milliseconds=int(input_list[2].ljust(3, '0')))
        # Round to millisecond precision to hide floating-point noise
        return round(prs_time.total_seconds(), 3)

For the scenario below, assuming the column values are strings:

    df = pd.DataFrame([['48.004', '1:13.564'], ['38.965', '58.223'], ['45.630', '1:10.084']], columns=['S1', 'S2'])
    df['S1'] = df['S1'].map(lambda a: tot_sec_fun(min_sec_mill_fun(a)))
    df['S2'] = df['S2'].map(lambda a: tot_sec_fun(min_sec_mill_fun(a)))

input:

       S1        S2
0  48.004  1:13.564
1  38.965    58.223
2  45.630  1:10.084

output:

       S1      S2
0  48.004  73.564
1  38.965  58.223
2  45.630  70.084
naveen p