
A similar question has been asked before but received no responses.

I have looked through a number of forums for a solution. The other questions involve a year, but mine does not: it is simply H:M:S.

I web-scraped this data, which returned:

Time - 36:42 38:34 1:38:32 1:41:18

Data samples here: Source data 1 and Source data 2

I need this time in minutes, like so: 36.70 38.57 98.53 101.30

To do this I tried this:

time_mins = []
for i in time_list:
    h, m, s = i.split(':')
    math = (int(h) * 3600 + int(m) * 60 + int(s))/60
    time_mins.append(math)

But that didn't work because 36:42 is not in the format H:M:S (it has no hours part), so I tried to convert 36:42 using this:

df1.loc[1:,6] = df1[6]+ timedelta(hours=0)

and this

df1['minutes'] = pd.to_datetime(df1[6], format='%H:%M:%S')

but have had no luck.

Can I do it at the extraction stage? I have to do it for over 500 rows.

row_td = soup.find_all('td') 

If not, how can I do it after conversion into a data frame?

Thanks in advance

OldRider
  • Does this answer your question? [How to convert a time string to seconds?](https://stackoverflow.com/questions/10663720/how-to-convert-a-time-string-to-seconds) – FObersteiner May 25 '20 at 06:50

4 Answers


If your input (time delta string) only contains hours/minutes/seconds (no days etc.), you could use a custom function that you apply to the column:

import pandas as pd

df = pd.DataFrame({'Time': ['36:42', '38:34', '1:38:32', '1:41:18']})

def to_minutes(s):
    # split string s on ':', reverse so that seconds come first
    # multiply the result as type int with elements from tuple (1/60, 1, 60) to get minutes for each value
    # return the sum of these multiplications
    return sum(int(a)*b for a, b in zip(s.split(':')[::-1], (1/60, 1, 60)))

df['Minutes'] = df['Time'].apply(to_minutes)
# df['Minutes']
# 0     36.700000
# 1     38.566667
# 2     98.533333
# 3    101.300000
# Name: Minutes, dtype: float64

Edit: it took me a while to find it but this is a variation of this question. And my answer here is based on this reply.
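
As a possible alternative, here is a minimal sketch that stays within pandas by using `pd.to_timedelta`. It assumes every value is either M:S or H:M:S, and the `padded` name is only an illustrative helper, not something from the question. Values with a single colon are prefixed with `0:` so that they parse as H:M:S.

```python
import pandas as pd

df = pd.DataFrame({'Time': ['36:42', '38:34', '1:38:32', '1:41:18']})

# Keep values that already have two colons (H:M:S);
# prefix the M:S values with '0:' so every entry is H:M:S.
padded = df['Time'].where(df['Time'].str.count(':') == 2, '0:' + df['Time'])

# Parse as timedeltas, then convert total seconds to minutes.
df['Minutes'] = pd.to_timedelta(padded).dt.total_seconds() / 60
```

This avoids a Python-level function call per row, which may matter little for 500 rows but keeps the column as a proper timedelta along the way if you need it for other calculations.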

FObersteiner
  • Thank you @MrFuppes. That worked beautifully. It took me some time to understand that the `[::-1]` was there to reverse the split. As is obvious, I am an absolute noob, and I learnt many things today from @NileshIngle, @KaustubhBadrike and you. So thank you all – OldRider May 25 '20 at 12:40
  • @OldRider: happy to help! your problem is not the easiest... the `[::-1]` is a convenient alternative to `reversed()` or `.reverse` (see [here](https://stackoverflow.com/questions/3940128/how-can-i-reverse-a-list-in-python)). Make sure to accept an answer as solution if it helped you solve the problem. Not marking an answer as solution is also a valid option though ;-) – FObersteiner May 25 '20 at 12:47

You were on the right track. The version below makes a few modifications to your code and returns the minutes.

Create a function

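import numpy as np

def get_time(i):
    ilist = i.split(':')
    if len(ilist) == 3:
        h, m, s = ilist
    else:
        # only minutes and seconds are present, so hours default to 0
        m, s = ilist
        h = 0
    # total seconds divided by 60 gives minutes
    math = (int(h) * 3600 + int(m) * 60 + int(s)) / 60
    return np.round(math, 2)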

Call the function using split

x = "36:42 38:34 1:38:32 1:41:18"
x = x.split(" ")
xmin = [get_time(i) for i in x]
xmin

Output

[36.7, 38.57, 98.53, 101.3]
Kaustubh Badrike
Nilesh Ingle
  • Hi Nilesh - Unfortunately, I am getting an error message while using your code snippet `NameError Traceback (most recent call last) in () 6 m, s = i.split(':') 7 h = 0 ----> 8 math = (int(h) * 3600 + int(m) * 60 + int(s))/60 9 return np.round(math, 2) NameError: name 'h' is not defined` @Kaustubh Badrike and @MrFuppes's code snippets both worked – OldRider May 25 '20 at 12:28

I have no experience with pandas, but here is something you may find useful:

...
time_mins = []
for i in time_list:
    parts = i.split(':')
    minutes_multiplier = 1/60
    math = 0
    for part in reversed(parts):
        math += (minutes_multiplier * int(part))
        minutes_multiplier *= 60
    time_mins.append(math)
...
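
For reference, a self-contained sketch of the same idea that can be run as-is. The `to_minutes` function name and the sample `time_list` values (taken from the question) are assumptions for illustration. Rounding to two decimals is added only to match the output format requested in the question.

```python
def to_minutes(time_str):
    """Convert 'M:S' or 'H:M:S' to minutes as a float."""
    minutes = 0.0
    multiplier = 1 / 60  # seconds come first after reversing
    for part in reversed(time_str.split(':')):
        minutes += multiplier * int(part)
        multiplier *= 60  # seconds -> minutes -> hours
    return minutes

# Sample values from the question
time_list = ['36:42', '38:34', '1:38:32', '1:41:18']
time_mins = [round(to_minutes(t), 2) for t in time_list]
print(time_mins)  # [36.7, 38.57, 98.53, 101.3]
```

Because the parts are processed from the right, the same loop handles both two-part and three-part strings without a length check.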
Kaustubh Badrike

I had earlier commented that @NileshIngle's response above was not working as it was giving me a

NameError: name 'h' is not defined.

A simple correction was required: moving h above m, s, as it is the first variable referenced.

h = 0 # move this above
m, s = i.split(':') 


def get_time(i):
    ilist = i.split(':')
    if(len(ilist)==3):
        h, m, s = i.split(':')
    else:
        h = 0
        m, s = i.split(':')
    math = (int(h) * 3600 + int(m) * 60 + int(s))/60
    return np.round(math, 2)

I would like to thank @MrFuppes, @NileshIngle and @KaustubhBadrike for taking the time to respond. I have learned three different methods.

OldRider