
I have a CSV file with one date column (month/day/year) and two time columns (hour:minute): a start time and an end time. All three columns are read as objects, not datetimes. Some values in the time columns are over 23:59, and those values have the format hour:minute:second (from what I've seen so far), e.g. 24:50:00, 25:35:00. How would I parse the time columns? I'm getting an error message, and I think it's because the time is over the usual limit.

Also, I'm told that if the start time exceeds 23:59, the date should roll forward by however much the time is over the limit. For example, a date of 1/1/2000 with a start time of 24:50:00 (hour:minute:second) is 1/2/2000 at 0:50 (hour:minute). Do I create a new column and merge the two, and if so, how? And what should I do for the end time column?

When reading the csv file, I tried to parse the time series with parse_dates:

time_parser = lambda x: pd.datetime.strptime(x, '%H:%M')
df = pd.read_csv('data.csv', parse_dates = ['StartTime'], date_parser = time_parser)

But I get an error message along the lines of: "25:39 does not match format %H:%M". I'm not sure whether the parser just ignores the extra :00 (seconds) mentioned above, but I think the problem is that the time exceeds 23:59. How should I approach this?

Eggy
  • please provide an example of the file content (a few rows ensuring some with the time >23:59) and the matching expected output – mozway Dec 20 '22 at 18:12
  • are those valid times in some way, or do you want to *discard* those values? – FObersteiner Dec 20 '22 at 18:13
  • Those are valid times, for values over 23:59, it just means the next day. – Eggy Dec 20 '22 at 18:21
  • You cannot parse to `time` then, you'll have to use [timedelta](https://pandas.pydata.org/docs/reference/api/pandas.Timedelta.html) and add that to the date as a duration. [Here's an example](https://stackoverflow.com/a/51651619/10197418) of how this looks in pandas. – FObersteiner Dec 20 '22 at 18:25
  • Does timedelta automatically account for addition in this way? Like 1/1/2000 20:00 (month/day/year hour:minute) + 5:00 (hour:minute) = 1/2/2000 1:00? – Eggy Dec 20 '22 at 18:31
  • yes, that is exactly why you'd use this approach. – FObersteiner Dec 20 '22 at 18:33
  • The pd.to_timedelta() function isn't reading the time format. It says, "ValueError: expected hh:mm:ss format". I think it might not be able to read anything above 23:59 either. – Eggy Dec 20 '22 at 18:58
  • When testing if timedelta values auto increments day for values over 23:59, it works. But when trying to convert the time columns, it doesn't work. Does it have anything to do with the extra :00 for those values? I still can't convert the time columns into timedelta. – Eggy Dec 20 '22 at 19:56

1 Answer

Parse the date to datetime, parse the time to timedelta, and add the two together. Note that to_timedelta expects a certain input format (HH:MM:SS), which in your case can be enforced by appending ":00" to values that only have hours and minutes. Example:

import pandas as pd

df = pd.DataFrame({"date": ["1/1/2000", "1/1/2000", "1/1/2000"], 
                   "time": ["23:59", "24:50", "25:30"]})


df["datetime"] = (
    pd.to_datetime(df["date"], format="%m/%d/%Y") + 
    pd.to_timedelta(df["time"] + ":00")
    )

df
       date   time            datetime
0  1/1/2000  23:59 2000-01-01 23:59:00
1  1/1/2000  24:50 2000-01-02 00:50:00
2  1/1/2000  25:30 2000-01-02 01:30:00
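
The example above assumes every time value is HH:MM. Since the question mentions that values over 23:59 already carry seconds (e.g. 24:50:00), here is a minimal sketch that appends ":00" only where seconds are missing and applies the same offset logic to both time columns. The column names "date" and "EndTime", the sample values, and the helper `to_offset` are assumptions for illustration; only "StartTime" appears in the question.

```python
import pandas as pd

# Hypothetical sample data mixing HH:MM and HH:MM:SS values
df = pd.DataFrame({
    "date": ["1/1/2000", "1/1/2000", "1/1/2000"],
    "StartTime": ["23:10", "24:50:00", "25:35:00"],
    "EndTime": ["23:59", "25:10:00", "26:05:00"],
})


def to_offset(col):
    """Convert HH:MM or HH:MM:SS strings to timedelta (illustrative helper)."""
    s = col.astype(str).str.strip()
    # A value with exactly one colon is HH:MM and needs ":00" appended
    needs_seconds = s.str.count(":") == 1
    s = s.where(~needs_seconds, s + ":00")
    return pd.to_timedelta(s)


# Midnight of each date, then add the time-of-day offset to it
day = pd.to_datetime(df["date"], format="%m/%d/%Y")
df["start"] = day + to_offset(df["StartTime"])
df["end"] = day + to_offset(df["EndTime"])

print(df[["start", "end"]])
```

Both new columns are offsets from the same date, so an end time past midnight rolls over to the next day in the same way as the start time.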
FObersteiner