0

I have flight data stored in a CSV, including for example the scheduled departure in the form 0005 (for 00:05 am). To work with the data, I need to parse it into a datetime using the format "%H%M". Can you explain why it isn't working? Thanks for your help!

df['SCHEDULED_DEPARTURE'] = pd.to_datetime(df['SCHEDULED_DEPARTURE'], format="%H%M")

ValueError: time data '5' does not match format '%H%M' (match)

  • You might check this out: https://stackoverflow.com/questions/25015711/time-data-does-not-match-format – Danielle Hoopes Apr 12 '22 at 20:52
  • The error message indicates that the value is `5`, not `0005`. What's the dtype of the column? – Barmar Apr 12 '22 at 20:54
  • I guess this column contains the value "5", which doesn't fit the format you specified. One option is to pass `errors='coerce'`, which turns unparseable values into NaT. See the docs: https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html – Vae Jiang Apr 12 '22 at 20:54
  • How are you reading the CSV file into the pandas DataFrame? Most likely your SCHEDULED_DEPARTURE column is converted to integers by default, so 0005 turns into 5 – Karol Żak Apr 12 '22 at 20:56
  • The dtype is int64 – Martin Müller Apr 12 '22 at 20:56
  • `df = pd.read_csv('./Originaldata/flights.csv', sep=',', usecols=['YEAR', 'MONTH', 'DAY', 'SCHEDULED_DEPARTURE'])` @Karol Zak – Martin Müller Apr 12 '22 at 20:58
  • @MartinMüller yep, you need to specify the dtype for the 'SCHEDULED_DEPARTURE' column, otherwise it's converted to integer. I added an answer with code that should fix it – Karol Żak Apr 12 '22 at 21:08

1 Answer

1

The problem is in how you read the CSV into a DataFrame. By default, pandas infers the SCHEDULED_DEPARTURE column as an integer type (int64, as you confirmed in the comments), so the leading zeros are dropped and 0005 becomes just 5, which no longer matches "%H%M".

# reading the CSV "as is", with automatic type conversion
pd.read_csv('test.csv', header=None)


# reading the CSV while forcing the data type of specific columns
pd.read_csv('test.csv', header=None, dtype={2:str})


So in your case, the read_csv call should look something like this:

df = pd.read_csv(
    './Originaldata/flights.csv',
    sep=',',
    usecols=['YEAR', 'MONTH', 'DAY', 'SCHEDULED_DEPARTURE'],
    dtype={'SCHEDULED_DEPARTURE': str},  # keep leading zeros, e.g. '0005'
)
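
To see the whole thing end to end, here is a minimal sketch. It assumes a small in-memory sample (`csv_text`, made up for illustration) in place of your flights.csv, and it also shows one possible fallback for a column that has already been read as int64: padding the values back to four digits with `str.zfill(4)` before parsing.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for flights.csv
csv_text = (
    "YEAR,MONTH,DAY,SCHEDULED_DEPARTURE\n"
    "2015,1,1,0005\n"
    "2015,1,1,1340\n"
)

# Read the time column as text so the leading zeros survive
df = pd.read_csv(io.StringIO(csv_text), dtype={'SCHEDULED_DEPARTURE': str})
parsed = pd.to_datetime(df['SCHEDULED_DEPARTURE'], format="%H%M")

# Fallback if the column is already int64: pad back to four digits first
df_int = pd.read_csv(io.StringIO(csv_text))
padded = df_int['SCHEDULED_DEPARTURE'].astype(str).str.zfill(4)
parsed_from_int = pd.to_datetime(padded, format="%H%M")

print(parsed.dt.strftime("%H:%M").tolist())
```

Since the format contains no date, the resulting timestamps carry the default date 1900-01-01. If you only need the time of day, `parsed.dt.time` gives you plain `datetime.time` objects.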
Karol Żak