
I have pandas code that works with a lot of data files. I use the following code to convert a column of date-time strings to a DatetimeIndex:

import pandas as pd

df = pd.DataFrame({'date_time': ["2016-05-19 08:25:00","2016-05-19 16:00:00","2016-05-20 07:45:00","2016-05-24 12:50:00","2016-05-25 23:00:00","2016-05-26 19:45:00"]})
df['date_time'] = pd.DatetimeIndex(df['date_time'])

But one particular data file gives me the error:

raise e
ValueError: Unknown string format
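A minimal sketch that reproduces this kind of failure, assuming a single malformed value is in the column (the made-up string 'aaa', the same placeholder the answer's sample uses). The exact message text depends on the pandas version (older releases say `Unknown string format`, newer ones may word it differently), so the sketch only relies on a ValueError being raised.

```python
import pandas as pd

# 'aaa' is a made-up stand-in for one bad value in an otherwise valid column
values = ["2016-05-19 08:25:00", "2016-05-19 16:00:00", "aaa"]
df = pd.DataFrame({'date_time': values})

try:
    df['date_time'] = pd.DatetimeIndex(df['date_time'])
    failed = False
except ValueError as exc:
    # One unparseable string makes the whole conversion fail,
    # so the column is left untouched
    failed = True
    print(type(exc).__name__, exc)
```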

What could be the reason behind this error? If it is due to invalid data in the data file, how can I remove it?

  • Can you add some sample data? – jezrael Jul 10 '17 at 13:51
  • The code actually works fine with most of the input data, but a few input files show this error. So I would like to know whether it is due to invalid data, and if so, how to remove it. –  Jul 10 '17 at 13:54
  • Not uncommon in my experience, especially if the data was loaded with an unknown encoding scheme. What happens if you first run it through pd.to_datetime(df['date_time'])? In my experience, if you can isolate the offending string, you'll have your answer. – Adestin Jul 10 '17 at 13:58
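Following the suggestion to isolate the offending string, one way to do it is to parse with errors='coerce' and keep the rows that were non-empty before parsing but NaT afterwards. This is a minimal sketch with made-up data; printing repr() of each value can expose hidden characters (for example from an encoding problem). Note that in pandas 2.x, to_datetime infers one format from the first value, so valid strings written in a different format may be flagged too.

```python
import pandas as pd

# Made-up sample; 'aaa' plays the role of the bad value
df = pd.DataFrame({'date_time': ["2016-05-19 08:25:00", "2016-05-20 07:45:00",
                                 "aaa", "2016-05-24 12:50:00"]})

parsed = pd.to_datetime(df['date_time'], errors='coerce')

# Non-missing before parsing but NaT after parsing -> could not be parsed
bad_mask = parsed.isna() & df['date_time'].notna()
bad_rows = df.loc[bad_mask, 'date_time']

print(bad_rows)
# repr() makes invisible characters (stray whitespace, BOMs, ...) visible
print(bad_rows.map(repr).tolist())
```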

1 Answer


I think you need the parameter errors='coerce' in to_datetime, which converts non-datetime values to NaT:

df['date_time'] = pd.to_datetime(df['date_time'], errors='coerce')

Then, if you need to remove all rows with NaT, use dropna:

df = df.dropna(subset=['date_time'])

Sample:

import pandas as pd

a = ["2016-05-19 08:25:00","2016-05-19 16:00:00","2016-05-20 07:45:00",
     "2016-05-24 12:50:00","2016-05-25 23:00:00","aaa"]
df = pd.DataFrame({'date_time':a})
print (df)
             date_time
0  2016-05-19 08:25:00
1  2016-05-19 16:00:00
2  2016-05-20 07:45:00
3  2016-05-24 12:50:00
4  2016-05-25 23:00:00
5                  aaa

df['date_time'] = pd.to_datetime(df['date_time'], errors='coerce')
print (df)
            date_time
0 2016-05-19 08:25:00
1 2016-05-19 16:00:00
2 2016-05-20 07:45:00
3 2016-05-24 12:50:00
4 2016-05-25 23:00:00
5                 NaT

df = df.dropna(subset=['date_time'])
print (df)
            date_time
0 2016-05-19 08:25:00
1 2016-05-19 16:00:00
2 2016-05-20 07:45:00
3 2016-05-24 12:50:00
4 2016-05-25 23:00:00
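Since the question mentions many data files, the same two steps can be wrapped in a small helper and run once per file, reporting how many rows were dropped. This is only a sketch: `clean_datetime_column` is a hypothetical name, and the in-memory CSV strings stand in for real files (with real files you would pass each path to read_csv instead). Keep in mind that dropna also removes rows whose date_time was already empty before parsing.

```python
import io

import pandas as pd


def clean_datetime_column(df, column='date_time'):
    """Hypothetical helper: parse `column`, drop rows that fail to parse.

    Returns the cleaned frame and the number of rows removed.
    Rows that were already missing are removed as well."""
    out = df.copy()
    out[column] = pd.to_datetime(out[column], errors='coerce')
    cleaned = out.dropna(subset=[column])
    return cleaned, len(out) - len(cleaned)


# Made-up stand-ins for real data files
files = {
    'good.csv': "date_time\n2016-05-19 08:25:00\n2016-05-20 07:45:00\n",
    'bad.csv': "date_time\n2016-05-24 12:50:00\naaa\n2016-05-25 23:00:00\n",
}

report = {}
for name, text in files.items():
    df = pd.read_csv(io.StringIO(text))
    cleaned, n_dropped = clean_datetime_column(df)
    report[name] = n_dropped
    if n_dropped:
        print(f"{name}: dropped {n_dropped} unparseable row(s)")
```

Returning the drop count alongside the frame makes it easy to see which files actually contained bad values instead of silently losing rows.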
– jezrael
  • Thank you for your help! But errors="coerce" didn't improve the code :( and when I tried dropna, I got the following error: ValueError: No axis named date_time for object type –  Jul 10 '17 at 14:06
  • Why did it not improve? I added the parameter `subset` to dropna; I hope it works now. – jezrael Jul 10 '17 at 14:08
  • subset worked :) but I still get the same error: raise e ValueError: Unknown string format. Looks like I have to check the data file manually! –  Jul 10 '17 at 14:25
  • Hmm, do you get `Unknown string format` even with `errors='coerce'`? – jezrael Jul 10 '17 at 14:27
  • Yes, I do get the error, in spite of errors='coerce' and dropna! :/ –  Jul 10 '17 at 14:29
  • It is really interesting... Maybe there is some problem with the data... I have never seen it before :( – jezrael Jul 10 '17 at 14:35
  • Thank you. What was the problem? I am really curious. – jezrael Jul 11 '17 at 11:28
  • I could not find it! I have ignored these files for now, because I am working on something else right now: https://stackoverflow.com/questions/44946141/how-to-input-large-data-into-python-pandas-using-looping-or-parallel-computing –  Jul 11 '17 at 11:30
  • Unfortunately, multiprocessing is a big unknown area for me :( – jezrael Jul 11 '17 at 11:31
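The `No axis named date_time` message quoted in the comments is what pandas raises when a column label ends up in dropna's `axis` argument instead of `subset`. The thread doesn't show the exact call that produced it, so this is only an illustration of that difference, using made-up data.

```python
import pandas as pd

df = pd.DataFrame({
    'date_time': pd.to_datetime(["2016-05-19 08:25:00", "aaa"], errors='coerce'),
    'value': [1, 2],
})

try:
    # Wrong: 'date_time' is interpreted as an axis (0/1/'index'/'columns')
    df.dropna(axis='date_time')
    axis_error = None
except ValueError as exc:
    axis_error = str(exc)

# Right: name the column(s) to check for NaT/NaN via subset
cleaned = df.dropna(subset=['date_time'])

print(axis_error)
print(cleaned)
```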