Python - Pandas - Convert YYYYMM to datetime

Question

Beginner python (and therefore pandas) user. I am trying to import some data into a pandas dataframe. One of the columns is the date, but in the format "YYYYMM". I have attempted to do what most forum responses suggest:

df_cons['YYYYMM'] = pd.to_datetime(df_cons['YYYYMM'], format='%Y%m')

This doesn't work though (ValueError: unconverted data remains: 3). The column actually includes an additional value for each year, with MM=13. The source used this row as an average of the past year. I am guessing to_datetime is having an issue with that.

Could anyone offer a quick solution, either to strip out all of the annual averages (those with the last two digits "13"), or to have to_datetime ignore them?

Have you tried to parse it while importing your data? For instance if you read a csv with `read_csv` in pandas, you have an argument which is `parse_dates = [your_col_name]` — ysearka, Jul 20 '17 at 13:12

EdChum · Accepted Answer · 2017-07-20T13:08:13.380

12

pass errors='coerce' and then dropna the NaT rows:

df_cons['YYYYMM'] = pd.to_datetime(df_cons['YYYYMM'], format='%Y%m', errors='coerce').dropna()

The duff month values will get converted to NaT values

In[36]:
pd.to_datetime('201613', format='%Y%m', errors='coerce')

Out[36]: NaT

Alternatively you could filter them out before the conversion

df_cons['YYYYMM'] = pd.to_datetime(df_cons.loc[df_cons['YYYYMM'].str[-2:] != '13','YYYYMM'], format='%Y%m', errors='coerce')

although this could lead to alignment issues as the returned Series needs to be the same length so just passing errors='coerce' is a simpler solution

edited Jul 20 '17 at 13:08

answered Jul 20 '17 at 13:02

EdChum

376,765
198
813
562

Awesome thanks. It looks like the `.dropna()` didn't actually drop the `NaT` rows for me, but I think I can figure that part out. – Mtd240 Jul 20 '17 at 13:09
You may need to do that as a separate step, so after assignment then call `dropna()` – EdChum Jul 20 '17 at 13:10

frogcoder · Answer 2 · 2017-07-20T14:23:25.700

0

Clean up the dataframe first.

df_cons = df_cons[~df_cons['YYYYMM'].str.endswith('13')]
df_cons['YYYYMM'] = pd.to_datetime(df_cons['YYYYMM'])

May I suggest turning the column into a period index if YYYYMM column is unique in your dataset.

First turn YYYYMM into index, then convert it to monthly period.

df_cons = df_cons.reset_index().set_index('YYYYMM').to_period('M')

edited Jul 20 '17 at 14:23

answered Jul 20 '17 at 14:07

frogcoder

963
1
7
17

Python - Pandas - Convert YYYYMM to datetime

2 Answers2

Linked