I'm trying to change the dtype of a column in a pandas dataframe from string to datetime. My required format is Mon-Year (e.g. Jan-2018). I have tried:

dataframe['date_col'] = pd.to_datetime(dataframe['date_col'], format='%b-%Y')

I got the following warning, and the output comes out in year-month-day format, e.g. 2018-01-01.

__main__:1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

How can I get the output in the format I want, i.e. Jan-2018?

Jan Trienes
Shhiv
  • Can you provide the first couple of rows of your data to reproduce this problem? For instance `df.head(5)`. – Jan Trienes Oct 14 '18 at 10:11
  • Before: `usd_applicant_dataset['last_pymnt_d'].head(5)` Out[25]: `0 Jan-2015 1 Apr-2013 2 Jun-2014 3 Jan-2015 4 Jan-2016 Name: last_pymnt_d, dtype: object`. After using `usd_applicant_dataset['last_pymnt_d'] = pd.to_datetime(usd_applicant_dataset['last_pymnt_d'], format='%b-%Y')`, Out[24]: `0 2015-01-01 1 2013-04-01 2 2014-06-01 3 2015-01-01 4 2016-01-01 Name: last_pymnt_d, dtype: datetime64[ns]` – Shhiv Oct 14 '18 at 11:10
  • Can you add this data to your question (as [edit](https://stackoverflow.com/posts/52800107/edit))? Before applying the `pd.to_datetime` and after. – Jan Trienes Oct 14 '18 at 11:15

1 Answer

datetime values are not stored as strings!

They are stored internally as integers and must be complete: for example, a datetime value must include a day, month and year. You must therefore choose one of the following:

  1. Store your series with datetime dtype and accept the default representation, e.g. 2018-01-01, when you display or print the series.
  2. Store your series with object dtype and use whatever string representation you like.

There are no "intermediary" options to get the best of both worlds.
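To make the point about integer storage concrete, here is a minimal sketch. The variable names `ts` and `label` are illustrative and not from the question, and the month abbreviation assumes an English locale.

```python
import pandas as pd

# Parse a month-year string; the missing day is filled in as the 1st.
ts = pd.to_datetime('Jan-2018', format='%b-%Y')
print(ts)

# Internally the timestamp is an integer: nanoseconds since the Unix epoch.
print(ts.value)

# A string such as 'Jan-2018' is only a rendering of that integer,
# produced on demand; it is not what is stored.
label = ts.strftime('%b-%Y')
print(label)
```

The parsed value carries a full date (year, month and day), which is why the default display shows 2018-01-01 rather than Jan-2018.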

Option 1: datetime

Just use pd.to_datetime as you are using it now. Here's a demo:

import pandas as pd

df = pd.DataFrame({'date_col': ['Jan-2018', 'Oct-2018', 'Dec-2018']})

df['date_col'] = pd.to_datetime(df['date_col'], format='%b-%Y')

print(df, df['date_col'].dtype)

    date_col
0 2018-01-01
1 2018-10-01
2 2018-12-01 datetime64[ns]

Option 2: object

Well, this is exactly what you have as your input. Do nothing. Your input strings already have the form 'Jan-2018', which is what the format '%b-%Y' describes.
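If the column has already been converted to datetime and you want the strings back, one common approach (not part of the original answer) is to format it with `.dt.strftime`. The sketch below assumes an English locale for the month abbreviations, and `date_label` is a hypothetical column name chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({'date_col': ['Jan-2018', 'Oct-2018', 'Dec-2018']})

# Keep a real datetime column for sorting, filtering and arithmetic.
df['date_col'] = pd.to_datetime(df['date_col'], format='%b-%Y')

# Derive a string column purely for display ('date_label' is a made-up name).
df['date_label'] = df['date_col'].dt.strftime('%b-%Y')

print(df['date_label'].tolist())
print(df['date_col'].dtype)
```

The derived column holds plain strings, so it sorts alphabetically rather than chronologically; keeping the datetime column alongside it avoids that pitfall.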

Note on SettingWithCopyWarning

This warning has nothing to do with pd.to_datetime. In all likelihood your dataframe is an ambiguous slice of another dataframe, so Pandas warns that you may see unexpected results unless you explicitly copy it. See also "How to deal with SettingWithCopyWarning in Pandas?"
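As a rough illustration of the explicit copy, the sketch below builds a small parent dataframe, filters it, and calls `.copy()` before assigning. The names `full` and `subset` and the `amount` column are invented for the example; only `last_pymnt_d` comes from the question's comments, and how the original dataframe was actually sliced is not known.

```python
import pandas as pd

# Hypothetical parent dataframe; 'amount' is an invented column.
full = pd.DataFrame({
    'last_pymnt_d': ['Jan-2015', 'Apr-2013', 'Jun-2014'],
    'amount': [100, 250, 75],
})

# Filtering produces an object that pandas may treat as a slice of `full`.
# Calling .copy() makes it an independent dataframe, so later assignments
# are unambiguous.
subset = full[full['amount'] > 80].copy()

subset['last_pymnt_d'] = pd.to_datetime(subset['last_pymnt_d'], format='%b-%Y')

print(subset['last_pymnt_d'].dtype)
print(full['last_pymnt_d'].tolist())  # the parent still holds the strings
```

With the copy in place, the assignment changes only `subset` and leaves `full` as it was.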

jpp