I have a date column in the format YYYY-MM-DD. I want to extract only the year and month from it, without the "-", because I later have to convert it to an integer to feed into my linear regression model. Its current datatype is "object".

DataFrame:

         date   open  close   high    low
0  2019-10-08  56.46  56.10  57.02  56.08
1  2019-10-09  56.76  56.76  56.95  56.41
2  2019-10-10  56.98  57.52  57.61  56.83
3  2019-10-11  58.24  59.05  59.41  58.08
4  2019-10-14  58.73  58.97  59.53  58.67
  • What's the expected output? You want to convert `2020-10-02` to `202010`? And don't post images, transcribing images is tedious; post `df.to_dict()` or `print(df)` in the question. – Ch3steR Oct 10 '20 at 05:27
  • Yes, I want to convert it to "202010" or just "2010". – Chandraraj Singh Oct 10 '20 at 05:30
  • Please share a sample of the input DataFrame with the expected output. This makes it easier for us to understand the problem. – Mayank Porwal Oct 10 '20 at 05:32
  • @Ch3steR Is this how I should post dataframes? Sorry, I'm a little new to this. – Chandraraj Singh Oct 10 '20 at 05:43
  • This is a more robust way of posting dataframe data: [How to provide a reproducible copy of your DataFrame using `df.head(30).to_clipboard(sep=',')`](https://stackoverflow.com/questions/52413246), which also contains links on making synthetic data. The issue with `print(df.head())` is that if there are spaces in the data or column headers, reproducing the dataframe becomes tedious and manual. – Trenton McKinney Oct 10 '20 at 05:44

2 Answers

You can use `pd.to_datetime` to convert the date column to datetime, then use `pd.Series.dt.strftime` to format it.

import pandas as pd

s = pd.to_datetime(df['date'])
df['date'] = s.dt.strftime("%Y%m")  # e.g. 2020-10-02 -> "202010"
# or
# df['date'] = s.dt.strftime("%y%m")  # e.g. 2020-10-02 -> "2010"
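
Note that `strftime` returns strings, so the column still needs a cast before it can go into a regression model. A minimal sketch of that last step, assuming the sample rows from the question; the column names `yearmonth` and `yearmonth_alt` are made up for illustration:

```python
import pandas as pd

# Sample rows copied from the question; "date" is dtype "object" (strings).
df = pd.DataFrame({
    "date": ["2019-10-08", "2019-10-09", "2019-10-10", "2019-10-11", "2019-10-14"],
    "close": [56.10, 56.76, 57.52, 59.05, 58.97],
})

s = pd.to_datetime(df["date"])

# Option 1: format as "YYYYMM" text, then cast the strings to integers.
df["yearmonth"] = s.dt.strftime("%Y%m").astype(int)

# Option 2: build the same number arithmetically, skipping the string step.
df["yearmonth_alt"] = s.dt.year * 100 + s.dt.month

print(df[["date", "yearmonth", "yearmonth_alt"]])
```

Both options should give `201910` for every row of this sample. The arithmetic version avoids the round trip through strings, which may matter on large frames.
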
Ch3steR

Here `date` is your date column.

df['date'] = pd.to_datetime(df['date'])
df['date'] = df['date'].dt.strftime('%Y%m')  # '%Y%m' omits the "-", e.g. "201910"
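
Since the column is already a string in YYYY-MM-DD form, plain string slicing is another option that skips datetime parsing entirely. This is only a sketch, and it assumes every value is a well-formed YYYY-MM-DD string (no missing or malformed dates); the `yearmonth` column name is hypothetical:

```python
import pandas as pd

# First three rows of the question's sample data.
df = pd.DataFrame({"date": ["2019-10-08", "2019-10-09", "2019-10-10"]})

# Keep the first 7 characters ("YYYY-MM"), drop the hyphen, cast to int.
df["yearmonth"] = (
    df["date"].str.slice(0, 7).str.replace("-", "", regex=False).astype(int)
)

print(df)
```

Unlike `pd.to_datetime`, this does not validate the dates, so it is best kept for data already known to be clean.
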
abdulsaboor