
I have a data frame; you can recreate it by running:

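import pandas as pd
from io import StringIO

data = """
           case_id    scheduled_date        code
           1213       2021-08-17            1
           3444       2021-06-24            3
           4566       2021-07-20            5
"""
# A raw string avoids the invalid "\s" escape warning; a regex separator needs the python engine
df = pd.read_csv(StringIO(data.strip()), sep=r'\s\s+', engine='python')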

How can I change scheduled_date so that it keeps only the year and month? The expected output is:

   case_id scheduled_date  code
0     1213        2021-08     1
1     3444        2021-06     3
2     4566        2021-07     5
Anoushiravan R
William

4 Answers


Convert the column to datetime and then take the monthly period with dt.to_period('M'):

df['month'] = pd.to_datetime(df['scheduled_date']).dt.to_period('M')

   case_id scheduled_date  code    month
0     1213     2021-08-17     1  2021-08
1     3444     2021-06-24     3  2021-06
2     4566     2021-07-20     5  2021-07

Note that with this method the new column's dtype will be period[M], not object.
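A quick way to check that note: the sketch below builds the question's data directly with pd.DataFrame (an assumption, to skip the read_csv step) and prints the dtype of the new column. The final astype(str) step is an addition that is not part of the answer, for cases where plain 'YYYY-MM' strings are wanted instead of Period objects.

```python
import pandas as pd

# Same data as in the question, built directly instead of via read_csv
df = pd.DataFrame({
    "case_id": [1213, 3444, 4566],
    "scheduled_date": ["2021-08-17", "2021-06-24", "2021-07-20"],
    "code": [1, 3, 5],
})

# to_period('M') produces Period values with dtype period[M]
df["month"] = pd.to_datetime(df["scheduled_date"]).dt.to_period("M")
print(df["month"].dtype)  # period[M]

# Extra step: convert each Period to its 'YYYY-MM' string form
df["month_str"] = df["month"].astype(str)
print(df["month_str"].tolist())  # ['2021-08', '2021-06', '2021-07']
```

Keeping the period[M] dtype is handy if you later want to sort, group, or do month arithmetic; the string version is simpler for display or export.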

It_is_Chris

You can use string operations to drop the day of the month (I'm assuming you want strings, since the expected output has no day component):

df["scheduled_date"] = df["scheduled_date"].str.split("-").str[:2].str.join("-")

The resulting data frame:

   case_id scheduled_date  code
0     1213        2021-08     1
1     3444        2021-06     3
2     4566        2021-07     5
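Here is a standalone sketch of the split-and-join idea on a plain Series. The shorter s.str[:7] slice is an alternative that is not in the answer, and it only works under the assumption that every value is a zero-padded 'YYYY-MM-DD' string; the split version does not depend on the length of each part.

```python
import pandas as pd

s = pd.Series(["2021-08-17", "2021-06-24", "2021-07-20"], name="scheduled_date")

# Split on '-', keep the first two pieces (year, month), and join them back
via_split = s.str.split("-").str[:2].str.join("-")

# Assumption: values are zero-padded 'YYYY-MM-DD', so year-month is the first 7 chars
via_slice = s.str[:7]

print(via_split.tolist())  # ['2021-08', '2021-06', '2021-07']
print(via_split.tolist() == via_slice.tolist())  # True
```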
BrokenBenchmark

You can also try this:

df['scheduled_date'] = pd.to_datetime(df.scheduled_date, format='%Y-%m-%d').dt.strftime('%Y-%m')


   case_id scheduled_date  code
0     1213        2021-08     1
1     3444        2021-06     3
2     4566        2021-07     5
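If some rows might not match the format string, pd.to_datetime raises an error by default. A minimal sketch, assuming you would rather keep going and mark such rows as missing, is to pass errors='coerce' (the bad input value below is made up for illustration):

```python
import pandas as pd

# Hypothetical input: the last value does not match the expected format
dates = pd.Series(["2021-08-17", "2021-06-24", "not a date"])

# errors="coerce" turns unparseable values into NaT instead of raising
parsed = pd.to_datetime(dates, format="%Y-%m-%d", errors="coerce")

# strftime formats each timestamp as 'YYYY-MM'; NaT becomes a missing value
year_month = parsed.dt.strftime("%Y-%m")

print(year_month.tolist()[:2])  # ['2021-08', '2021-06']
print(year_month.isna().tolist())  # [False, False, True]
```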
Anoushiravan R

First, convert the string column to a datetime column. After that you can apply any of the datetime operations available through the .dt accessor.
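As a sketch of the kinds of date operations that become available once the column is datetime64, here are a few .dt accessors applied to the dates from the question:

```python
import pandas as pd

dates = pd.to_datetime(pd.Series(["2021-08-17", "2021-06-24", "2021-07-20"]))

# Numeric components of each date
print(dates.dt.year.tolist())   # [2021, 2021, 2021]
print(dates.dt.month.tolist())  # [8, 6, 7]

# Two ways to get year-month strings
print(dates.dt.to_period("M").astype(str).tolist())  # ['2021-08', '2021-06', '2021-07']
print(dates.dt.strftime("%Y-%m").tolist())           # ['2021-08', '2021-06', '2021-07']
```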

For reference, I learned this approach from the thread: Drop the year from "Year-month-date" format in a pandas dataframe.

# Convert the string column to datetime
df['scheduled_date'] = pd.to_datetime(df['scheduled_date'])

# Format the datetime column as four-digit-year and month strings, e.g. '2021-08'
df['scheduled_date'] = df['scheduled_date'].dt.strftime('%Y-%m')
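The strftime format codes are easy to mix up: %Y is the four-digit year, while %y is the two-digit year without the century, so the expected 2021-08 output needs %Y. A small comparison:

```python
import pandas as pd

dates = pd.to_datetime(pd.Series(["2021-08-17", "2021-06-24"]))

# %Y -> four-digit year, %y -> two-digit year (no century)
print(dates.dt.strftime("%Y-%m").tolist())  # ['2021-08', '2021-06']
print(dates.dt.strftime("%y-%m").tolist())  # ['21-08', '21-06']
```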
