1

I got the following dataframe,df, with the report_date as the index:

report_date sales
2021-06-30 130000
2021-06-30 140000
2021-07-31 125000
2021-07-31 110000
2021-08-31 110000
2021-08-31 110000

And I want to extract 2021-06 and 2021-08 only. How could I achieve this goal?

I can just extract two individual dataframe by df['2021-06'] and df['2021-08']

Hang
  • 197
  • 1
  • 11

2 Answers2

2

For match values is possible convert DatetimeIndex to months periods and test membership by Index.isin:

#if necessary
#df.index = pd.to_datetime(df.index)

df3 = df[df.index.to_period('m').isin(pd.to_datetime(['2021-06','2021-08']).to_period('m'))]
print (df3)
              sales
report_date        
2021-06-30   130000
2021-06-30   140000
2021-08-31   110000
2021-08-31   110000

Or:

df3 = df[df.index.to_period('m').isin(pd.PeriodIndex(['2021-06','2021-08'], freq='m'))]
print (df3)
              sales
report_date        
2021-06-30   130000
2021-06-30   140000
2021-08-31   110000
2021-08-31   110000

Or convert values to strings YYYY-MM and test by strings in list:

df3 = df[df.index.strftime('%Y-%m').isin(['2021-06','2021-08'])]
print (df3)
              sales
report_date        
2021-06-30   130000
2021-06-30   140000
2021-08-31   110000
2021-08-31   110000
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • but I would like combine into one dataframe but not two separate dataframe – Hang May 16 '22 at 05:09
  • KeyError: "None of [Index(['2021-06', '2021-08'], dtype='object', name='report_date')] are in the [index]" – Hang May 16 '22 at 05:10
  • @Hang - what is `print (df.index)` ? – jezrael May 16 '22 at 05:11
  • DatetimeIndex(['2021-01-31', '2021-01-31', '2021-01-31', '2021-01-31', '2021-01-31', '2021-01-31', '2021-01-31', '2021-01-31', '2021-01-31', '2021-01-31', ... '2021-12-31', '2021-12-31', '2021-12-31', '2021-12-31', '2021-12-31', '2021-12-31', '2021-12-31', '2021-12-31', '2021-12-31', '2021-12-31'], dtype='datetime64[ns]', name='report_date', length=5146, freq=None) – Hang May 16 '22 at 05:12
  • @Hang - answer was edited. – jezrael May 16 '22 at 05:15
0

Just do

df = df.reset_index()
new_df = df[(df["report_date"] == "2021-06") | (df["report_date"] == "2021-07")]

or using numpy

new_df = df.iloc[np.where((df.index == "2021-06") | (df.index == "2021-07"))[0], :]

Output -

report_date sales
0 2021-06 130000
1 2021-06 140000
2 2021-07 125000
3 2021-07 110000
Zero
  • 1,800
  • 1
  • 5
  • 16
  • df.index == '2021-06', df.index == '2021-07' will be False, so nothing can be shown – Hang May 16 '22 at 05:05
  • Is there a way that not to reset the index – Hang May 16 '22 at 05:05
  • @Hang Yeah the second method does that. I have tried it. – Zero May 16 '22 at 05:07
  • It returns a empty dataframe to me... – Hang May 16 '22 at 05:11
  • @Hang try, `df.loc[["2021-06", "2021-07"], :]` – Zero May 16 '22 at 05:15
  • KeyError: "None of [Index(['2021-06', '2021-07'], dtype='object', name='report_date')] are in the [index]" – Hang May 16 '22 at 05:15
  • print (df.index) = DatetimeIndex(['2021-01-31', '2021-01-31', '2021-01-31', '2021-01-31', '2021-01-31', '2021-01-31', '2021-01-31', '2021-01-31', '2021-01-31', '2021-01-31', ... '2021-12-31', '2021-12-31', '2021-12-31', '2021-12-31', '2021-12-31', '2021-12-31', '2021-12-31', '2021-12-31', '2021-12-31', '2021-12-31'], dtype='datetime64[ns]', name='report_date', length=5146, freq=None) – Hang May 16 '22 at 05:15
  • @Hang can you just convert the dataframe into a dictionary and add it in the question? And try `df.loc[["2021-06-30", "2021-07-31"], :]` for the new indices. – Zero May 16 '22 at 05:16
  • I am afraid it is not possible to do that, sorry about that. You may create a dataframe similar to mine will be okay for testing. – Hang May 16 '22 at 05:18
  • @Hang I am using the dataframe from the question and the three methods so work. – Zero May 16 '22 at 05:18
  • Huh, that's strange. However, @jezrael has already solved, you can take a look. – Hang May 16 '22 at 05:20
  • @Hang no problems! – Zero May 16 '22 at 05:20