2

I am fairly new to working with pandas. I have a dataframe with individual entries like this:

dfImport:

id date_created date_closed
0 01-07-2020
1 02-09-2020 10-09-2020
2 07-03-2019 02-09-2020

I would like to filter it in a way, that I get the total number of created and closed objects (count id's) grouped by Year and Quarter and Month like this:

dfInOut:

Year Qrt month number_created number_closed
2019 1 March 1 0
2020 3 July 1 0
September 1 2

I guess I'd have to use some combination of crosstab or group_by, but I tried out alot of ideas and already did research on the problem, but I can't seem to figure out a way. I guess it's an issue of understanding. Thanks in advance!

Arne_22
  • 57
  • 7

1 Answers1

1

Use DataFrame.melt with crosstab:

df['date_created'] = pd.to_datetime(df['date_created'], dayfirst=True)
df['date_closed'] = pd.to_datetime(df['date_closed'], dayfirst=True)

df1 = df.melt(value_vars=['date_created','date_closed']).dropna()
df = (pd.crosstab([df1['value'].dt.year.rename('Year'), 
                   df1['value'].dt.quarter.rename('Qrt'), 
                  df1['value'].dt.month.rename('Month')], df1['variable'])
         [['date_created','date_closed']])

print (df)
variable        date_created  date_closed
Year Qrt Month                           
2019 1   3                 1            0
2020 3   7                 1            0
         9                 1            2

df = df.rename_axis(None, axis=1).reset_index()
print (df)
   Year  Qrt  Month  date_created  date_closed
0  2019    1      3             1            0
1  2020    3      7             1            0
2  2020    3      9             1            2
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • Thank you so much! it worked exactly as I wanted it to. Though I don't exactly understand how the crosstab function works in correlation to that .melt() function. The `df1 = df.melt(value_vars=['date_created','date_closed']).dropna()` basically just puts all entries of objects that have created dates (not nan) and closed (not nan) together and assings them either the name of "created" or "closed" right? – Arne_22 Nov 11 '21 at 12:50
  • @Arne_22 - sorry, wrong link, I think this - https://stackoverflow.com/questions/68961796/how-do-i-melt-a-pandas-dataframe/68961797#68961797 – jezrael Nov 11 '21 at 12:59
  • 1
    This is a really good explanation with a lot of examples. Seems to be a really handy function in some special cases like this. Thank you again for helping :) – Arne_22 Nov 11 '21 at 13:51
  • @Arne_22 - you are welcome! happy coding and nice day. – jezrael Nov 11 '21 at 13:52