1

I have a data frame let's call it df it has various columns I am interested in values for a particular year and count the column for that year so I use the following

df =

    ClientI OrderStartDate  FunderCode
0   U27     2017-05-22      H2
1   U28     2017-05-22      H2
2   U28     2018-09-27      H3
3   U28     2019-03-19      H4
4   U29     2017-05-22      H2
HCPL2017 = df_c[df_c['OrderStartDate'].dt.year == 2017]['FunderCode'].value_counts().reset_index().rename(columns={'index': 'FCode', 'FunderCode': 'Count_2017'})
    FunderCode  Count_2017
0   HCPL2   431
1   HCPL4   188
2   HCPL3   59
3   HCPL1   2

Similarly I did for the year 2018 , 2019 etc

Then I merge using

pd.merge(pd.merge(HCPL2017, HCPL2018,on = "FunderCode"), HCPL2019,on ="FunderCode")

In the end, I got a merged table

    FunderCode  Count_2017  Count_2018  Count_2019
3   H1        2              85            207  
0   H2       431             591           724  
2   H3       59             205            372  
1   H4      188             201            282  

Is there a way to make the process quicker to get the information from the original data frame? Like I have many more years, wondering if I can filter all of them and get the count in a few steps?

GGT
  • 163
  • 8
  • Thanks a lot, I will remember that, I will post only code only then – GGT Oct 15 '22 at 07:16
  • Great, thank you. Could you additionally provide some demo data? Like `pd.DataFrame({"Orderstatedate:["2022-01-01", "...", "..."], "FCode"[1,2,3,...]}`. I'm sure there is a faster way, but without a [MWE](https://stackoverflow.com/a/20159305/12242625) it's hard to help. Just maybe one row per year. – Marco_CH Oct 15 '22 at 07:21
  • 1
    Sure I gave an example, I hope it's ok, please let me know if you want more examples. – GGT Oct 15 '22 at 07:36

1 Answers1

0

You can use pandas pivot_table:

df["year"] = pd.to_datetime(df["OrderStartDate"]).dt.year

(
    pd.pivot_table(df, values="ClientI", index="FunderCode", columns="year", aggfunc="count")
    .add_prefix("Count_")
    .reset_index()
    .rename_axis(None, axis=1)
    #.fillna(0) if you want to replace the nan's
)

Output:

+----+--------------+--------------+--------------+--------------+
|    | FunderCode   |   Count_2017 |   Count_2018 |   Count_2019 |
|----+--------------+--------------+--------------+--------------|
|  0 | H2           |            3 |          nan |          nan |
|  1 | H3           |          nan |            1 |          nan |
|  2 | H4           |          nan |          nan |            1 |
+----+--------------+--------------+--------------+--------------+
Marco_CH
  • 3,243
  • 8
  • 25