0

I have a data frame like this:

   year       drug_name  avg_number_of_ingredients
0  2019     NEXIUM I.V.                          8
1  2016         ZOLADEX                         10
2  2017        PRILOSEC                         59
3  2017  BYDUREON BCise                         24
4  2019        Lynparza                         28

And I need to group drug names and mean number of ingredients by year like this:

   year     drug_name avg_number_of_ingredients
0  2019  drug a,b,c..     mean value for column
1  2018  drug a,b,c..     mean value for column
2  2017  drug a,b,c..     mean value for column

If I do df.groupby('year'), I lose drug names. How can I do it?

Jaroslav Bezděk
  • 6,967
  • 6
  • 29
  • 46
dirusali
  • 3
  • 4
  • 4
    `df.groupby('year',as_index=False).agg({'drug_name':','.join,'avg_number_of_ingredients':'mean'})` ? – anky Nov 07 '19 at 18:27

1 Answers1

-1

Let me show you the solution on the simple example. First, I make the same data frame as you have:

>>> df = pd.DataFrame(
    [
        {'year': 2019, 'drug_name': 'NEXIUM I.V.', 'avg_number_of_ingredients': 8},
        {'year': 2016, 'drug_name': 'ZOLADEX', 'avg_number_of_ingredients': 10},
        {'year': 2017, 'drug_name': 'PRILOSEC', 'avg_number_of_ingredients': 59},
        {'year': 2017, 'drug_name': 'BYDUREON BCise', 'avg_number_of_ingredients': 24},
        {'year': 2019, 'drug_name': 'Lynparza', 'avg_number_of_ingredients': 28},
    ]
)
>>> print(df)
   year       drug_name  avg_number_of_ingredients
0  2019     NEXIUM I.V.                          8
1  2016         ZOLADEX                         10
2  2017        PRILOSEC                         59
3  2017  BYDUREON BCise                         24
4  2019        Lynparza                         28

Now, I make a df_grouped, which still consists of information about drugs name.

>>> df_grouped = df.groupby('year', as_index=False).agg({'drug_name': ', '.join, 'avg_number_of_ingredients': 'mean'})
>>> print(df_grouped)
   year                 drug_name  avg_number_of_ingredients
0  2016                   ZOLADEX                       10.0
1  2017  PRILOSEC, BYDUREON BCise                       41.5
2  2019     NEXIUM I.V., Lynparza                       18.0
Jaroslav Bezděk
  • 6,967
  • 6
  • 29
  • 46