
I have a DataFrame as follows:

df:
date          name    issuer    rate
2022-01-01    SPY     A         0.3
2022-01-01    SPY     B         0.2
2022-01-01    MSFT    A         0.2
2022-01-01    MSFT    B         0.1
2022-01-02    SPY     A         0.2
2022-01-02    SPY     B         0.1
2022-01-02    SPY     C         0.2
2022-01-02    SPY     D         0.2
2022-01-02    MSFT    A         0.2
2022-01-02    MSFT    B         0.4
2022-01-02    MSFT    C         0.5
2022-01-02    MSFT    D         0.4

I want to group the DataFrame (which contains duplicate entries) and add a median column so that it looks like this:

df1:
                        A     B     C     D      median
date          name      
2022-01-01    SPY       0.3   0.2                0.25
              MSFT      0.2   0.1                0.15
2022-01-02    SPY       0.2   0.1  0.2    0.2    0.2
              MSFT      0.2   0.4  0.5    0.4    0.4

I have tried using the groupby function but it gives me an error due to duplicate entries.

What's the best way to do this?

EDIT: I have tried using the pivot function and got an error concerning duplicates. The groupby function works, but I don't know how to set the issuer values as column names.

MathMan 99

1 Answer


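You can pivot the DataFrame with pivot_table first, then calculate the median along the row axis. Unlike pivot, pivot_table aggregates duplicate entries (using the mean by default), so it does not raise the duplicate error: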

>>> out = df.pivot_table(values='rate', index=['date', 'name'], columns='issuer')
>>> out['median'] = out.median(axis=1)

OUTPUT:

issuer             A    B    C    D  median
date       name                            
2022-01-01 MSFT  0.2  0.1  NaN  NaN    0.15
           SPY   0.3  0.2  NaN  NaN    0.25
2022-01-02 MSFT  0.2  0.4  0.5  0.4    0.40
           SPY   0.2  0.1  0.2  0.2    0.20
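
For reference, here is a self-contained sketch that rebuilds the sample data from the question and applies the same two steps. It also shows a groupby/unstack variant, since the question mentions that groupby works but the issuer values are not yet columns. The names `rows` and `alt` are only for illustration, and averaging duplicates is an assumption (it is what pivot_table does by default); the sample rows shown in the question do not actually contain a repeated (date, name, issuer) combination.

```python
import pandas as pd

# Sample data copied from the question.
rows = [
    ("2022-01-01", "SPY", "A", 0.3),
    ("2022-01-01", "SPY", "B", 0.2),
    ("2022-01-01", "MSFT", "A", 0.2),
    ("2022-01-01", "MSFT", "B", 0.1),
    ("2022-01-02", "SPY", "A", 0.2),
    ("2022-01-02", "SPY", "B", 0.1),
    ("2022-01-02", "SPY", "C", 0.2),
    ("2022-01-02", "SPY", "D", 0.2),
    ("2022-01-02", "MSFT", "A", 0.2),
    ("2022-01-02", "MSFT", "B", 0.4),
    ("2022-01-02", "MSFT", "C", 0.5),
    ("2022-01-02", "MSFT", "D", 0.4),
]
df = pd.DataFrame(rows, columns=["date", "name", "issuer", "rate"])

# pivot_table averages any duplicate (date, name, issuer) entries by default.
out = df.pivot_table(values="rate", index=["date", "name"], columns="issuer")
# Row-wise median; missing issuers (NaN) are skipped by default.
out["median"] = out.median(axis=1)

# Assumed alternative: group on all three keys, aggregate, then move
# the issuer level from the index into the columns.
alt = df.groupby(["date", "name", "issuer"])["rate"].mean().unstack("issuer")
alt["median"] = alt.median(axis=1)

print(out)
```

If duplicates should be combined differently, pivot_table accepts an aggfunc argument (for example aggfunc='median'), and the groupby variant can call a different aggregation in place of mean().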
ThePyGuy