-1

Using Pandas I am trying to do group by for multiple columns and then fill the pandas dataframe where a person name is not present

For Example this is my Dataframe enter image description here

V1  V2  V3  PN

1   10  20  A 

2   10  21  A

3   10  20  C

I have a unique person name list = ['A','B','C','D','E']

Expected Outcome:- enter image description here

V1  V2  V3  PN

1   10  20  A

1   10  20  B

1   10  20  C

1   10  20  D

1   10  20  E

2   10  21  A

2   10  21  B

2   10  21  C

2   10  21  D

2   10  21  E

3   10  20  A

3   10  20  B

3   10  20  C

3   10  20  D

3   10  20  E

I was thinking about trying group by pandas statement but it didnt work out

Derek O
  • 16,770
  • 4
  • 24
  • 43

1 Answers1

0

Try this, using pd.MultiIndex with reindex to create additional rows:

import pandas as pd

df = pd.DataFrame({'Version 1':[1,2,3],
                   'Version 2':[10,10,10],
                   'Version 3':[20,21,20],
                   'Person Name':'A A C'.split(' ')})

p_list = [*'ABCDE']

df.set_index(['Version 1', 'Person Name'])\
  .reindex(pd.MultiIndex.from_product([df['Version 1'].unique(), p_list],
           names=['Version 1', 'Person Name']))\
  .groupby(level=0, group_keys=False).apply(lambda x: x.ffill().bfill())\
  .reset_index()

Output:

    Version 1 Person Name  Version 2  Version 3
0           1           A       10.0       20.0
1           1           B       10.0       20.0
2           1           C       10.0       20.0
3           1           D       10.0       20.0
4           1           E       10.0       20.0
5           2           A       10.0       21.0
6           2           B       10.0       21.0
7           2           C       10.0       21.0
8           2           D       10.0       21.0
9           2           E       10.0       21.0
10          3           A       10.0       20.0
11          3           B       10.0       20.0
12          3           C       10.0       20.0
13          3           D       10.0       20.0
14          3           E       10.0       20.0
Scott Boston
  • 147,308
  • 15
  • 139
  • 187
  • Thankyou so much. This works as intended. Can you tell me your thought process about how you came to the answer?. I am learning so it might be useful – Parth Rathod Nov 22 '22 at 19:53
  • Sure. You can use reindex to duplicated lines in a dataframe. And, I noticed you were trying to create all "combinations" of Version 1 and Person Name, here, we can use pd.MultiIndex.from_product to create these combinations of two columns. Lastly, to fill the NaNs in a group we can use groupby to select small subsets of the dataframe to apply forwards and backwards fills. – Scott Boston Nov 22 '22 at 19:57
  • 1
    That really helps. All i wanted was combination and thats where I was not able to figure out how to do. Anyway thankyou so much @Scott Boston – Parth Rathod Nov 22 '22 at 20:07