Trying to group by using multiple columns

Question

Using Pandas I am trying to do group by for multiple columns and then fill the pandas dataframe where a person name is not present

For Example this is my Dataframe enter image description here

V1  V2  V3  PN

1   10  20  A 

2   10  21  A

3   10  20  C

I have a unique person name list = ['A','B','C','D','E']

Expected Outcome:- enter image description here

V1  V2  V3  PN

1   10  20  A

1   10  20  B

1   10  20  C

1   10  20  D

1   10  20  E

2   10  21  A

2   10  21  B

2   10  21  C

2   10  21  D

2   10  21  E

3   10  20  A

3   10  20  B

3   10  20  C

3   10  20  D

3   10  20  E

I was thinking about trying group by pandas statement but it didnt work out

can you post your data (input and expected output) as formatted text? see [here](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) — Derek O, Nov 22 '22 at 17:19
Please provide enough code so others can better understand or reproduce the problem. — Сергей Кох, Nov 22 '22 at 19:16

score 0 · Accepted Answer · answered Nov 22 '22 at 18:40

Try this, using pd.MultiIndex with reindex to create additional rows:

import pandas as pd

df = pd.DataFrame({'Version 1':[1,2,3],
                   'Version 2':[10,10,10],
                   'Version 3':[20,21,20],
                   'Person Name':'A A C'.split(' ')})

p_list = [*'ABCDE']

df.set_index(['Version 1', 'Person Name'])\
  .reindex(pd.MultiIndex.from_product([df['Version 1'].unique(), p_list],
           names=['Version 1', 'Person Name']))\
  .groupby(level=0, group_keys=False).apply(lambda x: x.ffill().bfill())\
  .reset_index()

Output:

    Version 1 Person Name  Version 2  Version 3
0           1           A       10.0       20.0
1           1           B       10.0       20.0
2           1           C       10.0       20.0
3           1           D       10.0       20.0
4           1           E       10.0       20.0
5           2           A       10.0       21.0
6           2           B       10.0       21.0
7           2           C       10.0       21.0
8           2           D       10.0       21.0
9           2           E       10.0       21.0
10          3           A       10.0       20.0
11          3           B       10.0       20.0
12          3           C       10.0       20.0
13          3           D       10.0       20.0
14          3           E       10.0       20.0

Thankyou so much. This works as intended. Can you tell me your thought process about how you came to the answer?. I am learning so it might be useful — Parth Rathod, Nov 22 '22 at 19:53
Sure. You can use reindex to duplicated lines in a dataframe. And, I noticed you were trying to create all "combinations" of Version 1 and Person Name, here, we can use pd.MultiIndex.from_product to create these combinations of two columns. Lastly, to fill the NaNs in a group we can use groupby to select small subsets of the dataframe to apply forwards and backwards fills. — Scott Boston, Nov 22 '22 at 19:57
That really helps. All i wanted was combination and thats where I was not able to figure out how to do. Anyway thankyou so much @Scott Boston — Parth Rathod, Nov 22 '22 at 20:07

Trying to group by using multiple columns

1 Answers1