Add rows to a df for IDs that don't have two groups in another column in Python

Question

I have a large dataset with about 50 columns, but the columns of concern are listed below:

ID	category	price	code
11	A	10	1
12	B	5	22
12	A	9	26
25	B	13	17
26	A	8	19
26	B	11	6

As you can see, some of my IDs have both categories A and B and some don’t. In my output table I want each ID to have both categories A and B and “blank” cells for missing prices and codes in the newly added rows. My desired output looks like this:

ID	category	price	code
11	A	10	1
11	B
12	A	9	26
12	B	5	22
25	A
25	B	13	17
26	A	8	19
26	B	11	6

Any ideas would be appreciated.

Markdown tables are nice to look at but difficult to copy. Could you paste the result of applying `.head().to_dict()` to your table? — Ben Grossmann, Feb 17 '22 at 02:08
{'ID': {0: 13, 1: 22, 2: 22, 3: 32, 4: 33}, 'Category': {0: 4020, 1: 4020, 2: 4020, 3: 4020, 4: 4020}, 'Price': {0: 1, 1: 1, 2: 2, 3: 2, 4: 2}, 'Code': {0: 14, 1: 39, 2: 1, 3: 53, 4: 1}, 'Ranking: {0: 14.0, 1: 15.0, 2: 1.0, 3: 31.0, 4: 1.0}, . . . . — Dina, Feb 17 '22 at 02:24
Using `df.set_index(['ID','category']).reindex(pd.MultiIndex.from_tuples(itertools.product(df.ID, df.category)))` as in [this post](https://stackoverflow.com/a/53872957/2476977) gets close to what you want, but it results in repeated rows for some reason I don't understand — Ben Grossmann, Feb 17 '22 at 02:29
I ran the code but it says "None of ['Category'] are in the columns"! — Dina, Feb 17 '22 at 02:46
I suspect that's a capitalization issue. I've edited my answer so that the script is self contained and can run with a copy and paste. — Ben Grossmann, Feb 17 '22 at 03:17

Ben Grossmann · Answer 1 · 2022-02-17T03:16:47.980

As in this post, you could do the following.

import itertools
import pandas as pd

data = {'ID': {0: 11, 1: 12, 2: 12, 3: 25, 4: 26, 5: 26},
    'Category': {0: 'A', 1: 'B', 2: 'A', 3: 'B', 4: 'A', 5: 'B'},
    'Price': {0: 10, 1: 5, 2: 9, 3: 13, 4: 8, 5: 11},
    'Code': {0: 1, 1: 22, 2: 26, 3: 17, 4: 19, 5: 6}}
df = pd.DataFrame(data)
df.set_index(['ID','Category']).reindex(
    pd.MultiIndex.from_tuples(
        itertools.product(df.ID.unique(), df.Category.unique())))

The result:

                price  code
11  A          10      1   
    B             NaN   NaN
12  A          9       26  
    B          5       22  
25  A             NaN   NaN
    B          13      17  
26  A          8       19  
    B          11      6

Add rows to a df for IDs that don't have two groups in another column in Python

1 Answers1