I tried to answer this question with a group-level merge. What follows is a slightly modified version of the same question, except that I need the output to be produced by merging at the group level.
Here are the input dataframes:
import numpy as np
import pandas as pd

df = pd.DataFrame({"group": [1, 1, 1, 2, 2],
                   "cat": ['a', 'b', 'c', 'a', 'c'],
                   "value": range(5),
                   "value2": np.array(range(5)) * 2})
df
cat  group  value  value2
a    1      0      0
b    1      1      2
c    1      2      4
a    2      3      6
c    2      4      8
categories = pd.DataFrame({'cat': ['a', 'b', 'c', 'd']})
print(categories)
cat
0 a
1 b
2 c
3 d
Here's the expected output:
cat  group  value  value2
a    1      0      0
b    1      1      2
c    1      2      4
d    NA     NA     NA
a    2      3      6
c    2      4      8
b    NA     NA     NA
d    NA     NA     NA
Question:
I can achieve what I want with a for loop. Is there a more idiomatic pandas way to do it, though?
(I need to perform an outer join between categories and each group produced by df.groupby('group').)
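grouped = df.groupby('group')
merged_list = []
# Iterating over a groupby yields (key, sub-frame) pairs.
for _, g in grouped:
    # Outer-join the full category list with this group's rows.
    merged = pd.merge(categories, g, how='outer', on='cat')
    merged_list.append(merged)
out = pd.concat(merged_list)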
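
For comparison, here is one possible loop-free sketch of the same per-group outer join. It is not necessarily the canonical pandas answer. The idea is to build every (group, cat) pair up front and then left-merge df onto those pairs. The names pairs and _merge (the default column created by indicator=True) are introduced here for illustration, and the sketch assumes each (group, cat) pair appears at most once in df.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"group": [1, 1, 1, 2, 2],
                   "cat": ['a', 'b', 'c', 'a', 'c'],
                   "value": range(5),
                   "value2": np.array(range(5)) * 2})
categories = pd.DataFrame({'cat': ['a', 'b', 'c', 'd']})

# Every combination of existing group and wanted category, as a plain frame.
pairs = pd.MultiIndex.from_product(
    [df["group"].unique(), categories["cat"]],
    names=["group", "cat"],
).to_frame(index=False)

# Left-merge the data onto the full grid; indicator=True adds a "_merge"
# column marking rows that had no match in df as "left_only".
out = pairs.merge(df, how="left", on=["group", "cat"], indicator=True)

# Blank out "group" on the unmatched rows so they look like the loop output
# (the loop leaves group as NA there), then drop the helper column.
out["group"] = out["group"].where(out["_merge"] == "both")
out = out.drop(columns="_merge")
```

Within each group the rows come out in the order of categories (a, b, c, d), so the ordering differs from the expected output above, where the unmatched rows are listed last. If group should stay filled in for the missing categories, the where step can simply be skipped.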