Fill in values based on a different dataframe's values in pandas

Question

I have the following dataframe:

df1 = pd.DataFrame({'ID': ['foo', 'foo','bar','foo', 'baz', 'foo'],'value': [1, 2, 3, 5, 4, 3, 1, 2, 3]})

df2 = pd.DataFrame({'ID': ['foo', 'bar', 'baz', 'foo'],'age': [10, 21, 32, 15]})

I would like to create a new column in df1 called age and fill it with the values from df2 that match on 'ID'. When an 'ID' value appears more than once in df1, I would like those values to be duplicated (instead of NaN).

I tried a merge of df1 and df2, but it produces NaNs instead of duplicates.

The Pandas 101 post does not contain an answer to this problem.

I took a careful look through this but could not find an answer there. Could you point me to the correct location within that post? — arkadiy, Jul 29 '20 at 19:21

score 0 · Answer 1 · answered Jul 29 '20 at 19:50

I think you need an outer join:

df = pd.merge(df1, df2, on='ID', how='outer')
print(df)

    ID  value  age
0  foo      1   10
1  foo      1   15
2  foo      2   10
3  foo      2   15
4  foo      5   10
5  foo      5   15
6  foo      3   10
7  foo      3   15
8  bar      3   21
9  baz      4   32
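If the goal is to keep every row of df1 and only pull in the matching ages, a left join may be closer to what is wanted. The following is a minimal sketch, assuming the 'value' list in df1 is trimmed to six entries so that both columns have the same length (that trimming is an assumption, not part of the original question). The name `merged` is only illustrative.

```python
import pandas as pd

# Assumption: 'value' is trimmed to six entries to match the six IDs.
df1 = pd.DataFrame({'ID': ['foo', 'foo', 'bar', 'foo', 'baz', 'foo'],
                    'value': [1, 2, 3, 5, 4, 3]})
df2 = pd.DataFrame({'ID': ['foo', 'bar', 'baz', 'foo'],
                    'age': [10, 21, 32, 15]})

# A left join keeps every row of df1. Because 'foo' appears twice in df2,
# each 'foo' row in df1 is repeated once per matching age (10 and 15).
merged = df1.merge(df2, on='ID', how='left')
print(merged)
```

Since every ID in df1 also exists in df2 here, no NaN should appear in the age column. If only one age per ID is wanted, df2 would first have to be reduced to one row per ID, and which row to keep is a decision the question does not specify.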

answered Jul 29 '20 at 19:50

YOLO

Thanks, I tried this, but for some reason the resulting DataFrame gets filled with NaNs (instead of duplicates), since df1 is longer. – arkadiy Jul 29 '20 at 19:53
I used your sample data and it gives this output. Could you share a test case where this code fails so I can check? – YOLO Jul 29 '20 at 19:54
Thank you for trying this. I just tried the same (copied and pasted) and am getting the following error: ValueError: arrays must all be same length. For some reason it is having issues handling the different lengths of the columns. – arkadiy Jul 29 '20 at 20:03
That is because your `df1` has extra values. Try this: `df1 = pd.DataFrame({'ID': ['foo', 'foo','bar','foo', 'baz', 'foo'],'value': [1, 2, 3, 5, 4, 3]})` and `df2 = pd.DataFrame({'ID': ['foo', 'bar', 'baz', 'foo'],'age': [10, 21, 32, 15]})` – YOLO Jul 29 '20 at 20:26
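
The length mismatch discussed in these comments can be checked before any merge is attempted. This is a small sketch using the lists from the question; the variable names `ids`, `values`, and `error_message` are illustrative, and the exact wording of the error may differ between pandas versions.

```python
import pandas as pd

ids = ['foo', 'foo', 'bar', 'foo', 'baz', 'foo']
values = [1, 2, 3, 5, 4, 3, 1, 2, 3]

# The two lists differ in length, so they cannot form columns of one DataFrame.
print(len(ids), len(values))  # 6 9

try:
    pd.DataFrame({'ID': ids, 'value': values})
    error_message = None
except ValueError as exc:
    # Raised by the DataFrame constructor, before any merge takes place.
    error_message = str(exc)

print(error_message)
```

This suggests the ValueError comes from constructing df1 itself rather than from `pd.merge`.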
