I have the following dataframe:

df1 = pd.DataFrame({'ID': ['foo', 'foo','bar','foo', 'baz', 'foo'],'value': [1, 2, 3, 5, 4, 3, 1, 2, 3]})

df2 = pd.DataFrame({'ID': ['foo', 'bar', 'baz', 'foo'],'age': [10, 21, 32, 15]})

I would like to create a new column in df1 called age, taking the values from df2 that match on 'ID'. When an 'ID' value appears more than once in df1, I would like those values to be duplicated (instead of NaN).

I tried merging df1 and df2, but the result contains NaNs instead of duplicates.

The Pandas 101 post does not contain an answer to this problem.

arkadiy
  • I took a careful look through this but could not find an answer there. Could you point me to the correct location within that post? – arkadiy Jul 29 '20 at 19:21

1 Answer

I think you need an outer join:

df = pd.merge(df1, df2, on='ID', how='outer')
print(df)

    ID  value  age
0  foo      1   10
1  foo      1   15
2  foo      2   10
3  foo      2   15
4  foo      5   10
5  foo      5   15
6  foo      3   10
7  foo      3   15
8  bar      3   21
9  baz      4   32
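
If the goal is to keep exactly the rows of df1, a left join may be closer to what the question asks for than an outer join. The following is only a sketch under assumptions: the 'value' list is trimmed to six entries so both columns have the same length, and, because 'foo' appears twice in df2, the second half assumes that the first age listed for each ID is the one wanted. The names `left` and `age_by_id` are illustrative.

```python
import pandas as pd

# Assumption: 'value' is trimmed to six entries so it matches the length of 'ID'.
df1 = pd.DataFrame({'ID': ['foo', 'foo', 'bar', 'foo', 'baz', 'foo'],
                    'value': [1, 2, 3, 5, 4, 3]})
df2 = pd.DataFrame({'ID': ['foo', 'bar', 'baz', 'foo'],
                    'age': [10, 21, 32, 15]})

# Left join: every df1 row is kept. 'foo' occurs twice in df2,
# so each 'foo' row of df1 is repeated once per matching age.
left = pd.merge(df1, df2, on='ID', how='left')
print(left)

# Assumption: only one age per ID is wanted, so keep the first one listed in df2.
age_by_id = df2.drop_duplicates('ID').set_index('ID')['age']

# Look up each ID in df1; repeated IDs receive the same age rather than NaN.
df1['age'] = df1['ID'].map(age_by_id)
print(df1)
```

With the lookup approach, df1 keeps its original six rows, and an ID that is missing from df2 would get NaN.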
YOLO
  • Thanks. I tried this, but for some reason the resulting DataFrame gets filled with NaNs (instead of duplicates), since df1 is longer. – arkadiy Jul 29 '20 at 19:53
  • I used your sample data and it gives this output. If you can share a test case where this code fails, I can take a look. – YOLO Jul 29 '20 at 19:54
  • Thank you for trying this. I just tried the same (copied and pasted) and am getting the following error: ValueError: arrays must all be same length. For some reason it is having issues handling the different lengths of the columns. – arkadiy Jul 29 '20 at 20:03
  • That is because your `df1` has extra values in 'value'. Try this: `df1 = pd.DataFrame({'ID': ['foo', 'foo','bar','foo', 'baz', 'foo'],'value': [1, 2, 3, 5, 4, 3]}); df2 = pd.DataFrame({'ID': ['foo', 'bar', 'baz', 'foo'],'age': [10, 21, 32, 15]})` – YOLO Jul 29 '20 at 20:26
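
The ValueError discussed in the comments is raised by the DataFrame constructor, before any merge runs: 'ID' has six entries while 'value' has nine. A minimal sketch of checking the lengths first is shown below. The variable names `ids`, `values` and `raised` are illustrative, and trimming 'value' to the length of 'ID' is an assumption about which entries are the extra ones.

```python
import pandas as pd

ids = ['foo', 'foo', 'bar', 'foo', 'baz', 'foo']
values = [1, 2, 3, 5, 4, 3, 1, 2, 3]

# The two lists differ in length, which the constructor rejects.
print(len(ids), len(values))  # 6 9

raised = False
try:
    pd.DataFrame({'ID': ids, 'value': values})
except ValueError as err:
    raised = True
    # The exact wording of the message depends on the pandas version.
    print(err)

# Assumption: the trailing entries of 'value' are the extra ones, so drop them.
df1 = pd.DataFrame({'ID': ids, 'value': values[:len(ids)]})
print(df1)
```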