If pandas merge finds several matches, write values rows into one field

Question

I had no real good idea how to formulate a good header here.

The situation is that I have two data frames I want to merge:

df1 = pd.DataFrame([[1, 2], [1, 3], [4, 6]], columns=['A', 'ID'])
df2 = pd.DataFrame([[3, 2], [3, 3], [4, 6]], columns=['ID', 'values'])

so I do a:

pd.merge(df1, df2, on="ID", how="left")

which results in:

   A   ID    values
0  1   2     NaN
1  1   3     2.0
2  1   3     3.0
3  4   6     NaN

What I would like though is that any combination of A and ID only appear once. If there were several ones, like in the example above, it should take the respective values and merge them into a list(?) of values. So the result should look like this:

   A   ID    values
0  1   2     NaN
1  1   3     2.0, 3.0
2  4   6     NaN

I do not have the slightest idea how to approach this.

score 3 · Answer 1 · answered Apr 30 '18 at 12:11

3

Once you've got your merged dataframe, you can groupby columns A and ID and then simply apply list to your values column to aggregate the results into a list for each group:

import pandas as pd

df1 = pd.DataFrame([[1, 2], [1, 3], [4, 6]], columns=['A', 'ID'])
df2 = pd.DataFrame([[3, 2], [3, 3], [4, 6]], columns=['ID', 'values'])

merged = pd.merge(df1, df2, on="ID", how="left") \
           .groupby(['A', 'ID'])['values'] \
           .apply(list) \
           .reset_index()

print(merged)

prints:

   A  ID      values
0  1   2       [nan]
1  1   3  [2.0, 3.0]
2  4   6       [nan]

answered Apr 30 '18 at 12:11

asongtoruin

9,794
3
36
47

I was just about to accept your answer, when @Ami Savory posted his answer, which comes even closer to what I needed. – SLglider Apr 30 '18 at 15:14
This is a very good answer. @SLglider I really don't mind if you accept this one instead. – Ami Tavory Apr 30 '18 at 15:28

Ami Tavory · Accepted Answer · 2018-04-30T15:28:18.863

2

You could use

merged = pd.merge(df1, df2, on="ID", how="left") \
           .groupby(['A', 'ID'])['values'] \
           .apply(list) \
           .reset_index()

as in asongtoruin fine answer, but you might want to consider the case of only None as special (due to the merge), in which case you can use

    >>> df['values'].groupby([df.A, df.ID]).apply(lambda g: [] if g.isnull().all() else list(g)).reset_index()
    A   ID  values
0   1   2   []
1   1   3   [2.0, 3.0]
2   4   6   []

edited Apr 30 '18 at 15:28

answered Apr 30 '18 at 12:25

Ami Tavory

74,578
11
141
185

Exactly what I needed. In addition to your answer, I also found this link helpful: [link](https://stackoverflow.com/questions/23794082/pandas-groupby-and-join-lists) – SLglider Apr 30 '18 at 15:13
1

This is a neat answer, and definitely extends mine nicely! I think you might be able to replace the `len(g) == 1 and np.isnan(list(g)[0])` with `g.isnull().all()` to make it a more `pandas`-y answer – asongtoruin Apr 30 '18 at 15:20

If pandas merge finds several matches, write values rows into one field

2 Answers2