2

I had no real good idea how to formulate a good header here.

The situation is that I have two data frames I want to merge:

df1 = pd.DataFrame([[1, 2], [1, 3], [4, 6]], columns=['A', 'ID'])
df2 = pd.DataFrame([[3, 2], [3, 3], [4, 6]], columns=['ID', 'values'])

so I do a:

pd.merge(df1, df2, on="ID", how="left")

which results in:

   A   ID    values
0  1   2     NaN
1  1   3     2.0
2  1   3     3.0
3  4   6     NaN

What I would like though is that any combination of A and ID only appear once. If there were several ones, like in the example above, it should take the respective values and merge them into a list(?) of values. So the result should look like this:

   A   ID    values
0  1   2     NaN
1  1   3     2.0, 3.0
2  4   6     NaN

I do not have the slightest idea how to approach this.

SLglider
  • 267
  • 1
  • 4
  • 16

2 Answers2

3

Once you've got your merged dataframe, you can groupby columns A and ID and then simply apply list to your values column to aggregate the results into a list for each group:

import pandas as pd

df1 = pd.DataFrame([[1, 2], [1, 3], [4, 6]], columns=['A', 'ID'])
df2 = pd.DataFrame([[3, 2], [3, 3], [4, 6]], columns=['ID', 'values'])

merged = pd.merge(df1, df2, on="ID", how="left") \
           .groupby(['A', 'ID'])['values'] \
           .apply(list) \
           .reset_index()

print(merged)

prints:

   A  ID      values
0  1   2       [nan]
1  1   3  [2.0, 3.0]
2  4   6       [nan]
asongtoruin
  • 9,794
  • 3
  • 36
  • 47
2

You could use

merged = pd.merge(df1, df2, on="ID", how="left") \
           .groupby(['A', 'ID'])['values'] \
           .apply(list) \
           .reset_index()

as in asongtoruin fine answer, but you might want to consider the case of only None as special (due to the merge), in which case you can use

    >>> df['values'].groupby([df.A, df.ID]).apply(lambda g: [] if g.isnull().all() else list(g)).reset_index()
    A   ID  values
0   1   2   []
1   1   3   [2.0, 3.0]
2   4   6   []
Ami Tavory
  • 74,578
  • 11
  • 141
  • 185
  • Exactly what I needed. In addition to your answer, I also found this link helpful: [link](https://stackoverflow.com/questions/23794082/pandas-groupby-and-join-lists) – SLglider Apr 30 '18 at 15:13
  • 1
    This is a neat answer, and definitely extends mine nicely! I think you might be able to replace the `len(g) == 1 and np.isnan(list(g)[0])` with `g.isnull().all()` to make it a more `pandas`-y answer – asongtoruin Apr 30 '18 at 15:20