0

I have two Pandas dataframes:

print(df_a)
   ID  irrelevant_value
0   1  1.2
1   1  2.3
2   1  0.9
3   1  1.1
4   2  2.7
5   2  3.1
6   3  1.3
7   3  0.2
8   3  2.3
...

and

    ID  add_these_values_to_the_same_ID
0   1   100
1   2   120
2   3   90
...

I would like to combine them, such that - desired result:

print(df_a)

    ID  irrelevant_value  add_these_values_to_the_same_ID
0   1   1.2               100
1   1   2.3               100
2   1   0.9               100
3   1   1.1               100
4   2   2.7               120
5   2   3.1               120
6   3   1.3               90
7   3   0.2               90
8   3   2.3               90
...

How can this be accomplished?

I have been struggling with df_a.groupby(["ID"]), but cannot find a way forward.

Rerun
  • 35
  • 4
  • I would look up how to use joins or merges in pandas. See the following links: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.join.html https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.merge.html – Jeff Coldplume Apr 13 '23 at 16:08
  • 1
    Does this answer your question? [Pandas Merging 101](https://stackoverflow.com/questions/53645882/pandas-merging-101) – Marcelo Paco Apr 13 '23 at 16:18
  • Yes! Thank you, kindly. This seems to be exactly what I needed: df_a = df_a.join(df_b.set_index("ID"), on="ID") – Rerun Apr 13 '23 at 16:48

2 Answers2

2

The groupby function is not needed here; instead, just use merge. As it says in those docs,

If both key columns contain rows where the key is a null value, those rows will be matched against each other. This is different from usual SQL join behaviour and can lead to unexpected results.

And both of your dataframes contain ID. Therefore, you can merge the dataframes with this code (this is just one way to do so with one method, but it does work):

import pandas as pd

# Example dataframes - these might be defined elsewhere in your situation
data_a = {'ID': [1, 1, 1, 1, 2, 2, 3, 3, 3],
          'irrelevant_value': [1.2, 2.3, 0.9, 1.1, 2.7, 3.1, 1.3, 0.2, 2.3]}
df_a = pd.DataFrame(data_a)

data_b = {'ID': [1, 2, 3],
          'add_these_values_to_the_same_ID': [100, 120, 90]}
df_b = pd.DataFrame(data_b)

# Merge dataframes on the 'ID' column
result = df_a.merge(df_b, on='ID')

print(result)

The really important line there is:

result = df_a.merge(df_b, on='ID')

And this will output:

   ID  irrelevant_value  add_these_values_to_the_same_ID
0   1               1.2                             100
1   1               2.3                             100
2   1               0.9                             100
3   1               1.1                             100
4   2               2.7                             120
5   2               3.1                             120
6   3               1.3                              90
7   3               0.2                              90
8   3               2.3                              90

This should work, but if your data does not work with this method, you can take a look at how:

how{‘left’, ‘right’, ‘outer’, ‘inner’, ‘cross’}, default inner Type of merge to be performed.

  • left: use only keys from left frame, similar to a SQL left outer join; preserve key order.
  • right: use only keys from right frame, similar to a SQL right outer join; preserve key order.
  • outer: use union of keys from both frames, similar to a SQL full outer join; sort keys lexicographically.
  • inner: use intersection of keys from both frames, similar to a SQL inner join; preserve the order of the left keys.
  • cross: creates the cartesian product from both frames, preserves the order of the left keys.

outer might work if the code above does not.

  • 1
    Beautiful! Your suggestions seems to work well, and looks nice and clean! Thank you for taking the time to explain. – Rerun Apr 13 '23 at 16:51
0

Try: dataframe.merge

df_a.merge(other_df, left_on = 'ID', right_on = 'ID')