0

I am trying to merge to Pandas DataFrame as below:

import numpy as np
import pandas as pd

df = pd.DataFrame({"vals": np.random.RandomState(31).randint(-30, 30, size=15), 
                   "grps": np.random.RandomState(31).choice(["A", "B"], 15)})

mean = df.groupby('grps').mean().rename(columns={'vals':'mean'})
df.merge(df, mean, left_on='grps', right_index=True)

But I got this error:

...
/opt/conda/lib/python3.7/site-packages/pandas/core/reshape/merge.py in _maybe_coerce_merge_keys(self)
   1144                     inferred_right in string_types and inferred_left not in string_types
   1145                 ):
-> 1146                     raise ValueError(msg)
   1147 
   1148             # datetimelikes must match exactly

ValueError: You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat

I think the types of the merge key in these two DataFrame are the same. Why did this error happen?

2020-05-02 update:

@wwnde pointed out the error may be related to different index sizes of the two DataFrame. But how to explain the following working example:

df1 = pd.DataFrame({'employee': ['Bob', 'Jake', 'Lisa', 'Sue'],
                    'group': ['Accounting', 'Engineering', 'Engineering', 'HR']})
df2 = pd.DataFrame({'group': ['Accounting', 'Accounting',
                              'Engineering', 'Engineering', 'HR', 'HR'],
                    'skills': ['math', 'spreadsheets', 'coding', 'linux',
                               'spreadsheets', 'organization']})
print(df1)
print(df2)
pd.merge(df1, df2)

I reset_index for mean DataFrame, this is a new error:

mean = df.groupby('grps').mean().rename(columns={'vals':'mean'}).reset_index()
df.merge(df, mean, on='grps')

Error:

/opt/conda/lib/python3.7/site-packages/pandas/core/generic.py in __nonzero__(self)
   1477     def __nonzero__(self):
   1478         raise ValueError(
-> 1479             f"The truth value of a {type(self).__name__} is ambiguous. "
   1480             "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
   1481         )

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
kk17
  • 601
  • 6
  • 14

2 Answers2

1

IICU

This could be the best way out

df['mean']=df.groupby('grps')['vals'].transform('mean')
df

enter image description here

wwnde
  • 26,119
  • 6
  • 18
  • 32
  • Thanks, @wwnde. But, can you explain why my way did not work? – kk17 May 01 '20 at 01:24
  • If you review the documentation, you can only merge on either index if the dataframes have the same number of indixes. in your case, mean has only two whereas df has more. There are many other ways to merge if you were to go your way (concat, map etc etc) However why go that way if you can go straight to the answer. Does this help. Happy to assist further – wwnde May 01 '20 at 02:00
  • Hi, @wwnde, I found an example for merge two datafrome with different sizes. You can see my question update. – kk17 May 02 '20 at 03:59
0

Instead of:

df.merge(df, mean, left_on='grps', right_index=True)

I think you need:

pd.merge(df, mean, left_on='grps', right_index=True)

or

df.merge(mean, left_on='grps', right_index=True)
Henry Ecker
  • 34,399
  • 18
  • 41
  • 57
Karthik V
  • 1,867
  • 1
  • 16
  • 23