I have two data frames built using pandas with more than 13 columns each.
- In df1, one of the columns is company_name_x.
- In df2, one of the columns is company_name_y.
Both columns contain many company names, stored as strings. As output, I want to display the matching companies only if at least the initial part (say 50%) of company_name_x and company_name_y match each other. I am also calculating the fuzz ratio, which seems to be working fine. However, combining the fuzz filter with the above condition doesn't work.
It raises an indexing error:
Unalignable boolean Series key provided
Below is the code I am using -
df4 = df3[df3.Fuzz>85][df3.company_name_mod_x[0:len(df3.company_name_mod_x)/2] ==
df3.company_name_mod_y[0:len(df3.company_name_mod_y)/2]]
df3 is the frame which has the top fuzz ratio for each possible pair of df1 and df2.
The output should contain only the companies which have fuzz > 85 (this part works fine) and for which at least the first half of both company names matches (this part isn't working).
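
For illustration, the intended filter could be sketched roughly as follows. This is only a sketch under stated assumptions: the sample rows are made up, the helper first_halves_match is a hypothetical name, and "first half" is interpreted here as the first half of the shorter of the two names, compared row by row. Both conditions are combined into one boolean mask that shares the index of df3, so there is nothing to realign.

```python
import pandas as pd

# Hypothetical sample rows standing in for df3; the real frame has more columns.
df3 = pd.DataFrame({
    "company_name_mod_x": ["acme corp", "north star ltd", "initech llc", "umbrella co"],
    "company_name_mod_y": ["acme corporation", "south star ltd", "initech ltd", "umbrella co"],
    "Fuzz": [90, 88, 86, 70],
})

def first_halves_match(x: str, y: str) -> bool:
    """Hypothetical helper: compare the first half of the shorter name.

    Assumption: "first half" means the first n characters, where n is half
    the length of the shorter string. Names too short to have a half
    (n == 0) are treated as non-matching.
    """
    n = min(len(x), len(y)) // 2
    return n > 0 and x[:n] == y[:n]

# Build one row-wise boolean Series with the same index as df3.
half_match = pd.Series(
    [first_halves_match(x, y)
     for x, y in zip(df3["company_name_mod_x"], df3["company_name_mod_y"])],
    index=df3.index,
)

# Combine both conditions in a single mask instead of chaining two [] lookups.
df4 = df3[(df3["Fuzz"] > 85) & half_match]
print(df4)
```

With this sample data, the "north star ltd" / "south star ltd" pair is dropped because the prefixes differ, and the "umbrella co" pair is dropped because its Fuzz value is not above 85.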