
I have two data frames built using pandas with more than 13 columns each.

  • In df1, one of the columns is company_name_x.
  • In df2, one of the columns is company_name_y.

Both columns contain many company names, stored as strings. As output, I want to display matching companies only if at least the initial part (say 50%) of company_name_x and company_name_y match each other. I am also calculating the fuzz ratio, which seems to work fine. However, combining the fuzz ratio with the condition above does not work.

It raises an indexing error:

Unalignable boolean Series key provided

This is the code I am using:

df4 = df3[df3.Fuzz>85][df3.company_name_mod_x[0:len(df3.company_name_mod_x)/2] == 
         df3.company_name_mod_y[0:len(df3.company_name_mod_y)/2]]

df3 is the frame that holds the top fuzz ratio for each possible pair from df1 and df2.

The output should contain only the pairs that have a fuzz ratio above 85 (this part works fine) and in which at least the first half of both company names matches (this part isn't working).
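
A note on the likely cause, with a sketch of one way around it. Slicing a Series with `[0:n]` selects the first n rows, not the first n characters of each string, so the comparison appears to produce a boolean Series covering only half of the rows, which pandas cannot align with the already filtered `df3[df3.Fuzz>85]`. The sketch below compares the names row by row instead. The helper names `common_prefix_len` and `first_halves_match` and the sample rows are made up for illustration, and it assumes "first half of both" means the shared case-insensitive prefix is at least half the length of the longer name, with no missing values in either column.

```python
import pandas as pd


def common_prefix_len(a: str, b: str) -> int:
    """Count how many leading characters a and b share."""
    n = 0
    for ch_a, ch_b in zip(a, b):
        if ch_a != ch_b:
            break
        n += 1
    return n


def first_halves_match(name_x: str, name_y: str) -> bool:
    """True when the shared prefix covers at least half of BOTH names.

    Assumption: the comparison ignores case, as stated in the comments.
    """
    a, b = name_x.lower(), name_y.lower()
    shared = common_prefix_len(a, b)
    # Half of the longer name is the stricter of the two requirements.
    return 2 * shared >= max(len(a), len(b))


# Hypothetical sample data standing in for df3.
df3 = pd.DataFrame({
    "company_name_mod_x": ["ABC Corporation", "ABRA Co", "Acme Holdings"],
    "company_name_mod_y": ["abc corporate", "Abracadabra Landscape Design",
                           "Acme Holdings Ltd"],
    "Fuzz": [90, 88, 80],
})

# Build one boolean per row, carrying df3's own index so it aligns.
prefix_ok = pd.Series(
    [first_halves_match(x, y)
     for x, y in zip(df3.company_name_mod_x, df3.company_name_mod_y)],
    index=df3.index,
)

# Combine both conditions in a single mask instead of chained indexing.
df4 = df3[(df3.Fuzz > 85) & prefix_ok]
```

Combining the two conditions with `&` in one mask keeps both boolean Series on the same index, which avoids the alignment problem that chained `[...][...]` indexing runs into.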

  • Welcome to StackOverflow. Please read and follow the posting guidelines in the help documentation. [on topic](http://stackoverflow.com/help/on-topic) and [how to ask](http://stackoverflow.com/help/how-to-ask) apply here. StackOverflow is not a coding or tutorial service. – Prune Jun 17 '16 at 22:22
  • You'll also have to clarify the problem. For instance, given "ABRA Co" and "Abracadabra Landscape Design", do these match? Over half of the first one matches the start of the second, but those 4 characters are far less than half of that second name. Does the capitalization affect the match? How about " Co." at the end? That might affect matching "ABC Corporation" with "ABC Incorporated". – Prune Jun 17 '16 at 22:25
  • "ABRA Co" and "Abracadabra Landscape Design" definitely should not match. For a match, at least half of both strings (the complete strings, including suffixes like Co.) should match. Capitalization should not affect the matching. – ComplexData Jun 17 '16 at 22:28
  • Perhaps http://stackoverflow.com/questions/13636848/is-it-possible-to-do-fuzzy-match-merge-with-python-pandas applies? – mdxs Jun 17 '16 at 23:03
  • I have edited the question for better understanding! Hope it helps. – ComplexData Jun 20 '16 at 18:10

0 Answers