I have a simple task. I have two Excel files with company names. The files are fairly large (about 170k rows). I need to take each company name in one file and print every identical name from the other. For example, we have Table A:

id company name
0 Born SA
1 ToBeBorn SA
2 Ice SA
3 Icey SA

and Table B:

id company name
0 Born SA
1 ToBeBornInEU SA
2 IceCake SA
3 Icey SA

I want to find the names from Table A that also appear in Table B, so the result would be: Born SA, Icey SA.
This is a simple task. My code looks like this:

import pandas as pd
clients_a = pd.read_excel("excel_file_number1")
clients_b = pd.read_excel("excel_file_number2")
for clientA in clients_a["Clients"]:
 for clientB in clients_b["Clients"]:
  if clientA.lower() == clientB.lower():
   print(clientA)

I use lower() because the same company may be entered differently: in Table A it may be Ice SA while in Table B it's ICE SA, but it's still the same company. My question is: how can I make this faster or more efficient? It takes a lot of time, and I have no idea how to speed it up. It's a simple task, so there must be a way, but I don't know how. Any help would be greatly appreciated!

smac89
neekitit

2 Answers

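Use set(). If you fill two sets with the lowercased names, one per file, you can find their intersection, which avoids comparing every name in A against every name in B.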
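
The set approach could be sketched roughly as follows. This is only an illustration: the two small DataFrames are hypothetical stand-ins for the Excel files (in practice they would come from pd.read_excel), and the "Clients" column name is taken from the question's code.

```python
import pandas as pd

# Hypothetical sample data standing in for the two Excel files.
clients_a = pd.DataFrame({"Clients": ["Born SA", "ToBeBorn SA", "Ice SA", "Icey SA"]})
clients_b = pd.DataFrame({"Clients": ["BORN SA", "ToBeBornInEU SA", "IceCake SA", "Icey SA"]})

# Build one set of lowercased names per table (dropna skips empty cells).
names_a = {name.lower() for name in clients_a["Clients"].dropna()}
names_b = {name.lower() for name in clients_b["Clients"].dropna()}

# Set intersection: names present in both tables, ignoring case.
common = names_a & names_b
print(sorted(common))
```

Note that the result contains the lowercased names, not the original spelling. Building the sets and intersecting them takes time roughly proportional to the number of rows, instead of one comparison per pair of rows as in the nested loops.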

Ed Behn

This should be faster although I haven't tested it:

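# A Series has no .lower(); use the .str accessor for element-wise string methods
clients_a["lower"] = clients_a["Clients"].str.lower()
clients_b["lower"] = clients_b["Clients"].str.lower()

# Boolean mask: True where a name from A also appears in B
mask = clients_a["lower"].apply(lambda x: (clients_b["lower"] == x).any())
print(clients_a.loc[mask, "Clients"])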
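
Comparing each name in A against the whole of column B still performs one comparison per pair of rows. A possible alternative, swapped in here as a suggestion rather than taken from the answer above, is pandas' Series.isin, which does a membership test instead. The sketch below assumes the "Clients" column from the question and uses small hypothetical DataFrames in place of the Excel files.

```python
import pandas as pd

# Hypothetical sample data standing in for the two Excel files.
clients_a = pd.DataFrame({"Clients": ["Born SA", "ToBeBorn SA", "Ice SA", "Icey SA"]})
clients_b = pd.DataFrame({"Clients": ["BORN SA", "ToBeBornInEU SA", "IceCake SA", "Icey SA"]})

# Lowercased keys used only for comparison; the original spelling is kept.
key_a = clients_a["Clients"].str.lower()
key_b = clients_b["Clients"].str.lower()

# isin tests each key in A for membership in B without a Python-level loop.
matches = clients_a.loc[key_a.isin(key_b), "Clients"]
print(matches.tolist())
```

Unlike a set intersection of lowercased strings, this keeps the names as they are written in Table A.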
nnsk