I use pandas for this task. I imported two excel files:
- File A contains company names and some other stuff.
- File B contains company names, industries a company belongs to indicated by a number between 1 to 14, and some other stuff I don't need.
I want to compare these two files and find a match, and return the corresponding industry number, then make a new column in File A to show what industry a company belongs to by the number.
What I have done so far is to extract two columns I need from each file and make them into a list. Then to keep the association, I put them into a dictionary. Then I used for loop and nested for loop to find a match. But I don't know how to go further from here. And also one problem came up, which is the way companies are listed in the two files are somewhat same but not exactly. So I want to allow it to be a match if more than 4 sequences of characters in the names match.
com = A["comany"].tolist()
indu = A['industry'].tolist()
sponsor = B["sponsor"].tolist()
event = B["Event"].tolist()
dicA = dict(zip(com, indu))
dicB = dict(zip(sponsor, event))
import re
for spnsr in dicB:
for company, industry in dicA.items():
m = re.search(spnsr, company)
if m:
m = m.group()
print(m, industry)