Given a company name ( non-standard suffixes and forms ), I want to create a new column in my df
with its stock ticker. I can pull the ticker based on another lookup / key table, but the forms of each company name are not 100% consistent between tables.
I have two datasets:
1. List of Names ( df
)
2. Mapping of ticker to names ( dfKey
)
The names of both companies are not always the same so I can't do df['Ticker']=np.where(df['companyName']==dfKey['companyName'],dfKey['Ticker'].NaN)
Even a solution, where I can get 70-90% correct is good enough ( My real dataset is thousands of companies and just some data is better than none; some will be impossible to decode like Salesforce from CRM ).
My Sample df
s:
import numpy as np
import pandas as pd
raw_data = {
'companyName1': ['General Electric','NVIDA Corporation', 'Blizzard', 'CRM', 'Google', 'Tesla']}
df = pd.DataFrame(raw_data , columns = ['companyName1'])
#dfKey.set_index('Code', inplace=True) #Set Code as Row Index
print(df)
raw_dataKey = {'Ticker': ['GE','NVID', 'ATVI', 'CRM', 'GOOGL', 'TSLA'],
'companyName2': ['General Electric Company','NVIDA Corp', 'Activision', 'SalesForce', 'Google', 'Tesla Inc']}
dfKey = pd.DataFrame(raw_dataKey , columns = ['Ticker', 'companyName2'])
#dfKey.set_index('Code', inplace=True) #Set Code as Row Index
print(dfKey)
Desired Output:
companyName1 Ticker
0 General Electric GE
1 NVIDA Corporation NVID
2 Activision Blizzard ATVI
3 CRM NaN
4 Google GOOG
5 Tesla TSLA
I've already tried some form of splitting each up and then comparing the first word ( which should be a good enough solution ) but I keep getting confused on how to handle lists within dataframes.
df['companyNameSplit'] = df['companyName'].str.split(' ')
I've also tried modifying a URL call by sticking in the company name to no avail just to see what I get ( à la Getting stock symbol from company name )
import urllib
url='http://d.yimg.com/autoc.finance.yahoo.com/autoc?query=Thomas%20Scott&callback=YAHOO.Finance.SymbolSuggest.ssCallback'
data = urllib.request.urlopen(url).read()
Any Other Ideas I'm Missing?