0

Given a company name ( non-standard suffixes and forms ), I want to create a new column in my df with its stock ticker. I can pull the ticker based on another lookup / key table, but the forms of each company name are not 100% consistent between tables.
I have two datasets:
1. List of Names ( df )
2. Mapping of ticker to names ( dfKey )

The names of both companies are not always the same so I can't do df['Ticker']=np.where(df['companyName']==dfKey['companyName'],dfKey['Ticker'].NaN)

Even a solution, where I can get 70-90% correct is good enough ( My real dataset is thousands of companies and just some data is better than none; some will be impossible to decode like Salesforce from CRM ).

My Sample dfs:

import numpy as np
import pandas as pd

raw_data = {       
    'companyName1': ['General Electric','NVIDA Corporation', 'Blizzard', 'CRM', 'Google', 'Tesla']} 
df = pd.DataFrame(raw_data , columns = ['companyName1'])
#dfKey.set_index('Code', inplace=True) #Set Code as Row Index
print(df)

raw_dataKey = {'Ticker': ['GE','NVID', 'ATVI', 'CRM', 'GOOGL', 'TSLA'],       
    'companyName2': ['General Electric Company','NVIDA Corp', 'Activision', 'SalesForce', 'Google', 'Tesla Inc']} 
dfKey = pd.DataFrame(raw_dataKey , columns = ['Ticker', 'companyName2'])
#dfKey.set_index('Code', inplace=True) #Set Code as Row Index
print(dfKey)

Desired Output:

          companyName1 Ticker
0     General Electric     GE
1    NVIDA Corporation   NVID
2  Activision Blizzard   ATVI
3                  CRM    NaN
4               Google   GOOG
5                Tesla   TSLA

I've already tried some form of splitting each up and then comparing the first word ( which should be a good enough solution ) but I keep getting confused on how to handle lists within dataframes.

df['companyNameSplit'] = df['companyName'].str.split(' ')

I've also tried modifying a URL call by sticking in the company name to no avail just to see what I get ( à la Getting stock symbol from company name )

import urllib
url='http://d.yimg.com/autoc.finance.yahoo.com/autoc?query=Thomas%20Scott&callback=YAHOO.Finance.SymbolSuggest.ssCallback'
data = urllib.request.urlopen(url).read()

Any Other Ideas I'm Missing?

user3666197
  • 1
  • 6
  • 50
  • 92
whs2k
  • 741
  • 2
  • 10
  • 19

2 Answers2

1

Here's how you can match on the first word of the company name:

In [36]: df['first_word'] = df.companyName1.str.split(' ').str[0]

In [37]: dfKey['first_word'] = dfKey.companyName2.str.split(' ').str[0]

In [38]: pd.merge(df, dfKey, on='first_word', how='outer')
Out[38]: 
        companyName1  first_word Ticker              companyName2
0   General Electric     General     GE  General Electric Company
1  NVIDA Corporation       NVIDA   NVID                NVIDA Corp
2           Blizzard    Blizzard    NaN                       NaN
3                CRM         CRM    NaN                       NaN
4             Google      Google  GOOGL                    Google
5              Tesla       Tesla   TSLA                 Tesla Inc
6                NaN  Activision   ATVI                Activision
7                NaN  SalesForce    CRM                SalesForce
Brian from QuantRocket
  • 5,268
  • 1
  • 21
  • 18
  • Thanks!! I wonder if there is a way to search and see if any part of string in column Name1 is in column Name2 throughout a pandas dataframe...Will investigate further. – whs2k Jul 15 '17 at 17:16
-1

Imagine I have to map following two companies

1.Instructions Business Machines 2.National Bank of New York City

Since from the result we can see first word International matches with most of tickers and it shows up.

Results : Just considered first few rows

Ticker  companyName1

0 NaN IBM

1 IBM International Business Machines

2 IFF International Business Machines

3 IGT International Business Machines

4 INSW International Business Machines

5 IP International Business Machines

6 NaN Google

7 NBHC National Bank of New York City

8 NFG National Bank of New York City

9 NGG National Bank of New York City

10 NHI National Bank of New York City