I want to do a special merge, but I'm not sure how to do it in pandas.
I have the following df1:
   CompanyName       Country  Ticker   Revenue  ...
0  Apple Inc.        US
1  Microsoft         US       MSFT     1235
2  Sony              US
3  DBS               SG       D05      5523
4  Razer             HK       0700.HK  2231
5  General Electric  US       GE       2131
6  Check Point       L
7  Huawei            CN
And I have this frame tickersdf:
    CompanyName       Ticker   Country
0   Apple Inc.        AAPL.MX  MX
1   Apple Inc.        APC.DE   DE
2   Apple Inc.        APC.F    F
3   Microsoft         MSFT     US
4   Sony              SNE      US
5   Razer             0700.HK  HK
6   General Electric  GE       US
7   Sony              6758.T   T
8   Microsoft         MSFT.BA  BA
9   General Motors.   GM       US
10  Check Point       CHKP     US
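First, I want to merge on CompanyName and Country as the keys.
Since both frames have a Ticker column, I would do this with something like:
df1 = pd.merge(df1, tickersdf, on=['CompanyName', 'Country'], how='left', suffixes=('', '_new'))
df1['Ticker'] = df1['Ticker'].fillna(df1.pop('Ticker_new'))
That way Sony (US), which is missing its ticker, gets SNE filled in first: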
   CompanyName       Country  Ticker   Revenue  ...
0  Apple Inc.        US
1  Microsoft         US       MSFT     1235
2  Sony              US       SNE
3  DBS               SG       D05      5523
4  Razer             HK       0700.HK  2231
5  General Electric  US       GE       2131
6  Check Point       L
7  Huawei            CN
As you can see, df1 has missing values and I want to get the values from tickersdf.
However, tickersdf has multiple ticker values for the same CompanyName.
If you look at Apple Inc., it doesn't have the US ticker available, but it has alternatives.
I would like to merge the two data frames so that I get the following:
    CompanyName       Country  Ticker   Revenue  ...
0   Apple Inc.        US       hasalt
1   Apple Inc.        MX       AAPL.MX
2   Apple Inc.        DE       APC.DE
3   Apple Inc.        F        APC.F
4   Microsoft         US       MSFT     1235
5   Microsoft         BA       MSFT.BA
6   Sony              US       SNE
7   Sony              T        6758.T
8   DBS               SG       D05      5523
9   Razer             HK       0700.HK  2231
10  General Electric  US       GE       2131
11  Check Point       L        hasalt
12  Check Point       US       CHKP
13  Huawei            CN
I don't want to touch the rows that contain a Ticker already in df1.
For companies like Apple Inc. and Check Point in this example, no ticker is available for the country listed in df1 (US and L respectively), but alternatives exist. In that case I want the Ticker cell to say 'hasalt' (or be left blank), and I want the alternative rows to be copied over into df1.
Microsoft and Sony already have MSFT and SNE filled in df1 after the first step, but alternatives are available for them too, so I want those copied over to df1 as well.
If a company name doesn't exist in df1, I don't want it added. General Motors, for example, isn't in df1, so it should not be brought over.
Companies like Huawei have no matches at all in tickersdf, so I want their Ticker to be left blank.
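To make the requirements above concrete, here is a minimal sketch of one way they could be met. It is an illustration under assumptions, not a definitive answer: it assumes each (CompanyName, Country) pair is unique within each frame, and the names keys, base, alts, Ticker_lookup, _is_alt and _sort are made up for this example.

```python
import pandas as pd

df1 = pd.DataFrame({
    "CompanyName": ["Apple Inc.", "Microsoft", "Sony", "DBS", "Razer",
                    "General Electric", "Check Point", "Huawei"],
    "Country": ["US", "US", "US", "SG", "HK", "US", "L", "CN"],
    "Ticker": [None, "MSFT", None, "D05", "0700.HK", "GE", None, None],
    "Revenue": [None, 1235, None, 5523, 2231, 2131, None, None],
})
tickersdf = pd.DataFrame({
    "CompanyName": ["Apple Inc.", "Apple Inc.", "Apple Inc.", "Microsoft",
                    "Sony", "Razer", "General Electric", "Sony", "Microsoft",
                    "General Motors.", "Check Point"],
    "Ticker": ["AAPL.MX", "APC.DE", "APC.F", "MSFT", "SNE", "0700.HK", "GE",
               "6758.T", "MSFT.BA", "GM", "CHKP"],
    "Country": ["MX", "DE", "F", "US", "US", "HK", "US", "T", "BA", "US", "US"],
})

keys = ["CompanyName", "Country"]  # assumed unique per frame

# Step 1: fill missing tickers from exact (CompanyName, Country) matches.
base = df1.merge(tickersdf, on=keys, how="left", suffixes=("", "_lookup"))
base["Ticker"] = base["Ticker"].fillna(base["Ticker_lookup"])
base = base.drop(columns="Ticker_lookup")

# Step 2: alternatives are tickersdf rows whose company is in df1
# but whose (CompanyName, Country) pair is not (an anti-join).
alts = tickersdf.merge(df1[keys], on=keys, how="left", indicator=True)
alts = alts[(alts["_merge"] == "left_only")
            & alts["CompanyName"].isin(df1["CompanyName"])]
alts = alts.drop(columns="_merge")

# Step 3: flag rows that still lack a ticker but have alternatives.
needs_flag = base["Ticker"].isna() & base["CompanyName"].isin(alts["CompanyName"])
base.loc[needs_flag, "Ticker"] = "hasalt"

# Step 4: append the alternatives and keep each company's original row first.
order = {name: i for i, name in enumerate(df1["CompanyName"].drop_duplicates())}
result = pd.concat([base.assign(_is_alt=0), alts.assign(_is_alt=1)],
                   ignore_index=True)
result["_sort"] = result["CompanyName"].map(order) * 2 + result["_is_alt"]
result = (result.sort_values("_sort", kind="mergesort")  # stable sort
                .drop(columns=["_sort", "_is_alt"])
                .reset_index(drop=True))

print(result)
```

The left merge in step 1 never adds rows from tickersdf, and the isin filter in step 2 is what keeps companies such as General Motors out. Rows that already had a ticker are only read, never overwritten, because fillna touches missing values alone.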
Can this be done in pandas? If so, how?
I was thinking of doing this
df1['Ticker'] = df1['Ticker'].fillna(df1['CompanyName'].map(tickersdf.set_index('CompanyName')['Ticker']))
But tickersdf has duplicate company names, so that lookup isn't unique. I also considered merging or joining, but I don't think that works either, because entries that are in tickersdf but not in df1 would end up in df1.
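The two concerns above can be checked on a small example. This is only a sketch with a cut-down, made-up subset of the frames (the names lookup, first_only, names and left are hypothetical): deduplicating makes map usable but silently keeps just the first ticker per company, while a merge with how='left' does not pull in rows that exist only in tickersdf.

```python
import pandas as pd

tickersdf = pd.DataFrame({
    "CompanyName": ["Apple Inc.", "Apple Inc.", "Sony", "Sony", "General Motors."],
    "Ticker": ["AAPL.MX", "APC.DE", "SNE", "6758.T", "GM"],
    "Country": ["MX", "DE", "US", "T", "US"],
})

# The lookup Series has repeated index labels, so it is not a valid 1:1 mapping.
lookup = tickersdf.set_index("CompanyName")["Ticker"]
print(lookup.index.is_unique)

# Keeping one row per company makes map() work, but only the first
# ticker survives and the country is ignored entirely.
first_only = tickersdf.drop_duplicates("CompanyName").set_index("CompanyName")["Ticker"]
names = pd.Series(["Apple Inc.", "Sony", "Huawei"])
mapped = names.map(first_only)
print(mapped)

# A left merge keeps exactly the rows of the left frame (given unique keys
# on the right), so General Motors. is not added.
left = pd.DataFrame({"CompanyName": ["Sony", "Huawei"], "Country": ["US", "CN"]})
merged = left.merge(tickersdf, on=["CompanyName", "Country"], how="left")
print(merged)
```

So map is a poor fit here because it cannot represent several tickers per company, whereas the extra-rows problem only arises with how='outer' or how='right'.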
I know I can filter out the alternatives using this
tickersdf[(tickersdf['CompanyName'].str.contains('Check Point'))]
But how do I copy the rows over into df1 based on my condition?
I think it would look something like,
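has_match = df1['CompanyName'].isin(tickersdf['CompanyName'])   # company appears in tickersdf
df1.loc[has_match & df1['Ticker'].isna(), 'Ticker'] = 'hasalt'  # only flag rows that lack a ticker
alts = tickersdf[tickersdf['CompanyName'].isin(df1['CompanyName'])]  # rows for companies in df1
# ...then copy the rows of alts over into df1;
# companies with no match in tickersdf are left alone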
What's the right way to do this? Could it be done with some kind of special ffill?