Working on what I think should be a simple merge but I can't find quite the right solution.
I have two dfs of Fortune 500 companies. df1 is 2 columns (Company and CIK), 117 rows long. df2 is 2 columns (Rank, and Company) and 225 rows long. The company order is different between the dfs.
I want a dataframe that matches the company names (even if the company names aren't spelled 100% the same), preserves the order of df2, and fills in the CIK values from df1 to df2, and fills in Na in the remaining values (108 will be NA).
This merge only gives 52 values (probably the ones that are exactly matching).
df3 = df2.merge(df1, on="Company")
This merge fills a lot of NaN values where things don't exactly match:
df3 = df2.merge(df1, on="Company", how = "outer")
df1:
{'Company': {0: 'Amazon',
1: 'Verizon Communications',
2: 'AT&T',
3: 'Alphabet',
4: 'Intel',
5: 'Facebook',
6: 'Exxon Mobil',
7: 'Microsoft',
8: 'Duke Energy',
9: 'Comcast',
10: 'Exelon'},
'CIK': {0: '0001018724',
1: '0000732712',
2: '0001018724',
3: '0001652044',
4: '0000050863',
5: '0001326801',
6: '0000034088',
7: '0000789019',
8: '0001326160',
9: '0001166691',
10: '0001109357'}}
df2:
{'Rank': {0: 1,
1: 2,
2: 3,
3: 4,
4: 5,
5: 6,
6: 7,
7: 8,
8: 9,
9: 10,
10: 11},
'Company': {0: 'Walmart',
1: 'Amazon',
2: 'Apple',
3: 'CVS Health',
4: 'United Health Group',
5: 'Berkshire Hathaway',
6: 'McKesson',
7: 'AmerisourceBergen',
8: 'Alphabet',
9: 'Exxon Mobil',
10: 'AT&T'}}
I have also tried various .concat and .join but have had similar problems.
Would a loop be more appropriate?
Thank you!
UPDATE: using a left join, the CIK row fills in with NaN (picture attached). Also, should mention that the CIK column is dtype=str because dtype=int64 eliminates the leading zeros and I need the zeros for the API call.