
I have the following two dataframes:

df1 = pd.DataFrame([["blala Amazon", '02/30/2017', 'Amazon'], ["blala Amazon", '04/28/2017', 'Amazon'], ['blabla Netflix', '06/28/2017', 'Netflix']], columns=['text', 'date', 'keyword'])

df2 = pd.DataFrame([['01/28/2017', '3.4', '10.2'], ['02/30/2017', '3.7', '10.5'], ['03/28/2017', '6.0', '10.9']], columns=['dates', 'ReturnOnAssets.1', 'ReturnOnAssets.2'])

(perhaps it's clearer in the screenshots here: https://i.stack.imgur.com/cTcwN.jpg)

The df2 is much larger than shown here - it contains columns for 100 companies. So for example, for the 10th company, the column names are: ReturnOnAssets.10, etc.

I have created a dictionary which maps the company names to the column names:

stocks = {'Microsoft':'','Apple' :'1', 'Amazon':'2', 'Facebook':'3',
          'Berkshire Hathaway':'4', 'Johnson & Johnson':'5',
          'JPMorgan' :'6', 'Alphabet': '7'} 

and so on.

Now, what I am trying to achieve is adding a column "ReturnOnAssets" from df2 to df1, but for a specific company and a specific date. Looking at df1, the first tweet (i.e. "text") contains the keyword "Amazon" and was posted on 02/30/2017. I now need to go to df2, find the relevant column for Amazon (i.e. "ReturnOnAssets.2"), and fetch the value for that date.

So what I expect looks like this:

df1 = pd.DataFrame([["blala Amazon", '02/30/2017', 'Amazon', '10.5'], ["blala Amazon", '04/28/2017', 'Amazon', 'x'], ['blabla Netflix', '06/28/2017', 'Netflix', 'x']], columns=['text', 'date', 'keyword', 'ReturnOnAssets'])

By x I mean values which were not included in the example df1 and df2.

I am fairly new to pandas and I can't wrap my head around it. I tried:

keyword = df1['keyword']
txt = 'ReturnOnAssets.'+ stocks[keyword]
df1['ReturnOnAssets'] = df2[txt]

But I don't know how to fetch the relevant date, and this also gives me an error: "'Series' objects are mutable, thus they cannot be hashed", which probably comes from the fact that I cannot just add a whole column of keywords to the text string.

I don't know how to achieve the operation I need to do, so I would appreciate help.

  • if you include the example dataframe in text instead of pictures, it makes it easier for others to use – Leo Jun 28 '19 at 15:11
  • Please see [how to make good pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) and create a [mcve] with sample input and output data – G. Anderson Jun 28 '19 at 15:15
  • Ok, I'm new here so thank you, I'm having a look. – arctic.queenolina Jun 28 '19 at 15:19

2 Answers


Next time, please provide correct test data. I modified your dates and dictionary so that the first and second columns match (the Netflix and Amazon values). This code works if and only if every date in df1 is also in df2. (Note that the column is named date in df1 and dates in df2.)

df1 = pd.DataFrame([["blala Amazon", '02/30/2017', 'Amazon'], ["blala Amazon", '04/28/2017', 'Amazon'], ['blabla Netflix', '02/30/2017', 'Netflix']], columns=['text', 'date', 'keyword'])

df2 = pd.DataFrame([['04/28/2017', '3.4', '10.2'], ['02/30/2017', '3.7', '10.5'], ['03/28/2017', '6.0', '10.9']], columns=['dates', 'ReturnOnAssets.1', 'ReturnOnAssets.2'])

stocks = {'Microsoft':'','Apple' :'5', 'Amazon':'2', 'Facebook':'3',
          'Berkshire Hathaway':'4', 'Netflix':'1',
          'JPMorgan' :'6', 'Alphabet': '7'} 

df1["ReturnOnAssets"] = [
    df2["ReturnOnAssets." + stocks[df1["keyword"][index]]][df2.index[df2["dates"] == df1["date"][index]][0]]
    for index in range(len(df1))
]

df1
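The "if and only if all dates are in df2" restriction above can be relaxed. The following is a minimal sketch of one alternative, not the method used in this answer: reshape df2 from wide to long with `melt`, then left-merge on the date and the column name. The intermediate names `long_df2`, `lookup`, `column`, and `result` are made up for illustration, and the `stocks` dictionary is trimmed to the two companies in the sample data. With a left merge, a row of df1 whose date is missing from df2 should end up with NaN instead of raising an error.

```python
import pandas as pd

df1 = pd.DataFrame(
    [["blala Amazon", "02/30/2017", "Amazon"],
     ["blala Amazon", "04/28/2017", "Amazon"],
     ["blabla Netflix", "02/30/2017", "Netflix"]],
    columns=["text", "date", "keyword"],
)
df2 = pd.DataFrame(
    [["04/28/2017", "3.4", "10.2"],
     ["02/30/2017", "3.7", "10.5"],
     ["03/28/2017", "6.0", "10.9"]],
    columns=["dates", "ReturnOnAssets.1", "ReturnOnAssets.2"],
)
# Trimmed mapping (assumption: only the companies present in the sample data)
stocks = {"Netflix": "1", "Amazon": "2"}

# Wide -> long: one row per (date, column name) pair
long_df2 = df2.rename(columns={"dates": "date"}).melt(
    id_vars="date", var_name="column", value_name="ReturnOnAssets"
)

# For every row of df1, work out which df2 column it should read from
lookup = df1.assign(column="ReturnOnAssets." + df1["keyword"].map(stocks))

# Left merge keeps every row of df1, in its original order
result = lookup.merge(long_df2, how="left", on=["date", "column"]).drop(columns="column")
print(result)
```

The dates are compared as plain strings here, exactly as in the sample data, so both frames need to use the same date format.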
L F
  • I am trying this: df2["new_column"] = df1['ReturnOnAssets.' + d[key] for key in d] but it's giving me syntax error – arctic.queenolina Jun 28 '19 at 15:49
  • It's because df1 has no column `df1['ReturnOnAssets.'`; what you have to do is `[ str(df1["ReturnOnAssets"]+d[key]) for key in d ]` – L F Jun 28 '19 at 15:59
  • That is giving me "ValueError: Length of values does not match length of index". Besides, I'm not sure if that's correct. What I want to do is to read each keyword from df1, check the date in df1, and then match it with the ReturnOnAssets for that exact company on the exact same date. – arctic.queenolina Jun 28 '19 at 19:27
  • @arctic.queenolina I have improved my answer, it should work now with your data. – L F Jun 28 '19 at 21:38

This can probably be shortened, and you can add if statements to handle missing values.

import pandas as pd 
import numpy as np 

df1 = pd.DataFrame([["blala Amazon", '05/28/2017', 'Amazon'], ["blala Facebook", '04/28/2017', 'Facebook'], ['blabla Netflix', '06/28/2017', 'Netflix']], columns=['text', 'dates', 'keyword'])
df1
df2 = pd.DataFrame([['06/28/2017', '3.4', '10.2'], ['05/28/2017', '3.7', '10.5'], ['04/28/2017', '6.0', '10.9']], columns=['dates', 'ReturnOnAsset.1', 'ReturnOnAsset.2'])
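# Build a bigger df2 with random values so the columns reach Netflix ('ReturnOnAsset.8');
# note that this also overwrites 'ReturnOnAsset.1' and 'ReturnOnAsset.2'
for i in range(9):
  df2['ReturnOnAsset.' + str(i)] = np.random.randint(1, 1000, df2.shape[0])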

stocks = {'Microsoft':'0','Apple' :'1', 'Amazon':'2', 'Facebook':'3',
          'Berkshire Hathaway':'4', 'Johnson & Johnson':'5',
          'JPMorgan' :'6', 'Alphabet': '7', 'Netflix': '8'} 

#new col where to store values
df1['ReturnOnAsset']=np.nan

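for index, row in df1.iterrows():
  colname = 'ReturnOnAsset.' + stocks[row['keyword']]
  # .loc avoids chained assignment; .iloc[0] takes the single matching value (assumes the date exists in df2)
  df1.loc[index, 'ReturnOnAsset'] = df2.loc[df2['dates'] == row['dates'], colname].iloc[0]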
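
As a rough sketch of the "if statements" mentioned above, the lookup can be wrapped in a small helper that returns NaN when the keyword is not in the dictionary, the column is not in df2, or the date has no match. The helper name `lookup_roa` and the sample rows (a date missing from df2, and a "Tesla" keyword missing from `stocks`) are made up for illustration and are not part of the original data.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    [["blala Amazon", "05/28/2017", "Amazon"],
     ["blala Facebook", "01/01/2017", "Facebook"],  # date absent from df2
     ["blabla Tesla", "06/28/2017", "Tesla"]],      # keyword absent from stocks
    columns=["text", "dates", "keyword"],
)
df2 = pd.DataFrame(
    [["06/28/2017", 3.4, 10.2],
     ["05/28/2017", 3.7, 10.5],
     ["04/28/2017", 6.0, 10.9]],
    columns=["dates", "ReturnOnAsset.2", "ReturnOnAsset.3"],
)
stocks = {"Amazon": "2", "Facebook": "3"}


def lookup_roa(row):
    """Return the df2 value for this row's company and date, or NaN if unavailable."""
    suffix = stocks.get(row["keyword"])
    colname = f"ReturnOnAsset.{suffix}"
    if suffix is None or colname not in df2.columns:
        return np.nan  # unknown company or missing column
    matches = df2.loc[df2["dates"] == row["dates"], colname]
    if matches.empty:
        return np.nan  # date not present in df2
    return matches.iloc[0]


df1["ReturnOnAsset"] = df1.apply(lookup_roa, axis=1)
print(df1)
```

Only the first row finds both its column and its date; the other two rows are filled with NaN instead of raising a KeyError or IndexError.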

Leo