When I try to load tickers whose price histories have different lengths into one DataFrame, it always results in an error.

import pandas as pd
from pandas import Series,DataFrame
import numpy as np

# For reading stock data from yahoo
import pandas_datareader as web

# For time stamps
from datetime import datetime

# start and end are datetime objects (their definitions are not shown here)
closing_df = web.DataReader(['AAPL','GOOG','MSFT','AMZN'],'yahoo',start,end)['Adj Close']
# This works fine, since all four tickers have the same number of rows, whereas
closing_df = web.DataReader(['AAPL','GOOG','MSFT','AMZN','BTC-USD'],'yahoo',start,end)['Adj Close']
# always raises this error:
# ValueError: Index contains duplicate entries, cannot reshape

I tried building two DataFrames, one for the tech companies and one for BTC-USD, but none of join, concat or merge seems to work. I want to keep only the dates that appear in both datasets: e.g. if both DataFrames have 2010-11-30, that row should be in the result, but if only one DataFrame contains a date, that date should be left out of the joined DataFrame. Many thanks.

ALollz

1 Answer

One workaround is the following:

tech = web.DataReader(['AAPL','GOOG','MSFT','AMZN'],'yahoo', start, end)['Adj Close']
btc = web.DataReader('BTC-USD','yahoo', start, end)['Adj Close']

result_df = pd.merge(tech, btc, left_index=True, right_index=True).rename(columns={'Adj Close': 'BTC'})

However, inspecting the individual DataFrames shows that tech only has trading days, while BTC also has weekends and holidays, so the two cover different sets of dates. The merge above is an inner join by default, so it drops the BTC rows for every date that is missing from tech. It may be better to do an outer join and then forward-fill the missing values:

result_df = pd.merge(tech, btc, left_index=True, right_index=True,
                     how='outer').rename(columns={'Adj Close': 'BTC'})
# Forward-fill gaps (fillna(method='ffill') is deprecated in recent pandas)
result_df = result_df.ffill()
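
To see the difference between the two joins without downloading anything, here is a minimal sketch on made-up prices. The dates and values are invented for illustration and are not real quotes; `tech` only has weekdays, while `btc` also covers the weekend.

```python
import pandas as pd

# Invented data: Friday 2019-11-22 to Tuesday 2019-11-26.
# tech skips the weekend, btc has a row for every calendar day.
tech = pd.DataFrame(
    {"AAPL": [10.0, 11.0, 12.0], "MSFT": [20.0, 21.0, 22.0]},
    index=pd.to_datetime(["2019-11-22", "2019-11-25", "2019-11-26"]),
)
btc = pd.Series(
    [100.0, 101.0, 102.0, 103.0, 104.0],
    index=pd.date_range("2019-11-22", periods=5, freq="D"),
    name="Adj Close",
)

# Inner join (the default): only dates present in both are kept.
inner = pd.merge(tech, btc, left_index=True, right_index=True).rename(
    columns={"Adj Close": "BTC"}
)

# Outer join: every date is kept, tech columns are NaN on the weekend.
outer = pd.merge(
    tech, btc, left_index=True, right_index=True, how="outer"
).rename(columns={"Adj Close": "BTC"})

# Forward-fill carries Friday's tech prices over the weekend.
filled = outer.ffill()

print(inner)
print(outer)
print(filled)
```

The inner join is what the question literally asks for (only shared dates); the outer join plus forward-fill is the alternative if the weekend BTC rows should be kept.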
FBruzzesi
  • Why do you set left_index and right_index to True, and why do we have to rename the column? Why can’t we just merge them and have the missing values filled in automatically, like in the examples for merging two dataframes? – Supawich zaO Nov 27 '19 at 19:55
  • In a merge (join) you need to declare which columns of each dataframe you are merging on. In this case we are using their indexes, so we set the left_index and right_index parameters to True. Regarding the auto fill: as in SQL, no join (inner, outer, left, right) fills in missing data automatically, since it doesn't know which rule to use. So we fill with whatever we think is most reasonable (I suggested just one of many possibilities). Please do not hesitate to ask if this is not clear. – FBruzzesi Nov 27 '19 at 20:34
  • Regarding the column renaming: the pandas Series with the BTC data is named Adj Close, since we are retrieving only that column for a single ticker. Therefore the dataframe produced by the join has no column named BTC until we rename it. – FBruzzesi Nov 27 '19 at 20:38
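
The two points from the comments (merging on the index, and the Series name becoming the column name) can be sketched on a small made-up example. The values are invented; `DataFrame.join` is shown only as an alternative spelling of the same index-based inner join, with the Series renamed beforehand instead of renaming the column afterwards.

```python
import pandas as pd

# Invented data standing in for the downloaded prices.
tech = pd.DataFrame(
    {"AAPL": [10.0, 11.0]},
    index=pd.to_datetime(["2019-11-25", "2019-11-26"]),
)
btc = pd.Series(
    [100.0, 101.0, 102.0],
    index=pd.to_datetime(["2019-11-24", "2019-11-25", "2019-11-26"]),
    name="Adj Close",  # a single-ticker result keeps the column's name
)

# Merging on both indexes: the Series name becomes the new column name.
merged = pd.merge(tech, btc, left_index=True, right_index=True)
print(merged.columns.tolist())

# Renaming after the merge, as in the answer.
merged = merged.rename(columns={"Adj Close": "BTC"})

# Alternative: rename the Series first, then join on the index.
joined = tech.join(btc.rename("BTC"), how="inner")
print(joined)
```

Either way the result has one row per date shared by both inputs; nothing is filled in unless you ask for it explicitly.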