1

I have some CSV files with exactly the same structure of stock quotes (timeframe is one day):

date,open,high,low,close
2001-10-15 00:00:00 UTC,56.11,59.8,55.0,57.9
2001-10-22 00:00:00 UTC,57.9,63.63,56.88,62.18

I want to merge them all into one DataFrame with only close price columns for each stock. The problem is different files has different history depth (they started from different dates in different years). I want to align them all by date in one DataFrame. I'm trying to run the following code, but I have nonsense in the resulted df:

files = ['FB', 'MSFT', 'GM', 'IBM']
stock_d = {}
for file in files: #reading all files into one dictionary:
    stock_d[file] = pd.read_csv(file + '.csv', parse_dates=['date'])
    
date_column = pd.Series() #the column with all dates from all CSV
for stock in stock_d:
    date_column = date_column.append(stock_d[stock]['date'])
date_column = date_column.drop_duplicates().sort_values(ignore_index=True) #keeping only unique values, then sorting by date
df = pd.DataFrame(date_column, columns=['date']) #creating final DataFrame

for stock in stock_d:
    stock_df = stock_d[stock] #this is one of CSV files, for example FB.csv
    df[stock] = [stock_df.iloc[stock_df.index[stock_df['date'] == date]]['close'] for date in date_column] #for each date in date_column adding close price to resulting DF, or should be None if date not found 
    
print(df.tail()) #something strange here - Series objects in every column

The idea is first to extract all dates from each file, then to distribute close prices among according columns and dates. But obviously I'm doing something wrong. Can you help me please?

chernish2
  • 113
  • 1
  • 9
  • 2
    "I have nonsense in the resulted df" is not very useful. Please show samples of some input dataframes, te current output, and the expected output, to make a [mcve]. See also: [How to make good, reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – G. Anderson Oct 29 '20 at 19:58

1 Answers1

1

If I understand you correctly, what you are looking for is the pivot operation:

files = ['FB', 'MSFT', 'GM', 'IBM']
df = [] # this is a list, not a dictionary
for file in files:
    # You only care about date and closing price
    # so only keep those 2 columns to save memory
    tmp = pd.read_csv(file + '.csv', parse_dates=['date'], usecols=['date', 'close']).assign(symbol=file)
    df.append(tmp)

# A single `concat` is faster then sequential `append`s
df = pd.concat(df).pivot(index='date', columns='symbol')
Code Different
  • 90,614
  • 16
  • 144
  • 163
  • Thank you! But how can I plot the resulting df with matplotlib then? I want to have a legend with labels according each column (FB MSFT etc). In df.columns I have a certain MutliIndex object, and in plt.plot(df, label=df.columns) it draws those multindexes instead of stock names. – chernish2 Oct 30 '20 at 10:22