
I want to load multiple CSV files into one DataFrame. Each CSV contains stock data with six columns ('Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume'). I managed to load the CSV files, but the result is missing the ticker name (taken from each CSV filename) as a column level.

import os
import pandas as pd

# List the per-ticker CSV files in the spy500 directory
sp500 = os.listdir(os.path.join(os.getcwd(), 'spy500'))

combined = pd.concat([pd.read_csv('spy500/' + i, parse_dates=True, index_col='Date') for i in sp500], axis=1)

output:

Open | High | Low | Close | Adj Close | Volume | Open | High | Low | Close | Adj Close | Volume

desire output:

AAPL                                           | GOOG
Open | High | Low | Close | Adj Close | Volume | Open | High | Low | Close | Adj Close | Volume

The output data is correct (5986 rows × 3030 columns); the only thing I need is to add a multi-level column index with the ticker as the top level.

Fede

1 Answer


Use a dictionary comprehension, then pass the dict to pd.concat — the keys become the top column level:

# Map each filename stem (the ticker) to its DataFrame
comp = {i.split('.')[0]:
        pd.read_csv('spy500/' + i, parse_dates=True, index_col='Date') for i in sp500}
combined = pd.concat(comp, axis=1)
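A minimal self-contained sketch of why this works, using two tiny synthetic CSVs in a temporary directory (the tickers and values are made up for illustration): when `pd.concat` receives a dict, each key is prepended as the outer level of a column MultiIndex.

```python
import os
import tempfile

import pandas as pd

# Synthesize two small per-ticker CSVs shaped like the question's files
tmp = tempfile.mkdtemp()
for ticker in ('AAPL', 'GOOG'):
    pd.DataFrame(
        {'Open': [1.0, 2.0], 'High': [1.5, 2.5], 'Low': [0.5, 1.5],
         'Close': [1.2, 2.2], 'Adj Close': [1.2, 2.2], 'Volume': [100, 200]},
        index=pd.Index(pd.to_datetime(['2022-01-03', '2022-01-04']), name='Date'),
    ).to_csv(os.path.join(tmp, ticker + '.csv'))

# Dictionary comprehension: filename stem -> per-ticker frame
comp = {i.split('.')[0]:
        pd.read_csv(os.path.join(tmp, i), parse_dates=True, index_col='Date')
        for i in sorted(os.listdir(tmp))}

# concat with a dict uses the keys as the outer column level
combined = pd.concat(comp, axis=1)
print(combined.columns.nlevels)  # 2 levels: ticker, then OHLCV column
```

Selecting one ticker is then just `combined['AAPL']`, which returns that ticker's six original columns.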
jezrael
  • Can you suggest a solution like this that does the same job but with parallel computation? I know it is possible, for example, to use read_csv("/*.csv"), which reads those files into a single dataframe using multiple cores (speaking about Dask specifically). – Ben Dec 30 '22 at 11:27
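One stdlib-only way to parallelize the per-file reads while keeping the dict-of-frames pattern (and thus the MultiIndex columns) is a thread pool — a sketch, not a Dask answer; the directory and tickers below are synthesized for the demo. Threads help here because the reads overlap file I/O and pandas' C parser releases the GIL for parts of the parse:

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

import pandas as pd

# Hypothetical directory of per-ticker CSVs; two tiny files stand in for it
tmp = tempfile.mkdtemp()
for ticker in ('AAPL', 'GOOG'):
    pd.DataFrame(
        {'Open': [1.0], 'Close': [1.2]},
        index=pd.Index(pd.to_datetime(['2022-01-03']), name='Date'),
    ).to_csv(os.path.join(tmp, ticker + '.csv'))

def read_one(fname):
    """Return (ticker, frame) pairs so the results can feed pd.concat as a dict."""
    frame = pd.read_csv(os.path.join(tmp, fname), parse_dates=True, index_col='Date')
    return fname.split('.')[0], frame

# Read the files concurrently; map preserves input order
with ThreadPoolExecutor(max_workers=4) as pool:
    comp = dict(pool.map(read_one, sorted(os.listdir(tmp))))

combined = pd.concat(comp, axis=1)
```

For truly out-of-core or multi-machine workloads Dask's own read path would be the better fit, but this keeps the exact same concat step as the accepted answer.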