
I have a long list of CSV files that I want to read in as dataframes, naming each one after its file name. For example, I want to read in the file status.csv and assign its dataframe the name status. Is there a way I can do this efficiently using Pandas?

Looking at this, I still have to write the name of each CSV in my loop, which I want to avoid.

Looking at this, that approach reads multiple CSVs into a single dataframe rather than one dataframe per file.

Gaurav Bansal
  • You can get all the CSV files under the current directory using `os.listdir(".")`, combined with `os.path.basename` to parse the file name. – knh190 Mar 19 '19 at 16:29
  • Are you open to using `dask`? You could read in all the separate dataframes and have them contained in one data structure, i.e., a dask dataframe, partitioned by their original file name. Docs are [here](http://docs.dask.org/en/latest/dataframe.html) – jeschwar Mar 19 '19 at 16:36
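
As a rough sketch of the dask suggestion in the comment above (not part of the original post; it assumes dask is installed, the CSVs share a common schema, and a dask version that supports include_path_column):

import dask.dataframe as dd

# read every csv in the current directory into one dask dataframe;
# include_path_column adds a column recording each row's source file
ddf = dd.read_csv('*.csv', include_path_column=True)

# materialize as a regular pandas dataframe when needed
df = ddf.compute()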

2 Answers


You can list all the CSV files under a directory with `os.listdir(dirname)`, then use `os.path.basename` and `os.path.splitext` to extract the bare file name.

import os

import pandas as pd

# csv files in the current directory
csvs = [x for x in os.listdir('.') if x.endswith('.csv')]
# strip the extension: stats.csv -> stats
fns = [os.path.splitext(os.path.basename(x))[0] for x in csvs]

# map each bare file name to its dataframe
d = {}
for name, path in zip(fns, csvs):
    d[name] = pd.read_csv(path)
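
For instance, assuming one of the files is status.csv (the example from the question), its dataframe is then available under the key 'status':

# hypothetical usage: status.csv must actually be present in the directory
status = d['status']
print(status.head())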
knh190

You could create a dictionary of DataFrames:

import glob

import pandas as pd

list_of_csvs = glob.glob('*.csv')  # csv files in the current directory

d = {}  # dictionary that will hold the dataframes

for file_name in list_of_csvs:  # loop over files
    # read the csv into a dataframe and add it to the dict with file_name as its key
    d[file_name] = pd.read_csv(file_name)
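
Note that with this loop the keys keep the .csv extension (e.g. d['status.csv']). If you prefer the bare name as in the question, a small variation works; this is a sketch that assumes list_of_csvs is built with glob as above:

import glob
import os

import pandas as pd

list_of_csvs = glob.glob('*.csv')  # csv files in the current directory (assumed)

# key each dataframe by its file name without the extension: status.csv -> status
d = {os.path.splitext(f)[0]: pd.read_csv(f) for f in list_of_csvs}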


lorenzori