How to read each file from a folder and create separate data frames for each file?

Question

I am trying to get my code to read a folder containing various files. I was hoping to get Jupyter to read each file within that folder and create separate dataframes by taking the names of the files as the dataframe names.

So far I have the code:

import glob
import pandas as pd

path = r'C:\Users\SemR\Documents\Jupyter\Submissions' 
all_files = glob.glob(path + "/*.csv")

li = []

for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0, usecols=['Date', 'Usage'])
    li.append(df)

frame = pd.concat(li, axis=0, ignore_index=True)

This code concatenates the data; however, I want a separate dataframe for each file so that I can store the values separately. Is there something I can use instead?

Here are examples of how the CSV files look:

These CSV files are in the same folder so I was hoping that when I run my code, new dataframes would be created with the same name as the CSV file name.

Thank you.

Not sure what you're trying to achieve, but this would likely be a better approach: https://stackoverflow.com/questions/50066635/how-to-concatenate-all-csvs-in-a-directory-adding-csv-name-as-a-column-with-pyt — IanS, Jul 09 '19 at 09:49
In each CSV file from the folder, the data contain dates and values per date (i.e.). I am trying to create a function which only takes the values column from each file and then loop it so I can work out the average for each df separately. Make sense? — R Sem, Jul 09 '19 at 09:49
Better to have just one large dataframe, store the file name as a column (see my previous link) and then calculate a per-file average using `groupby`. — IanS, Jul 09 '19 at 09:51
Rather than using different variable names for each dataframe, I suggest you use a single dictionary, the keys would be the dataframe names. — Martin Evans, Jul 09 '19 at 10:09
@MartinEvans Would " d = {os.path.basename(f).split('.')[0]:pd.read_csv(f) for f in glob.glob('*.csv') if "test" in f} " be what you are talking about? — R Sem, Jul 09 '19 at 10:11
Looks good to me, you could then access the dataframes using `d['file1']` or whatever your filenames are. You could also use `os.path.splitext()` — Martin Evans, Jul 09 '19 at 10:49
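
For comparison, the single-dataframe approach suggested in the comments above could look roughly like this. It is only a sketch: the function name `load_combined` and the `source` column name are made up for illustration, and it assumes every CSV has the `Date` and `Usage` columns from the question.

```python
import glob
import os

import pandas as pd


def load_combined(folder, pattern="*.csv"):
    """Read every matching CSV into one DataFrame, tagging rows with their file name."""
    parts = []
    for path in sorted(glob.glob(os.path.join(folder, pattern))):
        df = pd.read_csv(path, usecols=["Date", "Usage"])
        # Hypothetical column name: stores the file's base name without extension
        df["source"] = os.path.splitext(os.path.basename(path))[0]
        parts.append(df)
    # Note: pd.concat raises ValueError if no files matched
    return pd.concat(parts, ignore_index=True)
```

The per-file average is then `load_combined(path).groupby("source")["Usage"].mean()`, which returns one value per file name.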

score 4 · Accepted Answer · answered Jul 09 '19 at 10:55

A better approach than using a different variable for each of your dataframes would be to load each dataframe into a dictionary.

The basename of each filename could be extracted using a combination of os.path.basename() and os.path.splitext().

For example:

d = {os.path.splitext(os.path.basename(f))[0] : pd.read_csv(f) for f in glob.glob('*test*.csv')}

Also, using *test* would avoid the need for the if in the comprehension.
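
A slightly fuller sketch of the same idea, with the imports spelled out, might look like this. The function name `load_frames` and its `folder` and `pattern` parameters are illustrative only and not part of the one-liner above.

```python
import glob
import os

import pandas as pd


def load_frames(folder, pattern="*.csv"):
    """Return a dict mapping each file's base name (no extension) to its DataFrame."""
    frames = {}
    for path in sorted(glob.glob(os.path.join(folder, pattern))):
        # e.g. ".../file1.csv" -> "file1"
        name = os.path.splitext(os.path.basename(path))[0]
        frames[name] = pd.read_csv(path)
    return frames
```

Assuming one of the files is called `file1.csv`, you could then call `frames = load_frames(r'C:\Users\SemR\Documents\Jupyter\Submissions')` and compute its average with `frames['file1']['Usage'].mean()`.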

answered Jul 09 '19 at 10:55

Martin Evans


This is perfect! Thank you very much for your help :) – R Sem Jul 09 '19 at 12:36
Done :) Also, how can I remove NA's from a dictionary within the line of code you provided? – R Sem Jul 09 '19 at 12:41
If the NA values are inside the dataframe, you would have to convert this to a `for` loop and test for them before adding. You could also test whether the CSV fails to load. If you tried to do this inside the dictionary comprehension, you might have to call `read_csv()` twice per file, which is not ideal. – Martin Evans Jul 09 '19 at 12:45
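
One way the `for` loop version could look is sketched below. It reads "remove NA's" as dropping rows with missing values and skipping files that end up empty, which is an assumption about what is wanted; the name `load_clean_frames` is also just for illustration.

```python
import glob
import os

import pandas as pd


def load_clean_frames(folder, pattern="*.csv"):
    """Load each CSV once, drop rows containing NaN, and skip unusable files."""
    frames = {}
    for path in sorted(glob.glob(os.path.join(folder, pattern))):
        try:
            df = pd.read_csv(path)
        except (pd.errors.EmptyDataError, pd.errors.ParserError) as exc:
            # The file could not be parsed as CSV, so leave it out
            print(f"Skipping {path}: {exc}")
            continue
        df = df.dropna()  # remove any row that has a missing value
        if df.empty:
            continue  # nothing left after cleaning
        name = os.path.splitext(os.path.basename(path))[0]
        frames[name] = df
    return frames
```

Because each file is read exactly once inside the loop, this avoids the double `read_csv()` call that a comprehension would need.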

score 1 · Answer 2 · answered Jul 09 '19 at 12:00

From your question, it looks like you already have the separate DataFrames stored in the list `li`, so you can loop over that list and work on each one individually.

import glob
import pandas as pd

path = r'C:\Users\SemR\Documents\Jupyter\Submissions'
all_files = glob.glob(path + "/*.csv")

li = []

for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0, usecols=['Date', 'Usage'])
    li.append(df)

# Print the mean of the "Usage" column for each DataFrame separately
for df in li:
    print(df.loc[:, "Usage"].mean())

You can use `df.dropna()` to remove NaN – Bibyutatsu Jul 09 '19 at 12:44
