Read in all csv files from a directory using Python

Question

I hope this is not trivial but I am wondering the following:

If I have a specific folder with n csv files, how could I iteratively read all of them, one at a time, and perform some calculations on their values?

For a single file, for example, I do something like this and perform some calculations on the x array:

import numpy

directoryPath=raw_input('Directory path for native csv file: ') 
csvfile = numpy.genfromtxt(directoryPath, delimiter=",")
x=csvfile[:,2] #Creates the array that will undergo a set of calculations

I know that I can check how many csv files there are in a given folder (check here):

import glob
for files in glob.glob("*.csv"):
    print files

But I failed to figure out how to possibly nest the numpy.genfromtxt() function in a for loop, so that I read in all the csv files of a directory that it is up to me to specify.

EDIT

The folder I have only has jpg and csv files. The latter are named eventX.csv, where X ranges from 1 to 50. The for loop I am referring to should therefore consider the file names the way they are.

score 40 · Accepted Answer · answered Nov 03 '15 at 16:20

That's how I'd do it:

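import os

directory = os.path.join("c:\\","path")
# os.walk visits directory and every subfolder below it
for root,dirs,files in os.walk(directory):
    for file in files:
       if file.endswith(".csv"):
           # file is only the bare name; join it with root to get a path that can be opened
           f=open(os.path.join(root, file), 'r')
           #  perform calculation
           f.close()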
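
One possible variation on this loop, sketched with only the standard library: it joins `root` and the file name so that files in subfolders open correctly, uses a `with` block so each file is closed automatically, and collects one column per file. The function name `read_column` and the default column index are illustrative assumptions, not part of the original answer, and the sketch assumes the files have no header row.

```python
import csv
import os


def read_column(directory, column=2):
    """Return {file path: list of floats from `column`} for every .csv under `directory`."""
    results = {}
    # os.walk descends into subfolders; `root` is the folder currently being visited.
    for root, dirs, files in os.walk(directory):
        for name in files:
            if name.endswith(".csv"):
                full_path = os.path.join(root, name)
                # newline="" is what the csv module documentation recommends for file objects.
                with open(full_path, newline="") as f:
                    results[full_path] = [float(row[column]) for row in csv.reader(f) if row]
    return results
```

Each value in the returned dictionary plays the role of the `x` array from the question, so the calculations can run per file.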

Can the `f.close()` line be placed right after I define `x=csvfile[:,2]`? The number `2` is just exemplificative. – FaCoffee Nov 03 '15 at 16:24
And, if I may add, is your code checking for all `csv` files in ALL folders within `directory`? – FaCoffee Nov 03 '15 at 16:26
as a note, the recommended way of opening files is `with open(file) as file` this has the advantage of closing automatically when out of scope – Busturdust Nov 03 '15 at 16:31
@FrancescoCastellani for your first question: you can do this but you won't be able to do any other operation on the file. As for the second, it only lists all files in a directory. If you want all files from all folders within a directory you can store each folder from directory in a list and then get the .csv from each folder at a time. – Nov 03 '15 at 16:33
Could you please explain this line, os.path.join("c:\\","path") – Prasanta Bandyopadhyay Nov 03 '21 at 09:06

score 17 · Answer 2 · answered Jan 19 '20 at 11:21

Using pandas and glob as the base packages

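import glob
import os
import pandas as pd

# directoryPath is the folder to scan; os.path.join adds the separator if it is missing
glued_data = pd.DataFrame()
for file_name in glob.glob(os.path.join(directoryPath, '*.csv')):
    x = pd.read_csv(file_name, low_memory=False)
    glued_data = pd.concat([glued_data,x],axis=0)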

plonser · Answer 3 · 2015-11-03T18:12:53.817

9

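I think you are looking for something like this:

import glob
import os
import numpy as np

# os.path.join adds the path separator if directoryPath does not end with one
for file_name in glob.glob(os.path.join(directoryPath, '*.csv')):
    x = np.genfromtxt(file_name,delimiter=',')[:,2]
    # do your calculations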
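
Since the files in the question are named eventX.csv, it may help to know that `glob.glob` returns names in arbitrary order, and that a plain alphabetical sort puts event10.csv before event2.csv. Below is a minimal sketch of sorting by the number in the name, so that each file can then be paired with an integer index. The helper names `event_number` and `numbered_csv_files` are hypothetical.

```python
import glob
import os
import re


def event_number(path):
    """Extract X from a name like eventX.csv; names without a number sort first."""
    match = re.search(r"(\d+)\.csv$", os.path.basename(path))
    return int(match.group(1)) if match else -1


def numbered_csv_files(directory):
    """Return the .csv paths in `directory`, ordered by their numeric suffix."""
    return sorted(glob.glob(os.path.join(directory, "*.csv")), key=event_number)


# enumerate supplies an integer index, which a file name cannot be:
# for i, file_name in enumerate(numbered_csv_files(directoryPath)):
#     a[i] = ...
```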

Edit

If you want to get all csv files from a folder (including subfolders) you could use subprocess instead of glob (note that this code only works on systems that provide the Unix find command, such as Linux)

import subprocess
import numpy as np

# check_output returns bytes on Python 3, so decode before splitting into lines
file_list = subprocess.check_output(['find',directoryPath,'-name','*.csv']).decode().splitlines()

for i,file_name in enumerate(file_list):
    x = np.genfromtxt(file_name,delimiter=',')[:,2]
    # do your calculations
    # now you can use i as an index

It first searches the folder and sub-folders for all file_names using the find command from the shell and applies your calculations afterwards.
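
For systems without the `find` command (Windows, for instance), a portable way to build the same kind of list is sketched below. It relies on the `recursive=True` option of `glob.glob`, which is available from Python 3.5 onward. The function name `find_csv_files` is an assumption made for illustration.

```python
import glob
import os


def find_csv_files(directory):
    """Return every .csv path under `directory`, including subfolders."""
    # With recursive=True, "**" matches the directory itself and all nested folders.
    pattern = os.path.join(directory, "**", "*.csv")
    return sorted(glob.glob(pattern, recursive=True))
```

The result can stand in for `file_list` in the loop above.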

edited Nov 03 '15 at 18:12

answered Nov 03 '15 at 16:34

I really like this handy, short solution, but when I tested it, it did not yield what I wanted. I created a new empty folder and placed three `csv` files in it named `file_1.csv`, `file_2.csv`, and `file_3.csv`, which contain `1`, `2`, and `3` respectively as their only value (without header). Then I created `a=numpy.zeros(3)` to fill it with those values but I get `a=([0,0,0])`. In the `for` loop, the new values of `a` are assigned like this: `a[file_name]=numpy.genfromtxt(file_name,delimiter=',')[0,0]`. Instead of `a=([1,2,3])` I get `a=([0,0,0])`. – FaCoffee Nov 03 '15 at 16:56
Hmm ... it worked for my simple examples ... let me check what could go wrong ... – plonser Nov 03 '15 at 16:57
@FrancescoCastellani : `file_name` is a string in my code ... what do you mean with `a[file_name]`? `a[...]` requires an integer ... aren't there any errors? – plonser Nov 03 '15 at 17:00
No, no errors. I was attempting to use `file_name` as a counter variable since it carries the exact number of files (and of values) of this test case. I made this up just to test your hint. If we can't use `file_name` as counter, what could we use? Should we add a nested loop to add a counter ranging 1 to 3? – FaCoffee Nov 03 '15 at 17:03
@FrancescoCastellani : what happens when you use `print x` instead of `a[file_name]=...` ? Do you obtain `1 2 3` ? – plonser Nov 03 '15 at 17:03
It does not print anything. I've probably got it: as you say, I was attempting to use `file_name` as a counter variable since it carries the exact number of files (and of values) of this test case. I made this up just to test your hint. If we can't use `file_name` as integer, what could we use? Should we add a nested loop to add a counter ranging 1 to 3? – FaCoffee Nov 03 '15 at 17:04
Then I assume that `file_list` is empty. `np.genfromtxt` raises an error when a file does not exist and if it reads the file there should be some output. Thus, the problem is related to the `find` command. Do you use the full path for the directory path? – plonser Nov 03 '15 at 17:10
Yes. In my case that is `C:\Users\Francesco\Desktop\prova`. – FaCoffee Nov 03 '15 at 17:12
Oh, then the problem is that you use Windows because (as far as I know) the command `find` does not exist (or does not work) as I used it in my program. ... Hmm, let me see whether I can rewrite that part in order to work for you – plonser Nov 03 '15 at 17:16
@FrancescoCastellani : [Here](http://stackoverflow.com/a/1724723/4367286) you can find an algorithm which hopefully helps you build the `file_list`. Unfortunately I can not help you more because I am not working on Windows. – plonser Nov 03 '15 at 17:29

Ward · Answer 4 · 2015-11-03T16:48:43.940

2

According to the documentation of numpy.genfromtxt(), the first argument can be a

File, filename, or generator to read.

That would mean that you could write a generator that yields the lines of all the files like this:

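import glob
import numpy

def csv_merge_generator(pattern):
    for file in glob.glob(pattern):
        # glob returns file names, so each file has to be opened before its lines can be read
        with open(file) as f:
            for line in f:
                yield line

# then using it like this

numpy.genfromtxt(csv_merge_generator('*.csv'), delimiter=',')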

should work. (I do not have numpy installed, so cannot test easily)
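
The same generator idea can be tried without numpy, because `csv.reader` accepts any iterable of strings. The sketch below is only an illustration under that assumption. The names `csv_lines` and `read_all_rows` are hypothetical, and if the files had header rows, each header would reappear in the merged stream.

```python
import csv
import glob


def csv_lines(pattern):
    """Yield the lines of every file matching `pattern`, one file after another."""
    for path in sorted(glob.glob(pattern)):
        with open(path, newline="") as f:
            for line in f:
                yield line


def read_all_rows(pattern):
    """Parse the merged line stream; blank lines are skipped."""
    return [row for row in csv.reader(csv_lines(pattern)) if row]
```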

edited Nov 03 '15 at 16:48

answered Nov 03 '15 at 16:35

Would your last line be nested in a `for` loop? – FaCoffee Nov 03 '15 at 16:47
nonono, it is passed in the generator, and as such gets all the files – Ward Nov 03 '15 at 16:48

score 2 · Answer 5 · answered Jul 21 '21 at 16:48

Here's a more succinct way to do this, given some path = "/path/to/dir/".

import glob
import pandas as pd

pd.concat([pd.read_csv(f) for f in glob.glob(path+'*.csv')])

Then you can apply your calculation to the whole dataset, or, if you want to apply it one by one:

pd.concat([process(pd.read_csv(f)) for f in glob.glob(path+'*.csv')])

score 2 · Answer 6 · answered Jan 16 '23 at 16:35

Another answer using list comprehension:

from os import listdir
files= [f for f in listdir("./") if f.endswith(".csv")]
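
One thing to keep in mind with `listdir` is that it returns bare file names, so for any folder other than the current one the names need to be joined with the folder before they can be opened. A small sketch of that follows. The function name `list_csv_paths` is hypothetical, and matching the extension case-insensitively is a choice made here rather than something the answer does.

```python
import os


def list_csv_paths(directory):
    """Return full paths of the .csv files directly inside `directory` (no subfolders)."""
    paths = []
    for name in os.listdir(directory):
        full_path = os.path.join(directory, name)
        # Skip folders that happen to end in .csv and accept upper-case extensions.
        if name.lower().endswith(".csv") and os.path.isfile(full_path):
            paths.append(full_path)
    return sorted(paths)
```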

Luis Felipe

I like this one with no extra dependency! – Alex Zubkov Aug 18 '23 at 17:11

score 1 · Answer 7 · answered Mar 11 '22 at 07:35

The function below will return a dictionary containing a dataframe for each .csv file in the folder within your defined path.

import pandas as pd
import glob
import os
import ntpath

def panda_read_csv(path):
    pd_csv_dict = {}
    csv_files = glob.glob(os.path.join(path, "*.csv"))
    for csv_file in csv_files:
        # ntpath.basename strips the folder part, whichever path separator is used
        file_name = ntpath.basename(csv_file)
        # sep and encoding match the author's files; adjust them to your own data
        pd_csv_dict['pd_' + file_name] = pd.read_csv(csv_file, sep=";", encoding='mac_roman')
    return pd_csv_dict

score 1 · Answer 8 · answered May 24 '22 at 19:37

You can use pathlib glob functionality to list all .csv in a path, and pandas to read them. Then it's only a matter of applying whatever function you want (which, if systematic, can also be done within the list comprehension)

import pandas as pd
from pathlib import Path

path2csv = Path("/your/path/")
csvlist = path2csv.glob("*.csv")
csvs = [pd.read_csv(g) for g in csvlist ]
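
Note that `Path.glob` yields its matches lazily and in no guaranteed order, so `csvlist` can be iterated only once. If pandas is not available, roughly the same pattern can be sketched with the standard `csv` module. The function name `load_csvs` and the choice to key the result by file stem are assumptions made for illustration.

```python
import csv
from pathlib import Path


def load_csvs(folder):
    """Return {file stem: list of rows} for each .csv file directly inside `folder`."""
    tables = {}
    # Sorting turns the lazy generator into a list with a stable order.
    for csv_path in sorted(Path(folder).glob("*.csv")):
        with csv_path.open(newline="") as f:
            tables[csv_path.stem] = list(csv.reader(f))
    return tables
```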

score -1 · Answer 9 · edited Mar 31 '22 at 12:17

You need to import the glob library and then use it as follows:

import glob
path='C:\\Users\\Admin\\PycharmProjects\\db_conection_screenshot\\seclectors_absent_images'
# the backslash in front of * has to be escaped as well
filenames = glob.glob(path + "\\*.png")
print(len(filenames))

Tayyab Vohra

answered Mar 31 '22 at 10:09

mayur sarvankar

This merely seems to repeat information from several previous answers without adding any explanation or new value. – tripleee Mar 31 '22 at 10:12
As it’s currently written, your answer is unclear. Please [edit] to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Mar 31 '22 at 11:14
