3

In one of my directories, I have multiple CSV files. I want to read the contents of all the CSV files with Python code and print the data, but so far I have not been able to do so.

All the CSV files have the same number of columns and the same column names as well.

I know how to list all the CSV files in the directory and iterate over them using the "os" module and a "for" loop.

for files in os.listdir("C:\\Users\\AmiteshSahay\\Desktop\\test_csv"):

Next, I use the "csv" module to read the file names:

reader = csv.reader(files)

Up to here, I expect the output to be the names of the CSV files, which happen to be sorted; for example, the names are 1.csv, 2.csv, and so on. But the output is as below:

<_csv.reader object at 0x0000019F97E0E730>
<_csv.reader object at 0x0000019F97E0E528>
<_csv.reader object at 0x0000019F97E0E730>
<_csv.reader object at 0x0000019F97E0E528>
<_csv.reader object at 0x0000019F97E0E730>
<_csv.reader object at 0x0000019F97E0E528>

If I add a next() call after csv.reader(), I get the output below:

['1']
['2']
['3']
['4']
['5']
['6']

These happen to be the first characters of my CSV file names, which is partially correct but not the full names.

Apart from this, once I have iterated over the files, how do I see the contents of the CSV files on the screen? Today I have 6 files. Later on, I could have 100 files. So it's not possible to use the file handling method in my scenario.

Any suggestions?

Nazim Kerimbekov
skill_seeker

5 Answers

4

The easiest way I found while developing my project is to use a pandas DataFrame together with read_csv and glob.

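import glob
import os
import pandas as pd

folder_name = 'train_dataset'
file_type = 'csv'
separator = ','
# Build the search pattern, e.g. train_dataset/*.csv
pattern = os.path.join(folder_name, '*.' + file_type)
# Read every matching file and stack the rows into one DataFrame
dataframe = pd.concat([pd.read_csv(f, sep=separator) for f in glob.glob(pattern)], ignore_index=True)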

Here, all the CSV files are loaded into one big dataframe.

Project Folder structure

2

I would recommend reading your CSVs using the pandas library. Check this answer here: Import multiple csv files into pandas and concatenate into one DataFrame

Although you asked about Python in general, pandas does a great job at data I/O and would help you here, in my opinion.

louis_guitton
  • The example from your link has "list_ = []". What does "list_" do? Please share a web link for further study on this part. The example in your link works as desired. – skill_seeker Jul 13 '18 at 09:16
  • @skill_seeker `list_` is a temporary variable holding the list of each read CSV in its own dataframe. If you then want to concatenate them you do `pd.concat(list_)`, but if you're just interested in the individual dataframes, you can look at them individually doing `list_[0]` for example – louis_guitton Jul 13 '18 at 13:41
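The `list_` pattern described in the comment above can be sketched roughly as follows. This is only an illustration, not the code from the linked answer: the temporary folder and the two tiny CSV files are made up so that the snippet runs on its own.

```python
import glob
import os
import tempfile

import pandas as pd

# Throwaway folder with two small CSV files (hypothetical data).
folder = tempfile.mkdtemp()
pd.DataFrame({"id": [1, 2], "value": ["a", "b"]}).to_csv(
    os.path.join(folder, "1.csv"), index=False
)
pd.DataFrame({"id": [3], "value": ["c"]}).to_csv(
    os.path.join(folder, "2.csv"), index=False
)

list_ = []  # will hold one DataFrame per file
for path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
    list_.append(pd.read_csv(path))

# Look at a single file's DataFrame on its own ...
print(list_[0])

# ... or stack all of them into one DataFrame.
combined = pd.concat(list_, ignore_index=True)
print(combined)
```

Keeping the list around means you can inspect any single file before or after concatenating.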
2

If you want to import your files as separate dataframes, you can try this:

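import pandas as pd
import os

# lists only the CSV files in your directory
filenames = [f for f in os.listdir("../data/") if f.lower().endswith(".csv")]

def extract_name_files(text): # removes the .csv extension from the name of each file
    # os.path.splitext drops only the extension; str.strip('.csv') would also
    # remove any leading or trailing 'c', 's', 'v' and '.' characters
    name_file = os.path.splitext(text)[0].lower()
    return name_file

names_of_files = list(map(extract_name_files, filenames)) # creates a list that will be used to name your dataframes

for i in range(0, len(names_of_files)): # saves each csv in a dataframe structure
    # each name must be a valid Python identifier for this assignment to work
    exec(names_of_files[i] + " = pd.read_csv('../data/' + filenames[i])")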
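Creating variables with exec only works when every file name is a valid Python identifier, which a name like 1.csv is not. As a possible alternative (a sketch, not the method above), the dataframes can be kept in a dictionary keyed by file name. The temporary folder standing in for "../data/", the two demo files, and the name `frames` are assumptions made for illustration.

```python
import os
import tempfile

import pandas as pd

# Throwaway folder standing in for "../data/" (hypothetical files).
data_dir = tempfile.mkdtemp()
pd.DataFrame({"x": [1]}).to_csv(os.path.join(data_dir, "1.csv"), index=False)
pd.DataFrame({"x": [2]}).to_csv(os.path.join(data_dir, "Calls.csv"), index=False)

frames = {}  # maps lower-cased file name (without extension) to its DataFrame
for filename in sorted(os.listdir(data_dir)):
    stem, ext = os.path.splitext(filename)
    if ext.lower() == ".csv":  # skip anything that is not a CSV file
        frames[stem.lower()] = pd.read_csv(os.path.join(data_dir, filename))

print(sorted(frames))
print(frames["calls"])
```

Each dataframe is then available as `frames["calls"]`, `frames["1"]`, and so on, whatever the file is called.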

Trex
1

Up to here, I expect the output to be the names of the CSV files

This is the problem. csv.reader objects do not represent filenames. They are lazy iterators that yield rows from whatever they are given, which should be an open file object rather than a filename string. If you wish to print the entire CSV file, open it and call list on the csv.reader object:

import csv
import os

folder = "C:\\Users\\AmiteshSahay\\Desktop\\test_csv"
for files in os.listdir(folder):
    with open(os.path.join(folder, files), newline="") as fin:  # csv.reader needs an open file, not a filename
        print(list(csv.reader(fin)))

If I add a next() call after csv.reader(), I get the output below

Yes, this is what you should expect. Calling next on an iterator gives you the next value that comes out of that iterator. For a reader built on an open file, this is the first row of the file. For example:

from io import StringIO
import csv

some_file = StringIO("""1
2
3""")

with some_file as fin:
    reader = csv.reader(fin)
    print(next(reader))

['1']

which happen to be sorted; for example, the names are 1.csv, 2.csv, and so on.

This is not a coincidence. Your code passes the filename string itself to csv.reader. A reader accepts any iterable of strings, and iterating over a str yields one character at a time, so csv.reader("1.csv") treats each character as a separate line and next(reader) returns ['1'], the first character of the filename. To read the contents, pass an open file object instead.
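To see what csv.reader does with a bare filename compared with an open file, here is a small self-contained sketch. The file 1.csv is created in a temporary folder and its contents are made up for the demo.

```python
import csv
import os
import tempfile

# Create a throwaway file named 1.csv with hypothetical contents.
folder = tempfile.mkdtemp()
path = os.path.join(folder, "1.csv")
with open(path, "w", newline="") as fout:
    csv.writer(fout).writerows([["name", "age"], ["Ann", "30"]])

# Passing the filename string: the reader iterates over its characters.
from_name = list(csv.reader("1.csv"))
print(from_name)

# Passing an open file object: the reader yields the rows of the file.
with open(path, newline="") as fin:
    from_file = list(csv.reader(fin))
print(from_file)
```

The first print shows one single-element list per character of the string "1.csv"; the second shows the two rows written to the file.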

Apart from this, once I have iterated over the files, how do I see the contents of the CSV files on the screen?

Use the print function, as in the examples above.

Today I have 6 files. Later on, I could have 100 files. So it's not possible to use the file handling method in my scenario.

This is not true. You can define a function that prints all or part of a CSV file, then call that function in a for loop with each filename as the input.
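A minimal sketch of that idea, using only the standard library. The helper name `print_csv`, its `max_rows` parameter, and the temporary demo folder are made up for illustration; in practice you would point `folder` at your own directory.

```python
import csv
import glob
import os
import tempfile


def print_csv(path, max_rows=None):
    """Print the rows of one CSV file and return the rows that were printed."""
    rows = []
    # The csv documentation recommends newline="" when opening files for csv.
    with open(path, newline="") as fin:
        for i, row in enumerate(csv.reader(fin)):
            if max_rows is not None and i >= max_rows:
                break
            print(row)
            rows.append(row)
    return rows


# Demo data: a throwaway folder with two small files (hypothetical contents).
folder = tempfile.mkdtemp()
for name in ("1.csv", "2.csv"):
    with open(os.path.join(folder, name), "w", newline="") as fout:
        csv.writer(fout).writerows([["id", "value"], [name[0], "x"]])

# The same loop works whether the folder holds 6 files or 100.
all_rows = {}
for path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
    print(os.path.basename(path))
    all_rows[os.path.basename(path)] = print_csv(path)
```

Passing `max_rows=5`, for example, would print only the first five rows of each file, which can be handy once the number of files grows.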

jpp
1

You can read and store several dataframes into separate variables using two lines of code.

import pandas as pd

datasets_list = ['users', 'calls', 'messages', 'internet', 'plans']

users, calls, messages, internet, plans = [(pd.read_csv(f'datasets/{dataset_name}.csv')) for dataset_name in datasets_list]
WanomiR