
I need to create a dataframe for each of the following datasets (csv files stored in a folder):

0 text1.csv
1 text2.csv
2 text3.csv
3 text4.csv
4 text5.csv

The above list shows all the csv files in the folder, after changing the working directory to it with:

os.chdir("path")

To create a dataframe (to be used later on) for each of the datasets above, I am doing the following:

texts=[]

for item in glob.glob("*.csv"):
    texts.append(item)

for (x,z) in enumerate(texts):
    print(x,z)
    df = pd.read_csv(datasets[int(x)])
    df.index.name = datasets[int(x)]

However, it does not create any dataframe. I think the problem is in df: I am not distinguishing it per dataset, but only reading each dataset with pd.read_csv(datasets[int(x)]).

Could you please tell me how to create a dataframe for each of the datasets (for example df1 related to text1, df2 related to text2, and so on)?

Thank you for your help.

  • If you are happy with a `dict` of dataframes, which you will have to use keys to access each one later, then @holdenweb's answer to this SO question is a good one: https://stackoverflow.com/questions/30635145/create-multiple-dataframes-in-loop – pink spikyhairman May 06 '20 at 18:55

2 Answers


I'd use a function and return a list of the dataframes

Simple, one-liner function:

import glob
import pandas as pd


def get_all_csv(path, sep=','):
    # read all the csv files in a directory to a list of dataframes
    return [pd.read_csv(csv_file, sep=sep)
            for csv_file in glob.glob(path + "*.csv")]

# get all the csv in the current directory
dfs = get_all_csv('./', sep=';')
print(dfs)
bherbruck
  • your file must have a strange format; I updated it to use the `;` tokenization. The original script should work with a normal csv, though – bherbruck May 06 '20 at 18:46
  • unfortunately it is still not working. It is still returning the following error: `ParserError: Expected 1 fields in line 51, saw 7`. So I think both cases should be included (original + the updated answer) –  May 06 '20 at 18:49
  • I updated it to let you choose the sep as a function argument. I would check your csv though and make sure all the rows have the same amount of columns – bherbruck May 06 '20 at 18:51
  • It seems like your csv has 1 column but one row has 7 commas (probably because of commas `,` in the row, wrap the row in `"` to get around this) – bherbruck May 06 '20 at 18:54
  • 1
    Yes, I think so. Unfortunately it is still not working because of that error. I think the other way to fix it is to add a try/except condition in order to get around it. But I have not been able to add it in your code. Thank you so much for your time and help –  May 06 '20 at 19:15
  • 1
    my pleasure! You would have to use a traditional `for` loop instead of a comprehension to use try/except otherwise it wouldn't return anything – bherbruck May 06 '20 at 19:21

Is a list of dataframes what you are looking for?

import pandas as pd
import glob

results=[]

paths = glob.glob("*.csv")

for path in paths:
    df = pd.read_csv(path)
    results.append(df)
Fabian Hertwig