1

I have about 500 '.csv' files whose names start with the letter 'T', e.g. 'T50, T51, T52 ..... T550', and there are some other '.csv' files with random names in the same folder. I want to read all csv files starting with "T" and store them in separate dataframes: 't50, t51, t52... etc.'

The code I have written so far only finds these files and prints their names; it does not yet read them into dataframes:

import glob
import pandas as pd

for file in glob.glob("T*.csv"):
    print (file)

I want each dataframe to have a different name, preferably the name of the file it was read from. How can I achieve this inside the 'for' loop?

cyrus24
  • 353
  • 3
  • 9
  • 1
    So what is your question? – creyD Jul 11 '19 at 13:38
  • Possible duplicate of [Changing variable names with Python for loops](https://stackoverflow.com/questions/1060090/changing-variable-names-with-python-for-loops) – Zaraki Kenpachi Jul 11 '19 at 13:40
  • you can create a list of files starting with 'T' like this ```filelist = [file for file in os.listdir(folder) if file.startswith('T')]``` Then load them to pandas by looping over the filelist. – ABot Jul 11 '19 at 13:46
  • Automatically generating variable names is shown [here](https://stackoverflow.com/a/4010856/8720308)! Just adapt the ```xrange()``` to ```range()```, depending on your version. – ABot Jul 11 '19 at 13:51

2 Answers

3

Totally agree with @Comos
But if you still need individual variable names, I adapted the solution from here!

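import pandas as pd
import os

folder = '/path/to/my/inputfolder'

# Keep only the CSV files whose names start with 'T'
filelist = [file for file in os.listdir(folder)
            if file.startswith('T') and file.endswith('.csv')]
for file in filelist:
    # Variable name = file name without its extension (it must be a valid Python identifier);
    # the full path is passed to read_csv so the file is found from any working directory
    exec("%s = pd.read_csv('%s')" % (file.split('.')[0], os.path.join(folder, file)))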
ABot
  • 197
  • 12
  • Thanks ABotros, what if I want to assign the name of the file to the dataframe? Some file names contain only letters, while most of them also contain numbers. How can I extract the file name and assign it to the dataframe the file is stored in? – cyrus24 Jul 11 '19 at 14:11
  • So you want the file name as the variable name of the dataframe? (That's what the code above should do.) Or do you want the file name as a column name within the dataframe? Could you specify what you mean? Otherwise, use the solution of @Comos – ABot Jul 11 '19 at 14:22
  • Oh yes, want the file-name as the variable-name of the dataframe. However I encountered "NameError", after looking up a bit what I see is, it could be due to "exec" statement. Any idea? – cyrus24 Jul 11 '19 at 14:42
  • Sorry, there was some bad code in it. I fixed it now and have tried it with my own small example. I forgot to treat the string as an actual string (the ' ' around the %s in the pd.read_csv call). Also I have changed the loop for better readability and adapted the naming system to your requirements. – ABot Jul 11 '19 at 14:47
  • 1
    This one says, file doesn't exist, when I printed 'filelist', it shows names of all files identified which start with T. Does '%' assign values to respective expressions in 'exec' statement? `exec("%s = pd.read_csv('%s')" % (file.split('.')[0], file))` – cyrus24 Jul 11 '19 at 16:31
  • The ```%s``` is a placeholder for a string. Have a look [here](https://stackoverflow.com/questions/4288973/whats-the-difference-between-s-and-d-in-python-string-formatting) and [here](https://www.tutorialspoint.com/python/python_strings) for more explanations on how to use them. In short, the first ```%s``` is filled in with the string contained in ```file.split('.')[0]```, the second ```%s``` is filled in with the string contained in ```file```. Have you set the folder name? In my working example I just have the files in the same folder as my program, therefore I added ```folder='.'```. – ABot Jul 12 '19 at 07:22
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/196359/discussion-between-abotros-and-shweta24). – ABot Jul 12 '19 at 07:24
  • Thank you so much ABotros, this has now worked, the path to the folder while we execute the string was the part where it was failing. It works perfect now! – cyrus24 Jul 12 '19 at 17:38
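The placeholder substitution discussed in the comments above can be sketched without running `exec` at all: build the string and print it to see exactly which statement would be executed. The `folder` and `file` values here are hypothetical stand-ins, not names from the asker's data.

```python
import os

# Hypothetical values, used only to illustrate the substitution
folder = "."
file = "T50.csv"

# The first %s receives the variable name, the second %s receives the full path
statement = "%s = pd.read_csv('%s')" % (file.split('.')[0], os.path.join(folder, file))

# Printing the string shows what exec() would run, without running it
print(statement)
```

Printing the statement before executing it is a quick way to spot a wrong path, which is what caused the "file doesn't exist" error in this thread.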
2

In addition to ABotros's answer: to read all the files into different dataframes, I would recommend adding them to a dictionary, which lets you store each dataframe under its own name inside the loop:

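import os
import pandas as pd

folder = '/path/to/my/inputfolder'
filelist = [file for file in os.listdir(folder) if file.startswith('T') and file.endswith('.csv')]

database = {}
for file in filelist:
    # Join with the folder so the file is found regardless of the working directory
    database[file] = pd.read_csv(os.path.join(folder, file))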
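As a minimal sketch of the same dictionary idea, here is a variant that reuses the `glob` pattern from the question and keys each dataframe by the lowercase file name without its extension (so 'T50.csv' becomes 't50', as the question asks). The function name `load_frames` and its parameters are hypothetical, chosen only for this illustration.

```python
import glob
import os

import pandas as pd


def load_frames(folder, pattern="T*.csv"):
    """Read every CSV in `folder` matching `pattern` into a dict of DataFrames.

    Hypothetical helper: the name and signature are assumptions for this sketch.
    """
    frames = {}
    for path in sorted(glob.glob(os.path.join(folder, pattern))):
        # Key: file name without folder or extension, lowercased ("T50.csv" -> "t50")
        key = os.path.splitext(os.path.basename(path))[0].lower()
        frames[key] = pd.read_csv(path)
    return frames
```

Calling `frames = load_frames('/path/to/my/inputfolder')` would then give access to each table as `frames['t50']`, `frames['t51']`, and so on. Unlike generated variable names, this also works for file names that are not valid Python identifiers.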
Comos
  • 82
  • 10