0

I have multiple csv files with similar names in numeric order (nba_1, nba_2, etc). They are all formatted the same as far as column names and dtypes. Instead of manually pulling each one in individually to a dataframe (nba_1 = pd.read_csv('/nba_1.csv')) is there a way to write a for loop or something like it to pull them in and name them? I think the basic framework would be something like:

for i in range(1, 6):
    nba_i = pd.read_csv('../nba_i.csv')

Beyond that, I do not know the particulars. Once I pull them in I will be performing the same actions on each of them (deleting and formating the same columns) so I would also want to iterate through them there.

Thank you in advance for your help.

Trenton McKinney
  • 56,955
  • 33
  • 144
  • 158
Jerel
  • 73
  • 1
  • 2
  • 6
  • 1
    Possible duplicate of [How do I create a variable number of variables?](https://stackoverflow.com/questions/1373164/how-do-i-create-a-variable-number-of-variables) – G. Anderson Sep 11 '19 at 16:48
  • What are you trying to do with the files is the better question. – Trenton McKinney Sep 11 '19 at 16:48
  • Mixing strings and numbers to build a string is a job for "str.format()" – Michael Butscher Sep 11 '19 at 16:48
  • use 'glob'...! that should be it – Amit Gupta Sep 11 '19 at 16:56
  • What I am doing to each one after I am pulling them in is say which columns I want to keep in the dataframe ```nba_1=nba_1[['a', 'b', 'c', 'd']] then doing some work on a date column (when it comes in it is formatted as a string and fixing some other stuff). Once I have done that on each of them, I will be combining them into one dataframe. I think I am also set on doing that last step. I just want to simplify pulling them in and formatting them. There are only six files so it would not be a huge thing to write the code six times but if I can be cleaner and more efficient then why not. – Jerel Sep 11 '19 at 16:57

2 Answers2

1
  • I think your real question is how to get all the files into a dataframe
  • Use pathlib, part of the standard library, to work with your files.
  • Since your csv files are the same, as you stated in the question, it would be more efficient to combine them all into a single dataframe and then clean the data all at once.
    • It's less efficient to clean each dataframe separately, and then combine them

To get a single, combined dataframe

from pathlib import Path
import pandas as pd

p = Path(r'c:\some_path_to_files')  # set your path
files = p.glob('nba*.csv')  # find your files

# It was stated, all the files are the same format, so create one dataframe
df = pd.concat([pd.read_csv(file) for file in files]) 
  • [pd.read_csv(file) for file in files] is a list comprehension, which creates a dataframe of each file.
  • pd.concat combines all the files in the list

To get separate dataframes:

  • create a dict of dataframes
  • each key of the dict will be a filename
p = Path(r'c:\some_path_to_files')  # set your path
files = p.glob('nba*.csv')  # find your files

df_dict = dict()
for file in files:
    df_dict[file.stem] = pd.read_csv(file)

Using df_dict:

df_dict.keys()  # to show you all the keys

df_dict[filename]  # to access a specific dataframe

# after cleaning the individual dataframes in df_dict, they can be combined
df_final = pd.concat([value for value in df_dict.values()])
Trenton McKinney
  • 56,955
  • 33
  • 144
  • 158
  • Eventually I want to get them into the same dataframe but I cannot do that to start with. They are each for different years. When I pull them in the date column is a string and is MMDD with no separator (ex: ```0526, 0617```). What I have done on the one dataframe I have been experimenting on to get replicable code is taking the string date and adding a ```/``` in the middle (```05/26```) and then appending on a year. (to be continued in the next comment) – Jerel Sep 11 '19 at 17:18
  • The data is for individual NBA seasons so not all of the dates have the same year. October to December are one year and January to June are another year. If I combine them to one file to start with I think it will treat all of my 0526's the same when they each represents a different year and each csv will two year numbers in it (the 2012 season cover part of 2012 and 2013) that I need to add in. That is why I need to clean each one of them individually before merging. I am open to hearing a way to merge them if you can get think get over the hurdle I think I have. – Jerel Sep 11 '19 at 17:20
  • That makes sense, just use the second option to create a dict of individual dataframes – Trenton McKinney Sep 11 '19 at 17:21
0

The Dask library built over Pandas has methods to load multiple csv to single dataframe, at once.

Lore
  • 1,286
  • 1
  • 22
  • 57