0

I'm trying to read a bunch of csvs into pandas using a for loop. I want the table names to be the last bit of the full file path before the extension. For example,

ACS_BV0002_2016_Age.csv 

would be

Age 

I am doing this so I can create dictionaries with the table name as a key and the column names and data types as values, which I can then use in psycogpg2 to create all of my tables in postgresql in one fell swoop.

This seems to get the name i want:

path = r"C:\Data\Waste_Intervention\Census_Tables\Cleaned"
fList = os.listdir(path)

for doc in fList:
    csv = "{}\\{}".format(path, doc)
    name = doc.split("_")[-1][:-4]

    pd.read_csv(csv)

Is there a way I can use the output of name become the variable name for the dataframe read in by pd.read_csv?

Karl Knechtel
  • 62,466
  • 11
  • 102
  • 153
geoJshaun
  • 637
  • 2
  • 11
  • 32
  • Can you provide sample output? See https://docs.python.org/3.6/library/os.path.html for some path name manipulation examples. Are you trying to put multiple CSVs into one dataframe, or one dataframe per file? – Evan Jan 12 '18 at 01:05
  • `os.path.splitext(filename)[0].split('_')[-1]` will get you the last string of text after the final '_' but before the file extension. – Evan Jan 12 '18 at 01:07
  • It is totally unclear what you're trying to do here. Are you trying to move dataframes? – cs95 Jan 12 '18 at 01:09
  • @cᴏʟᴅsᴘᴇᴇᴅ How is it not clear that I am trying to bring in csvs into pandas? – geoJshaun Jan 12 '18 at 01:12
  • @cᴏʟᴅsᴘᴇᴇᴅ providing more elaboration – geoJshaun Jan 12 '18 at 01:16
  • 2
    The edit makes it clearer, but your initial post did not properly explain why you were trying to assign the output of read_csv to that result if you wanted to _assign the result_ to the dataframe (completely different). – cs95 Jan 12 '18 at 01:16
  • @cᴏʟᴅsᴘᴇᴇᴅ got it. – geoJshaun Jan 12 '18 at 01:18

1 Answers1

1

From your code, it is not clear why you assign the result of read_csv to the item that you are trying to assign to the dataframe. Anyway, you asked:

Is there a way I can pass the bit of string I want in to the table name for pd.csv_read so that I can get all csvs in the path into pandas with a for loop and have them retain a simple understandable name?

In this situation, there are a limited number of things you can do. DataFrame objects aren't really associated with a "name", per say, you use a descriptive variable name to handle that.

However, for your case, where you wish to create a variable number of variables, the easiest thing to do (that I'd do), is to use a dictionary.

dfs = {}

for doc in fList:
    i = "{}\\{}".format(path, x[0])
    j = doc.split("_")[-1][:-4]

    dfs[j] = pd.read_csv(i)

You can now refer to the dataframe loaded from ACS_16_5YR_B02001_race.csv using dfs['race']!

cs95
  • 379,657
  • 97
  • 704
  • 746