How can I use a string to set a variable name for my DataFrame?

Question

I'm trying to read a bunch of csvs into pandas using a for loop. I want the table names to be the last bit of the full file path before the extension. For example,

ACS_BV0002_2016_Age.csv

would be

Age

I am doing this so I can create dictionaries with the table name as a key and the column names and data types as values, which I can then use in psycogpg2 to create all of my tables in postgresql in one fell swoop.

This seems to get the name i want:

path = r"C:\Data\Waste_Intervention\Census_Tables\Cleaned"
fList = os.listdir(path)

for doc in fList:
    csv = "{}\\{}".format(path, doc)
    name = doc.split("_")[-1][:-4]

    pd.read_csv(csv)

Is there a way I can use the output of name become the variable name for the dataframe read in by pd.read_csv?

Can you provide sample output? See https://docs.python.org/3.6/library/os.path.html for some path name manipulation examples. Are you trying to put multiple CSVs into one dataframe, or one dataframe per file? — Evan, Jan 12 '18 at 01:05
`os.path.splitext(filename)[0].split('_')[-1]` will get you the last string of text after the final '_' but before the file extension. — Evan, Jan 12 '18 at 01:07
It is totally unclear what you're trying to do here. Are you trying to move dataframes? — cs95, Jan 12 '18 at 01:09
@cᴏʟᴅsᴘᴇᴇᴅ How is it not clear that I am trying to bring in csvs into pandas? — geoJshaun, Jan 12 '18 at 01:12
The edit makes it clearer, but your initial post did not properly explain why you were trying to assign the output of read_csv to that result if you wanted to _assign the result_ to the dataframe (completely different). — cs95, Jan 12 '18 at 01:16

cs95 · Accepted Answer · 2018-01-12T01:52:24.757

From your code, it is not clear why you assign the result of read_csv to the item that you are trying to assign to the dataframe. Anyway, you asked:

Is there a way I can pass the bit of string I want in to the table name for pd.csv_read so that I can get all csvs in the path into pandas with a for loop and have them retain a simple understandable name?

In this situation, there are a limited number of things you can do. DataFrame objects aren't really associated with a "name", per say, you use a descriptive variable name to handle that.

However, for your case, where you wish to create a variable number of variables, the easiest thing to do (that I'd do), is to use a dictionary.

dfs = {}

for doc in fList:
    i = "{}\\{}".format(path, x[0])
    j = doc.split("_")[-1][:-4]

    dfs[j] = pd.read_csv(i)

You can now refer to the dataframe loaded from ACS_16_5YR_B02001_race.csv using dfs['race']!

This worked really well except I had to switch i and j for read_csv. Thanks! — geoJshaun, Jan 12 '18 at 01:52

How can I use a string to set a variable name for my DataFrame?

1 Answers1