In Python, I've created a bunch of dataframes like so:

df1 = pd.read_csv("1.csv")
...
df50 = pd.read_csv("50.csv") # import modes may vary based on the csv, no real way to shorten this

For every dataframe, I'd like to perform an operation which requires assigning a string as a name. For instance, given an existing database db,

df1.to_sql("df1", db) # and so on. 

The dataframes may have a non-sequential name, so I can't do for i in range(1,51): "df"+str(i).

I'm looking for the right way to do this, instead of repeating the line 50 times. My idea was something like

for df in [df1, df2... df50]: 
    df.to_sql(df.__name__, db) # but dataframes don't have a __name__
  1. How do I get the string "df1" from the dataframe I've called df1?
  2. Is there an even nicer way to do all this?
Zubo
  • You could put your dataframes in a container (list / dict) in the first place and loop over it afterwards. – Jan May 01 '21 at 22:20
  • Does this answer your question? [Get the name of a pandas DataFrame](https://stackoverflow.com/questions/31727333/get-the-name-of-a-pandas-dataframe) – Tom May 01 '21 at 22:27
  • @Tom Sort of. It does suggest adding a little bit of code when importing them, but I was rather looking for something like this thread that I've just found https://stackoverflow.com/questions/18425225/getting-the-name-of-a-variable-as-a-string/18425523 It does suggest, though, that what I want is quite cumbersome – Zubo May 01 '21 at 22:33
  • @Zubo as I mentioned in my answer this isn't really something you should be doing – Tom May 01 '21 at 22:36
  • Objects have no idea what variables happen to refer to them or what those variables are named. That's because *you shouldn't be writing code that requires it*. If you need to associate a string with another object, there are various ways to do that. Variable names are for humans reading source code; they shouldn't contain data. – juanpa.arrivillaga May 01 '21 at 22:56

1 Answer

Since the name appears to have been created following a pattern in the first place, just use code to replicate that pattern:

for i, df in enumerate([df1, df2... df50], start=1):  # start=1 so that df1 is named "df1", not "df0"
    df.to_sql(f'df{i}', db)

(Better yet, don't have those variables in the first place; create the list directly.)

> The dataframes may have a non-sequential name, so I can't do for i in range(1,51): "df"+str(i).

Oh. Well, in that case, if you want to associate textual names that don't follow a pattern with the objects, that is what a dict is for:

dfs = {
    "df1": pd.read_csv("1.csv"),
    # whichever other names and values make sense
}

which you can iterate over easily:

for name, df in dfs.items():
    df.to_sql(name, db)
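
For illustration, here is a minimal end-to-end sketch of the dict approach. Everything concrete in it is an assumption made for the example: the table names "sales" and "stock", the in-memory CSV text standing in for the real files, the per-file read_csv options, and an in-memory SQLite connection standing in for db.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical stand-ins for the real CSV files. Each entry pairs a table
# name with a source and its own read_csv options, since import modes vary.
sources = {
    "sales": (io.StringIO("id,amount\n1,10\n2,20\n"), {}),
    "stock": (io.StringIO("id;qty\n1;5\n"), {"sep": ";"}),
}

# Build the dict keyed by the name each dataframe should get in the database.
dfs = {
    name: pd.read_csv(src, **kwargs)
    for name, (src, kwargs) in sources.items()
}

# An in-memory SQLite database stands in for the existing `db`.
db = sqlite3.connect(":memory:")
for name, df in dfs.items():
    df.to_sql(name, db, index=False)

# List the tables that were created.
tables = sorted(
    row[0]
    for row in db.execute("SELECT name FROM sqlite_master WHERE type='table'")
)
print(tables)
```

The name lives in the dict key rather than in a variable name, so nothing ever has to be recovered from the dataframe itself.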

If there is a logical rule that maps each input filename to the name that should be passed to the to_sql call, you can use a dict comprehension to build the dict:

dfs = {to_sql_name(csv_name): pd.read_csv(csv_name) for csv_name in ...}

Or do the loading and processing in the same loop:

for csv_name in ...:
    pd.read_csv(csv_name).to_sql(to_sql_name(csv_name), db)
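
to_sql_name above is a placeholder for whatever rule applies to your files. As one possible sketch, assuming the table name is just "df" followed by the file's base name without its extension (an assumption for illustration, as are the example paths), it could look like this:

```python
from pathlib import Path


def to_sql_name(csv_name: str) -> str:
    """Hypothetical rule: 'data/7.csv' -> 'df7'."""
    # Path.stem drops the directory part and the final extension.
    return f"df{Path(csv_name).stem}"


# Example filenames; these are made up for the demonstration.
csv_names = ["1.csv", "data/7.csv", "50.csv"]
names = [to_sql_name(n) for n in csv_names]
print(names)
```

If the real mapping is irregular, a plain dict from filename to table name does the same job without any rule at all.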
Karl Knechtel