1

I would like to concat multiple dataframes into a single dataframe using the names of the dataframes as strings from a list. This is similar to:

df1 = pd.DataFrame({'x': [1, 2, 3], 'y': ['a', 'b', 'c']})    
df2 = pd.DataFrame({'x': [4, 5, 6], 'y': ['d', 'e', 'f']})

pd.concat([df1, df2])

but instead I want to provide a list of dataframe names as strings

For example,

pd.concat(['df1', 'df2'])

Is this possible?

Vedda
  • Module-level variables are stored in the `globals()` namespace, so you can look them up with `globals()[name]`: `pd.concat([globals()[x] for x in ['df1', 'df2']])`. This is not idiomatic, though; you should store your dataframes in a dictionary and reference them from there. – Psidom Sep 30 '21 at 22:15
  • @Psidom This is exactly what I was looking for thanks! Write up an answer and I'll accept. I couldn't find this on SE. – Vedda Sep 30 '21 at 22:15
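
For reference, the `globals()` lookup from the comment above can be sketched as a small script. This assumes the dataframes are defined at module level (inside a function they would not appear in `globals()`), and the `names` variable is only an illustrative name:

```python
import pandas as pd

df1 = pd.DataFrame({'x': [1, 2, 3], 'y': ['a', 'b', 'c']})
df2 = pd.DataFrame({'x': [4, 5, 6], 'y': ['d', 'e', 'f']})

# Hypothetical list of variable names to look up
names = ['df1', 'df2']

# globals() returns the module-level namespace as a dict,
# so each string is used as a key to fetch the dataframe bound to that name
result = pd.concat([globals()[name] for name in names])
```

A misspelled name raises a `KeyError`, and any module-level name (including `pd` itself) can be looked up this way, which is part of why the answers below suggest a dedicated dict instead.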

3 Answers

4

Although using `globals()` or `exec` answers the question, it is considered bad practice. A better way to do this is to use a dict, like so:

df_dict = {'df1': df1 , 'df2': df2}

pd.concat(df for _, df in df_dict.items())
Shivam Roy
  • Thanks for your answer, but this is not what I am asking. I want to be able to insert a list in concat as `pd.concat(['df1', 'df2'])` – Vedda Oct 01 '21 at 17:34
1

Python variable names generally have to be known at compile time, so selecting values from a list of names is tricky. As mentioned in the comments, you could use globals() to get the values from variables in global scope, but a more common practice is to use a dictionary from the beginning instead.

import pandas as pd

dataframes = {
    "df1": pd.DataFrame({'x': [1, 2, 3], 'y': ['a', 'b', 'c']}),
    "df2": pd.DataFrame({'x': [4, 5, 6], 'y': ['d', 'e', 'f']}),
}
to_concat = ["df1", "df2"]
result = pd.concat(dataframes[name] for name in to_concat)

Now the dataframes are all tucked neatly into their own namespace instead of being mixed with other stuff in globals. This is especially useful when the dataframes are read dynamically and you'd have to figure out how to get the names into the global space in the first place.
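
As a rough sketch of the "read dynamically" case: the `sources` dict below is a made-up stand-in for files or queries discovered at runtime (in-memory CSV text is used so the example is self-contained), and `ignore_index=True` is an optional choice, not something the question asked for.

```python
import io
import pandas as pd

# Hypothetical inputs: name -> CSV text, standing in for files read at runtime
sources = {
    "df1": "x,y\n1,a\n2,b\n3,c\n",
    "df2": "x,y\n4,d\n5,e\n6,f\n",
}

# Build the name -> dataframe mapping as the data is read
dataframes = {name: pd.read_csv(io.StringIO(text)) for name, text in sources.items()}

# Select by name and concatenate; ignore_index renumbers the rows 0..n-1
to_concat = ["df1", "df2"]
result = pd.concat([dataframes[name] for name in to_concat], ignore_index=True)
```

Here the names never have to exist as Python variables at all; they are just keys.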

tdelaney
  • Thanks for your answer. This is close, but I want to be able to insert a list in concat as `pd.concat(['df1', 'df2'])`. Not from a dataframe. – Vedda Oct 01 '21 at 17:35
  • You can't `pd.concat` strings. It's puzzling that you would even include dataframes in the question if you don't want to concatenate them. You said _"I want to provide a list of dataframe names as strings"_. Dataframes don't inherently have a name - just variables or containers that happen to be holding them. The dict maps a name to a dataframe. Then I have a list of names and do the concat. – tdelaney Oct 01 '21 at 18:02
  • `globals()` __is__ a `dict`, so `globals()["df1"]` is much the same as `dataframes["df1"]`. The reasons for using a dict include (1) they are dynamically created and (2) they don't have other unrelated objects in them. Suppose you want to validate a name before using it for concat, `"pd" in dataframes` would say False, while `"pd" in globals()` would say True. – tdelaney Oct 01 '21 at 18:06
  • I realize it is unusual, but that's why I asked if it was possible. It's for a small use case and not a larger project so I don't mind the global issue. I greatly appreciate the detail you provided in these comments. Thanks! – Vedda Oct 02 '21 at 00:38
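
The validation point from the comments could look something like the following. `concat_by_name` is a hypothetical helper written for illustration, not a pandas function; it just checks the names against the dict before concatenating.

```python
import pandas as pd

dataframes = {
    "df1": pd.DataFrame({'x': [1, 2, 3], 'y': ['a', 'b', 'c']}),
    "df2": pd.DataFrame({'x': [4, 5, 6], 'y': ['d', 'e', 'f']}),
}

def concat_by_name(names, registry):
    """Concatenate the dataframes in `registry` selected by `names` (hypothetical helper)."""
    # Reject names that are not registered dataframes before doing any work
    missing = [name for name in names if name not in registry]
    if missing:
        raise KeyError(f"unknown dataframe names: {missing}")
    return pd.concat([registry[name] for name in names])

result = concat_by_name(["df1", "df2"], dataframes)
```

With this, `concat_by_name(["pd"], dataframes)` fails with a clear `KeyError`, whereas a `globals()` lookup would find the pandas module under that name and hand it to `pd.concat`.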
0

Do you want to use strings as variable names? If so, you can do this:

str_list = ["df1", "df2"]
pd.concat([locals()[str_list[0]], locals()[str_list[1]]])