Concatenating multiple pandas DataFrames

Question

I have a large number of DataFrames that all share the prefix df_, with names like:

df_1
df_x
df_ab
.
.
.
df_1a
df_2b

Of course I can do final_df = pd.concat([df_1, df_x, df_ab, ... df_1a, df_2b], axis = 1)

The issue is that although the prefix df_ will always be there, the rest of each DataFrame's name keeps changing and follows no pattern. So I have to constantly update the list of DataFrames passed to pd.concat to create final_df, which is cumbersome.

Question: is there any way to tell Python to concatenate all DataFrames defined in the namespace whose names start with df_ (and only those) and create final_df, or at least to return a list of all such DataFrames that I can then manually feed into pd.concat?

Perhaps https://stackoverflow.com/questions/12101958/how-to-keep-track-of-class-instances can help. — Tai, Feb 26 '18 at 00:22
Why are these similarly structured data frames not contained in a list or dict? Please back up and explain how they are created. — Parfait, Feb 26 '18 at 00:23
Thanks for the suggestion. But won't that solution still require me to manually declare each dataframe as an instance of a class? — Saeed, Feb 26 '18 at 00:23
If all of the dataframes of interest are in a dict or list, then you simply operate on said dict or list. You don't need to go searching for dataframes of interest... — Stephen Rauch, Feb 26 '18 at 00:30

Brad Solomon · Accepted Answer · 2018-02-26T00:37:23.323

You could do something like this, using the built-in function globals():

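def concat_all(prefix='df_'):
    # Collect module-level DataFrames whose names start with the prefix
    dfs = [df for name, df in globals().items() if name.startswith(prefix)
           and isinstance(df, pd.DataFrame)]
    # Join them side by side (column-wise)
    return pd.concat(dfs, axis=1)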

Logic:

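Filter your global namespace down to DataFrames whose names start with prefix.
Put these in a list (concat also accepts other iterables, including a generator, but a list is easy to inspect).
Call concat() with axis=1, which joins the DataFrames side by side (column-wise).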

Example:

import pandas as pd

df_1 = pd.DataFrame([[0, 1], [2, 3]])
df_2 = pd.DataFrame([[4, 5], [6, 7]])
other_df = df_1.copy() * 2  # ignore this
s_1 = pd.Series([1, 2, 3, 4])  # and this

final_df = concat_all()
final_df

   0  1  0  1
0  0  1  4  5
1  2  3  6  7

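Always use globals() with caution. It returns the dictionary for the entire namespace of the module in which the function is defined, so every DataFrame there whose name starts with the prefix gets picked up, including ones you may have forgotten about.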
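One way to limit that exposure, and roughly what the commenters suggest, is to pass the namespace in explicitly instead of reading globals() inside the function. The following is only a sketch: concat_prefixed and the frames dict are hypothetical names, not part of the original answer. The caller can pass either a dict they maintain themselves or globals().

```python
import pandas as pd

def concat_prefixed(namespace, prefix='df_'):
    """Column-wise concat of every DataFrame in `namespace` (a dict with
    string keys) whose key starts with `prefix`. Hypothetical helper."""
    dfs = [obj for name, obj in namespace.items()
           if name.startswith(prefix) and isinstance(obj, pd.DataFrame)]
    if not dfs:
        # pd.concat raises on an empty list; fail with a clearer message
        raise ValueError(f"no DataFrames with prefix {prefix!r} found")
    return pd.concat(dfs, axis=1)

# Keeping the frames in a dict avoids searching the module namespace at all
frames = {
    'df_1': pd.DataFrame([[0, 1], [2, 3]]),
    'df_2': pd.DataFrame([[4, 5], [6, 7]]),
    'other_df': pd.DataFrame([[0, 2], [4, 6]]),  # skipped: wrong prefix
    's_1': pd.Series([1, 2, 3, 4]),              # skipped: wrong prefix
}

final_df = concat_prefixed(frames)
```

Calling concat_prefixed(globals()) at module level should behave like concat_all() above, with the difference that the caller decides which namespace gets searched.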

You need globals() rather than locals() because the lookup happens inside a function. There, locals() would contain only the function's own local variables (such as prefix), not the module-level DataFrames.
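A small sketch of the difference, assuming the code runs at module level (as a script or in a notebook cell). show_namespaces is a hypothetical helper used only for illustration.

```python
import pandas as pd

df_1 = pd.DataFrame([[0, 1], [2, 3]])  # module-level when run as a script

def show_namespaces(prefix='df_'):
    # Hypothetical helper, for illustration only.
    # At this point locals() holds just the function's argument.
    local_names = sorted(locals())
    # globals() is the namespace of the module that defines this function.
    global_hits = sorted(n for n in globals() if n.startswith(prefix))
    return local_names, global_hits

local_names, global_hits = show_namespaces()
print(local_names)   # ['prefix']
print(global_hits)   # module-level names starting with 'df_'
```

The function never sees df_1 through locals(); it is only reachable through the module's global dictionary.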
