
I have a large number of DataFrames that share the prefix df_ and look like:

df_1
df_x
df_ab
.
.
.
df_1a
df_2b

Of course I can do final_df = pd.concat([df_1, df_x, df_ab, ... df_1a, df_2b], axis = 1)

The issue is that although the prefix df_ will always be there, the rest of each DataFrame's name keeps changing and follows no pattern. So I have to constantly update the list of DataFrames in pd.concat to create final_df, which is cumbersome.

Question: is there any way to tell Python to concatenate all DataFrames defined in the namespace whose names start with df_ (and only those) and create final_df, or at least return a list of all such DataFrames that I can then manually feed into pd.concat?

Brad Solomon
Saeed
  • Perhaps https://stackoverflow.com/questions/12101958/how-to-keep-track-of-class-instances can help. – Tai Feb 26 '18 at 00:22
  • Why are these similarly structured data frames not contained in a list or dict? Please back up and explain how they are created. – Parfait Feb 26 '18 at 00:23
  • Thanks for the suggestion. But won't that solution still require me to manually declare each dataframe as an instance of a class? – Saeed Feb 26 '18 at 00:23
  • If all of the dataframes of interest are in a dict or list, then you simply operate on said dict or list. You don't need to go searching for dataframes of interest... – Stephen Rauch Feb 26 '18 at 00:30

1 Answer

You could do something like this, using the built-in function globals():

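def concat_all(prefix='df_'):
    # Collect every module-level DataFrame whose variable name starts with `prefix`
    dfs = [df for name, df in globals().items()
           if name.startswith(prefix) and isinstance(df, pd.DataFrame)]
    # Join them side by side (column-wise)
    return pd.concat(dfs, axis=1)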

Logic:

  1. Filter your global namespace down to the DataFrames whose names start with prefix
  2. Put these in a list to pass to concat
  3. Call concat() with axis=1, which joins the DataFrames side by side (column-wise).
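
The question also asks for a way to just get the matching DataFrames back. A minimal sketch of the same filter, assuming a hypothetical helper named collect_frames that takes the namespace as an argument instead of calling globals() itself (the helper name and the hand-built demo_namespace are illustrative, not part of the answer above):

```python
import pandas as pd

def collect_frames(namespace, prefix="df_"):
    """Return {name: DataFrame} for each DataFrame in `namespace` whose name starts with `prefix`."""
    return {
        name: obj
        for name, obj in namespace.items()
        if name.startswith(prefix) and isinstance(obj, pd.DataFrame)
    }

# Hand-built namespace for the demo; in a script you could pass globals() instead.
demo_namespace = {
    "df_1": pd.DataFrame([[0, 1], [2, 3]]),
    "df_x": pd.DataFrame([[4, 5], [6, 7]]),
    "other_df": pd.DataFrame([[8, 9]]),   # wrong prefix, skipped
    "df_note": "not a DataFrame",         # right prefix, wrong type, skipped
}

frames = collect_frames(demo_namespace)
final_df = pd.concat(list(frames.values()), axis=1)
```

Returning the names alongside the DataFrames makes it easy to check what was picked up before concatenating.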

Example:

import pandas as pd

df_1 = pd.DataFrame([[0, 1], [2, 3]])
df_2 = pd.DataFrame([[4, 5], [6, 7]])
other_df = df_1.copy() * 2  # ignore this
s_1 = pd.Series([1, 2, 3, 4])  # and this

final_df = concat_all()
final_df

   0  1  0  1
0  0  1  4  5
1  2  3  6  7

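Always use globals() with caution. It returns a dictionary of the entire namespace of the module in which the function is defined, so every DataFrame whose name matches the prefix is picked up, whether you intended it or not.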
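
One way to avoid scanning globals() at all, along the lines suggested in the comments on the question, is to keep the DataFrames in a dict from the start. A rough sketch, assuming a hypothetical dict named frames whose keys replace the variable-name suffixes:

```python
import pandas as pd

frames = {}  # hypothetical registry: one entry per DataFrame
frames["1"] = pd.DataFrame([[0, 1], [2, 3]])
frames["x"] = pd.DataFrame([[4, 5], [6, 7]])

# Passing a dict to concat uses its keys as an outer level on the columns
final_df = pd.concat(frames, axis=1)
```

With this layout there is nothing to search for: adding or renaming a DataFrame only changes the dict, and the concat call stays the same.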

concat_all needs globals() rather than locals() because the lookup happens inside a function. There, locals() contains only the function's own local names (here just prefix), not the DataFrames defined at module level.
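
The difference between the two scopes can be seen with a small sketch. The names df_demo and show_scopes are made up for illustration, and a plain string stands in for a module-level DataFrame:

```python
df_demo = "module-level name"  # stands in for a module-level DataFrame

def show_scopes(prefix="df_"):
    local_names = sorted(locals())        # only names local to this function
    global_hit = "df_demo" in globals()   # looks in the module namespace
    return local_names, global_hit

local_names, global_hit = show_scopes()
```

When run as a script, local_names holds only the function's parameter, while the module-level name is found through globals().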

Brad Solomon