
I want to merge many data frames whose variable names match a regular expression pattern (the names of the data frames themselves, NOT their column names).

With credit to the accepted response on this page, I am able to get my desired output with:

reduce(lambda x, y: pd.merge(x, y, on = 'variable'), [df1, df2, df3])

But typing them all out is quite tedious. My desired data frames are all labelled with a prefix of "m_", so I was hoping there would be a simple way of using regex to match all my data frames using "^m_".

In hopes of providing more context, I have already made a post regarding this issue in RStudio. In fact, I already knew how to do this in R, but that question was regarding how to shove all the code into my own function (which I would also love to know how to do in this situation). So if it helps by any means, this is the exact R equivalent of what I'm trying to do:

Reduce(function(...) merge(..., all = TRUE), mget(apropos("^m_")))

And if possible, I would like to wrap it in my own function like this (but in Python rather than R):

multi.merge <- function(pattern){
    Reduce(function(...) merge(..., all = TRUE), mget(apropos(pattern), envir=.GlobalEnv))
}
output <- multi.merge("^m_")

But if you don't know what any of that means in R, hopefully my desired output is still clear.

L77

1 Answer


This should do it:

import re
from functools import reduce

import pandas as pd

def global_pd_dfs(pattern=None, return_values=False):
    # globals() is this module's namespace as a dict of name -> object
    dct = globals()
    regex = re.compile(pattern) if pattern is not None else None
    # keep only DataFrames whose variable name matches the pattern (if one is given)
    names = [name for name, obj in dct.items()
             if isinstance(obj, pd.DataFrame) and (regex is None or regex.match(name))]
    return [dct[name] for name in names] if return_values else names

def multi_merge(pattern=None):
    # merge the matching DataFrames pairwise, left to right, on the 'variable' column
    return reduce(lambda x, y: pd.merge(x, y, on='variable'), global_pd_dfs(pattern=pattern, return_values=True))

If no pattern is given, global_pd_dfs() lists the names of all pandas DataFrames in the global namespace (or the DataFrames themselves with return_values=True), and multi_merge() tries to merge all of them.

R's ls() can be mimicked with Python's globals(). Note, however, that globals() returns a dictionary mapping names to objects rather than a plain list of names.
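If you would rather not reach into globals() from inside the function, and want something closer to R's all = TRUE (an outer join), here is a minimal sketch of a variant that takes the namespace as an argument. The name merge_matching, its parameters, and the example data are all made up for illustration; the 'variable' key column is taken from the question.

```python
import re
from functools import reduce

import pandas as pd


def merge_matching(namespace, pattern, on="variable", how="outer"):
    """Merge every DataFrame in `namespace` whose name matches `pattern`.

    `namespace` is any dict of name -> object, e.g. globals().
    how="outer" keeps unmatched rows, similar to all = TRUE in R's merge().
    """
    regex = re.compile(pattern)
    # Sort the names so the merge order (and column order) is deterministic.
    names = sorted(
        name for name, obj in namespace.items()
        if isinstance(obj, pd.DataFrame) and regex.match(name)
    )
    if not names:
        raise ValueError(f"no DataFrame name matches {pattern!r}")
    return reduce(
        lambda left, right: pd.merge(left, right, on=on, how=how),
        (namespace[name] for name in names),
    )


# Made-up example data; a plain dict stands in for globals() here.
frames = {
    "m_a": pd.DataFrame({"variable": ["x", "y"], "a": [1, 2]}),
    "m_b": pd.DataFrame({"variable": ["y", "z"], "b": [3, 4]}),
    "other": pd.DataFrame({"variable": ["x"], "c": [5]}),
}
merged = merge_matching(frames, "^m_")
```

In a script or notebook you would call merge_matching(globals(), "^m_") instead of passing a hand-built dict. Passing the namespace explicitly also lets the function live in a separate module, where its own globals() would not see your data frames.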

Gwang-Jin Kim