Is there a way to use pandas `DataFrame.apply` with a variable number of column arguments? For example, say I have this data frame:

import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', 'c'],
                   'B': ['a', 'b', 'c'],
                   'C': ['a', 'b', 'c'],
                   'D': ['a', 'b', 'c']})

I want to write a function that concatenates columns to produce a new column, very similar to this SO question. A two-column example would be:

def dynamic_concat_2(row, one, two):
    # With axis=1, apply passes each row to the function as a Series
    return row[one] + row[two]

I use the function like so:

df['concat'] = df.apply(dynamic_concat_2, axis=1, one='A', two='B')

The difficulty I cannot figure out is how to do this for an unknown, dynamic number of columns. Is there a way to generalize the function using **kwargs, so that it could concatenate anywhere from 1 to n columns?

Additional context: This is a simple example of a larger problem: dynamically calculating row-level data. An unknown number of columns hold data that specifies a query to a database; this gets fed into a query, which returns a value. I've written some truly inflexible, horribly un-Pythonic solutions (think for loops going through each row of data) that haven't worked. I'm hoping that using df.apply can python-ify things.

AZhao

1 Answer

If I understand your question, it seems to me that the easiest solution would be to pick the columns from your dataframe first, then apply a function that concatenates all columns. This is just as dynamic, but a lot cleaner, in my opinion.

For example, using your data above:

cols = ['A', 'B', 'C']
df['concat'] = df[cols].apply(''.join, axis=1)

Such that

>>> df

   A  B  C  D concat
0  a  a  a  a    aaa
1  b  b  b  b    bbb
2  c  c  c  c    ccc
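
If the row-level function does need the column names passed in, `DataFrame.apply` forwards extra positional arguments (via `args`) and extra keyword arguments to the function it calls. Here is a minimal sketch, assuming string columns as in the example above; `concat_args` and `concat_kwargs` are hypothetical names chosen for illustration, not part of pandas:

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', 'c'],
                   'B': ['a', 'b', 'c'],
                   'C': ['a', 'b', 'c'],
                   'D': ['a', 'b', 'c']})

def concat_args(row, *cols):
    # row is a Series holding one row (axis=1); cols holds the column names
    return ''.join(row[c] for c in cols)

def concat_kwargs(row, **kwargs):
    # The keyword names are arbitrary labels; the values are the column names
    return ''.join(row[c] for c in kwargs.values())

# Positional column names are forwarded through the args tuple
by_args = df.apply(concat_args, axis=1, args=('A', 'B', 'C'))

# Unrecognized keyword arguments are forwarded to the function as-is
by_kwargs = df.apply(concat_kwargs, axis=1, one='A', two='B', three='C')
```

Keyword names that collide with `apply`'s own parameters (such as `axis` or `args`) would be consumed by `apply` rather than forwarded, so passing a list or tuple of column names is usually the less fragile option.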
jme
  • this works, but really looking for a **kwargs solution here as this string concatenation is a simplification of my actual problem. see my contextual note – AZhao Nov 18 '15 at 22:01
  • @AZhao I'm not sure I understand what advantage `**kwargs` has over simply indexing into the pandas dataframe. If you could amend your example or provide a new one that shows why the above approach won't work (and why `**kwargs` are needed), I'd be happy to help. – jme Nov 18 '15 at 22:24