How to efficiently apply tuple to multiple columns in a pandas dataframe simultaneously

Question

I can get this working:

df['col_A'] = df.apply(lambda x: getSingleValue(x['col_X']), axis=1)

It also works when my function returns a tuple:

df['col_A'] = df.apply(lambda x: getaTuple(x['col_X'])[0], axis=1)
df['col_B'] = df.apply(lambda x: getaTuple(x['col_X'])[1], axis=1)

But is there a way to assign the tuple returned by getaTuple() to multiple columns of the data frame with a single function call, rather than calling getaTuple once for each column I set?

Here is an example of the input and the desired output:

import pandas as pd

df = pd.DataFrame(["testString_1", "testString_2", "testString_3"], columns=['column_X'])

def getaTuple(string):
    return tuple(string.split("_"))

In [3]: iwantthis
Out[3]:
          col_X       col_A col_B
0  testString_1  testString     1
1  testString_2  testString     2
2  testString_3  testString     3

FYI, this is similar to "how to apply a function to multiple columns in a pandas dataframe at one time", but it is not a duplicate, because in my case I need to pass col_X as the input to my function.

Could you post a small reproducible sample data set and a desired data set? `df.apply(..., axis=1)` is awfully slow. So we could try to find a fast vectorized solution if we could see your input sample data set and desired data set... — MaxU - stand with Ukraine, May 30 '17 at 19:01
@MaxU feel free to do it without `df.apply`; the only requirement is that my function `getaTuple()` reads a column and returns two values, which I need to set in two other columns of the same data frame. — Watt, May 30 '17 at 19:07
I would, if I had a sample data set to play with and a desired data set to check whether a solution is correct. Please read [how to make good reproducible pandas examples](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) and edit your post correspondingly. — MaxU - stand with Ukraine, May 30 '17 at 19:08
@MaxU thanks for the tutorial link. I added an example of the input/output; let me know if you have any further questions. — Watt, May 30 '17 at 19:32

score 19 · Answer 1 · answered May 30 '17 at 19:18

If I understand your question correctly, this should work:

df[['col_A','col_B']] = df['col_X'].apply(getaTuple).apply(pd.Series)
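A runnable sketch of this one-liner on the question's sample data, assuming the source column is named col_X (the question's sample DataFrame calls it column_X). getaTuple runs once per row, and `.apply(pd.Series)` turns each tuple into a row whose values are assigned positionally to col_A and col_B. The second variant, building a DataFrame from the list of tuples, is a swapped-in alternative, not part of this answer; it avoids creating one Series per row.

```python
import pandas as pd

def getaTuple(string):
    # "testString_1" -> ("testString", "1")
    return tuple(string.split("_"))

# Assumption: the column is named 'col_X' to match the answer's code
df = pd.DataFrame(["testString_1", "testString_2", "testString_3"],
                  columns=['col_X'])

# One getaTuple call per row; pd.Series expands each tuple into two columns
df[['col_A', 'col_B']] = df['col_X'].apply(getaTuple).apply(pd.Series)

# Alternative: expand the list of tuples directly, keeping df's index
tuples = df['col_X'].apply(getaTuple)
expanded = pd.DataFrame(tuples.tolist(), index=df.index,
                        columns=['col_A', 'col_B'])

print(df)
```

Both variants still call a Python function per row, so they are not vectorized; they only avoid calling getaTuple twice.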

DYZ

Very elegant, and it has the advantage of not needing to touch the given function `getaTuple()`. – Bill Huang May 09 '20 at 19:54
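Two other ways that also leave getaTuple() untouched, sketched under the same assumption that the column is named col_X. Neither comes from the answers above. `zip(*...)` transposes the per-row tuples into one sequence per output column, and `DataFrame.apply(..., axis=1, result_type='expand')` keeps the question's original row-wise form while calling getaTuple only once per row.

```python
import pandas as pd

def getaTuple(string):
    return tuple(string.split("_"))

df = pd.DataFrame(["testString_1", "testString_2", "testString_3"],
                  columns=['col_X'])

# Option 1: one getaTuple call per value, then transpose with zip(*...).
# Assumes every tuple has exactly two elements; otherwise the unpacking fails.
df['col_A'], df['col_B'] = zip(*df['col_X'].map(getaTuple))

# Option 2: the question's df.apply(..., axis=1) form, with
# result_type='expand' turning each returned tuple into columns 0 and 1
via_apply = df.apply(lambda row: getaTuple(row['col_X']), axis=1,
                     result_type='expand')

print(df)
print(via_apply)
```

Option 2 can be assigned the same way as the answer above, e.g. `df[['col_A', 'col_B']] = via_apply`, but `apply` with `axis=1` remains the slowest of these approaches.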

MaxU - stand with Ukraine · Accepted Answer · score 5 · 2017-05-30T20:18:28.553

Here is a vectorized solution:

In [53]: df[['col_A','col_B']] = df.column_X.str.split('_', expand=True)

In [54]: df
Out[54]:
       column_X       col_A col_B
0  testString_1  testString     1
1  testString_2  testString     2
2  testString_3  testString     3
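One caveat, sketched with a hypothetical value "my_test_3" that is not in the question's data: with `expand=True`, a value containing more than one `_` produces more than two columns, and assigning them to two column names fails. Passing `n=1` limits the split to one cut. Whether the cut should be at the first `_` (`str.split`) or the last (`str.rsplit`) is an assumption; this sketch keeps the trailing part in col_B.

```python
import pandas as pd

# "my_test_3" is a hypothetical value with two underscores
df = pd.DataFrame(["testString_1", "testString_2", "my_test_3"],
                  columns=['column_X'])

# Without n, the widest row decides the number of columns
print(df.column_X.str.split('_', expand=True).shape)  # (3, 3)

# n=1 with split cuts at the first '_' ...
first = df.column_X.str.split('_', n=1, expand=True)
# ... and with rsplit at the last '_', keeping "my_test" together
last = df.column_X.str.rsplit('_', n=1, expand=True)

df[['col_A', 'col_B']] = last
print(df)
```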

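UPDATE: the same command, run on the sample data extended with two edge-case rows (an empty string and a value without '_'):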

In [62]: df[['col_A','col_B']] = df.column_X.str.split('_', expand=True)

In [63]: df
Out[63]:
       column_X       col_A col_B
0  testString_1  testString     1
1  testString_2  testString     2
2  testString_3  testString     3
3                            None
4       aaaaaaa     aaaaaaa  None
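If the missing parts should be empty strings rather than None, one option is to fill them after splitting. This is a sketch that assumes empty strings are the desired result for those rows; the question does not specify it.

```python
import pandas as pd

# Sample data plus the two edge-case rows shown in the UPDATE
df = pd.DataFrame(["testString_1", "testString_2", "testString_3", "", "aaaaaaa"],
                  columns=['column_X'])

parts = df.column_X.str.split('_', expand=True)
# Rows with no '_' have no second part; fillna('') replaces those gaps
df[['col_A', 'col_B']] = parts.fillna('')
print(df)
```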

PS: if your desired data set should look different, please post it in your question.

edited May 30 '17 at 20:18

answered May 30 '17 at 19:33

Great, I will try this; thanks for suggesting an alternative to apply. – Watt May 30 '17 at 19:35

@Watt, glad I could help. As soon as we can see your sample data and desired data set we can provide much more precise and __tested__ answers ;-) – MaxU - stand with Ukraine May 30 '17 at 19:37
Thanks, quick question: if `col_X` is empty or doesn't contain `_` for some rows, will this still work? I was able to handle those edge cases in my `getaTuple()`; can you show how to handle them in the vectorized form? – Watt May 30 '17 at 20:14

@Watt, we need to know what the desired data set (including edge cases) should look like. – MaxU - stand with Ukraine May 30 '17 at 20:17
