7

I can get this working when my function returns a single value:

df['col_A'] = df.apply(lambda x: getSingleValue(x['col_X']), axis=1)

And also when my function returns a tuple:

df['col_A'] = df.apply(lambda x: getaTuple(x['col_X'])[0], axis=1)
df['col_B'] = df.apply(lambda x: getaTuple(x['col_X'])[1], axis=1)

However, I would like to know whether there is a way to assign the tuple returned by getaTuple() to multiple columns of the data frame with a single function call, rather than calling getaTuple() once for each column I am setting.

Here is an example of the input and the desired output:

df = pd.DataFrame(["testString_1", "testString_2", "testString_3"], columns=['column_X'])

def getaTuple(string):
    return tuple(string.split("_"))

In [3]: iwantthis
Out[3]:
          col_X       col_A col_B
0  testString_1  testString     1
1  testString_2  testString     2
2  testString_3  testString     3

FYI, this is similar to "how to apply a function to multiple columns in a pandas dataframe at one time", but it is not a duplicate, because in my case I need to pass col_X as input to my function.

Watt
  • 3
    Could you post a small reproducible sample data set and a desired data set? `df.apply(..., axis=1)` is awfully slow. So we could try to find a fast vectorized solution if we could see your input sample data set and desired data set... – MaxU - stand with Ukraine May 30 '17 at 19:01
  • @MaxU feel free to do it without `df.apply`. The only requirement is that my function `getaTuple()` reads a column and returns two values, which I need to assign to two other columns in the same data frame. – Watt May 30 '17 at 19:07
  • 1
    I would if I had a sample data set to play with and a desired data set to check whether the solution is correct. Please read [how to make good reproducible pandas examples](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) and edit your post accordingly. – MaxU - stand with Ukraine May 30 '17 at 19:08
  • @MaxU thanks for the tutorial link. I added an example of input/output; let me know if you have any further questions for me. – Watt May 30 '17 at 19:32

2 Answers

19

If I understand your question correctly, this should work:

df[['col_A','col_B']] = df['col_X'].apply(getaTuple).apply(pd.Series)
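
As a self-contained sketch of the same idea, the snippet below calls getaTuple() once per row and expands the resulting tuples into two columns. It builds the frame from the tuples with the `pd.DataFrame` constructor rather than `.apply(pd.Series)`; that swap is my own variation, not part of the answer above. It also assumes the input column is named col_X, as in the one-liner above (the sample frame in the question names it column_X).

```python
import pandas as pd

# Sample data from the question; the column is named col_X here (assumption)
# so that it matches the one-liner in this answer.
df = pd.DataFrame(["testString_1", "testString_2", "testString_3"], columns=["col_X"])

def getaTuple(string):
    return tuple(string.split("_"))

# Call getaTuple exactly once per row; the result is a Series of tuples.
tuples = df["col_X"].apply(getaTuple)

# Expand the tuples into a two-column frame that shares df's index,
# then assign both new columns in one statement.
df[["col_A", "col_B"]] = pd.DataFrame(tuples.tolist(), index=df.index)

print(df)
```

Passing `index=df.index` keeps the rows aligned if df does not have the default 0..n-1 index.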
DYZ
5

Here is a vectorized solution:

In [53]: df[['col_A','col_B']] = df.column_X.str.split('_', expand=True)

In [54]: df
Out[54]:
       column_X       col_A col_B
0  testString_1  testString     1
1  testString_2  testString     2
2  testString_3  testString     3

UPDATE: the same call with an empty row and a row that contains no `_`:

In [62]: df[['col_A','col_B']] = df.column_X.str.split('_', expand=True)

In [63]: df
Out[63]:
       column_X       col_A col_B
0  testString_1  testString     1
1  testString_2  testString     2
2  testString_3  testString     3
3                            None
4       aaaaaaa     aaaaaaa  None

P.S. If your desired data set should look different, please post it in your question.
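
If the edge-case rows should end up with a fixed default instead of None, one possible sketch follows. The choices here are assumptions, not something stated in the question: `n=1` splits on the first `_` only, so at most two parts are produced, and missing parts are replaced with an empty string.

```python
import pandas as pd

# Sample data including the two edge cases: an empty string and a value without "_".
df = pd.DataFrame(
    ["testString_1", "testString_2", "testString_3", "", "aaaaaaa"],
    columns=["column_X"],
)

# n=1 splits on the first "_" only, so at most two columns come back.
parts = df["column_X"].str.split("_", n=1, expand=True)

# reindex guarantees that both columns exist even if no row contains "_".
parts = parts.reindex(columns=[0, 1])

# Assumption: missing parts become empty strings; use whatever default fits.
df[["col_A", "col_B"]] = parts.fillna("")

print(df)
```

Without `n=1`, a value with more than one `_` would produce extra columns, and the two-column assignment would fail.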

MaxU - stand with Ukraine
  • Great, will try this. Thanks for suggesting an alternative to apply. – Watt May 30 '17 at 19:35
  • 1
    @Watt, glad I could help. As soon as we can see your sample data and desired data set we can provide much more precise and __tested__ answers ;-) – MaxU - stand with Ukraine May 30 '17 at 19:37
  • Thanks. Quick question: if `Col_X` is empty or doesn't contain `_` for some rows, will this still work? I was able to handle those edge cases in my `getaTuple()`; could you show how to handle them in the vectorized form? – Watt May 30 '17 at 20:14
  • 1
    @Watt, we need to know what the desired data set (including edge cases) should look like – MaxU - stand with Ukraine May 30 '17 at 20:17