I have a function which returns a list of length 2. I would like to apply this function to one column in my dataframe and assign the result to two columns.

This actually works:

import pandas as pd

def twonumbers(x):
    return [2*x, 3*x]

df = pd.DataFrame([1,4,11], columns=['v1'])

# Build a two-column frame from the per-row lists and attach it to df
pd.concat([df, pd.DataFrame(df['v1'].map(twonumbers).tolist(), columns=['v2','v3'])], axis=1)

But I am looking for a simpler way to do the last line above. Something like this:

df['v2'], df['v3'] = df['v1'].map(twonumbers)
Pekka
  • Does this help: http://stackoverflow.com/questions/12356501/pandas-create-two-new-columns-in-a-dataframe-with-values-calculated-from-a-pre?rq=1 or this: http://stackoverflow.com/questions/15118111/apply-function-to-each-row-of-pandas-dataframe-to-create-two-new-columns?rq=1? – EdChum Aug 01 '14 at 12:36
  • Ohhhhh... Believe it or not I really tried to search for some time but didn't see the first link. That answers my question exactly. Thank you a lot. – Pekka Aug 01 '14 at 12:42
  • No worries. For some reason the search is not so good, but it magically finds related questions after you post a question. – EdChum Aug 01 '14 at 12:43
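
For readers who do not follow the links: one common way to get the two-column assignment the question asks for is to transpose the per-row lists with zip. This is a minimal sketch assuming the twonumbers function and sample data from the question; it is not necessarily the exact code from the linked answers.

```python
import pandas as pd

def twonumbers(x):
    return [2 * x, 3 * x]

df = pd.DataFrame([1, 4, 11], columns=['v1'])

# map() produces one two-element list per row; zip(*...) transposes those
# rows into two tuples, one for each new column.
df['v2'], df['v3'] = zip(*df['v1'].map(twonumbers))
```

This still calls twonumbers once per row, so it is convenient rather than fast.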

1 Answer

import pandas as pd

def twonumbers(x):
    return [2*x, 3*x]

df = pd.DataFrame([1,4,11], columns=['v1'])
df['v2'], df['v3'] = twonumbers(df['v1'])

makes df look like this

   v1  v2  v3
0   1   2   3
1   4   8  12
2  11  22  33

Note: This relies on twonumbers being able to accept a Pandas Series as input, and returning a list of two Series as output.
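
To make the note concrete, here is a small check of what twonumbers returns when it is handed a whole Series. It is only an illustration using the data from the question; the variable names are made up for this sketch.

```python
import pandas as pd

def twonumbers(x):
    return [2 * x, 3 * x]

s = pd.Series([1, 4, 11], name='v1')
result = twonumbers(s)

# Arithmetic on a Series is elementwise, so each entry of the returned
# list is itself a Series with the same index as the input.
is_pair_of_series = len(result) == 2 and all(isinstance(r, pd.Series) for r in result)
doubled = result[0].tolist()
tripled = result[1].tolist()
```

Unpacking that two-element list into df['v2'], df['v3'] is what makes the one-line assignment work.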

unutbu
  • Yes, this works here. It seems that my original function doesn't accept a Series as input, but the problem can be solved with the related question EdChum pointed out. – Pekka Aug 01 '14 at 12:46
  • That's too bad, since performance is much better when you can apply functions to whole Series rather than to individual values one-at-a-time. – unutbu Aug 01 '14 at 12:49
  • My function is like this `def getnumbers(x): return re.findall(r'\d+', x)` and the DataFrame values are times like `"7:20"`. – Pekka Aug 01 '14 at 12:58
  • Then instead of `getnumbers` you could use `df[['v2','v3']] = df['v1'].str.extract(r'(\d+):(\d+)')`. Pandas Series have str methods built in: http://pandas.pydata.org/pandas-docs/stable/basics.html#vectorized-string-methods. – unutbu Aug 01 '14 at 13:21
  • Thank you! They say I should avoid null comments like this but your help is very useful and much appreciated. – Pekka Aug 01 '14 at 14:01
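
A runnable sketch of the str.extract suggestion from the comments, for reference. The sample times are invented for illustration, and the conversion to integers is an extra step that the comment does not include. The columns are assigned one at a time here; the single-statement form from the comment should behave the same on recent pandas versions.

```python
import pandas as pd

# Hypothetical sample data in the "7:20" format mentioned in the comments
df = pd.DataFrame({'v1': ['7:20', '12:05', '23:59']})

# Each capture group in the pattern becomes one column of the result,
# labelled 0 and 1 because the groups are unnamed.
parts = df['v1'].str.extract(r'(\d+):(\d+)').astype(int)

df['v2'] = parts[0]
df['v3'] = parts[1]
```

Because the regex is applied to the whole column at once, no per-row Python function is needed.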