I would like to easily split my column into two (or more) columns using apply. I could use split like they do here, but there are exceptions that would be difficult to handle. This answer is similar, but it again outputs only one column.

import pandas as pd

df = pd.DataFrame({"xVal":[1,2,7,4], "xRef":["1-2","2-3",">4", "NoReference"]})

def sep_ref(row):
    if '-' in row:
        return row.split("-")
    else:
        # handle and return some list
        return [row, row]

# broken assignment
df['xlow'], df['xhigh'] = df.xRef.apply(sep_ref)

df

   xVal            xRef
0     1           '1-2'
1     2           '2-3'
2     7            '>4'
3     4   'NoReference'

Desired output:

   xVal  xlow  xhigh
0     1     1      2
1     2     2      3
2     7     4    NaN
3     4   NaN    NaN

The easy solution is to run two separate apply functions, but this is less elegant and could make exception handling more difficult. Is there a way to append 2 columns at once with apply?

rocket_brain
  • Can you explain the logic better when `-` is not present? Why do you have `-4` and `NaN` for the 3rd row, and `NaN` and `NaN` for the 4th? – rafaelc Aug 08 '19 at 03:55
  • The idea is that my function (sep_ref) here could be an implementation with error handling. At the basic level it would use conditionals to determine what to return, e.g. `if row[0] == '>': return [row[1], NaN]`. In the question I want to show that a basic [split](https://stackoverflow.com/questions/14745022/how-to-split-a-column-into-two-columns) is not my desired solution. – rocket_brain Aug 08 '19 at 04:06

1 Answer

UPDATE:

I just noticed the NaN preferences. Here is a fix:

import pandas as pd
import numpy as np

df = pd.DataFrame({"xVal":[1,2,7,4], "xRef":["1-2","2-3",">4", "NoReference"]})

def sep_ref(row):
    if '-' in row:
        return [int(x) for x in row.split("-")]
    elif row.startswith('>'):
        return [int(row[1:]), np.nan]
    elif row.startswith('<'):
        return [np.nan, int(row[1:])]
    else:
        return [np.nan, np.nan]

# not broken assignment
df['xlow'] = None
df['xhigh'] = None
df[['xlow', 'xhigh']] = [*df.xRef.apply(sep_ref)]
print(df)
   xVal         xRef  xlow  xhigh
0     1          1-2   1.0    2.0
1     2          2-3   2.0    3.0
2     7           >4   4.0    NaN
3     4  NoReference   NaN    NaN
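
As an alternative sketch (not the fix shown above, just another way to get the same two columns), the pairs returned by sep_ref can be turned into their own two-column DataFrame and joined back onto df, which avoids creating the columns beforehand. The name `bounds` is only illustrative, and sep_ref is repeated so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"xVal": [1, 2, 7, 4],
                   "xRef": ["1-2", "2-3", ">4", "NoReference"]})

def sep_ref(ref):
    # Same rules as the function above: always return a [low, high] pair.
    if "-" in ref:
        return [int(x) for x in ref.split("-")]
    elif ref.startswith(">"):
        return [int(ref[1:]), np.nan]
    elif ref.startswith("<"):
        return [np.nan, int(ref[1:])]
    else:
        return [np.nan, np.nan]

# Convert the Series of pairs into a plain list of lists, then into a
# two-column frame that shares df's index so the join lines up row by row.
bounds = pd.DataFrame(df["xRef"].apply(sep_ref).tolist(),
                      index=df.index,
                      columns=["xlow", "xhigh"])
df = df.join(bounds)
print(df)
```

Because every branch returns exactly two values, the intermediate list is rectangular. If sep_ref could return lists of different lengths, the missing positions would be filled with NaN.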

ORIGINAL:

To do this with the sep_ref from the question, I think you need to initialize the "xlow" and "xhigh" columns first.

# not broken assignment
df['xlow'] = None
df['xhigh'] = None
df[['xlow', 'xhigh']] = [*df.xRef.apply(sep_ref)]
print(df)

Output:

   xVal         xRef         xlow        xhigh
0     1          1-2            1            2
1     2          2-3            2            3
2     7           >4           >4           >4
3     4  NoReference  NoReference  NoReference
brentertainer
  • What exactly is happening in "[*return_list]" that makes it compatible with the assignment when "return_list" doesn't work? Shouldn't the latter set the new columns' values to the sep_ref return for the first row (i.e. 'xlow':[1,1,1,1] and 'xhigh':[2,2,2,2])? – rocket_brain Aug 08 '19 at 04:44
  • In your case, the problem is that `Series.apply` returns a Series whose elements are themselves lists (e.g. `[1, 2]`, `[2, 3]`, ...), so the two values per row are trapped inside a 1-D Series. To get around that, I `*`-unpacked it into a true list of lists (e.g. `[[1, 2], [2, 3], ...]`), a 2-D construct that can fill two columns. – brentertainer Aug 08 '19 at 04:59
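
A small sketch of what the comments above describe, using the question's original df and sep_ref. The variable names `pairs`, `rows` and `unpack_failed` are only illustrative. It shows that the result of apply is a 1-D Series of lists, that unpacking it into two targets fails because iteration yields one item per row, and that `[*pairs]` produces a rows-by-columns list.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"xVal": [1, 2, 7, 4],
                   "xRef": ["1-2", "2-3", ">4", "NoReference"]})

def sep_ref(row):
    # Original function from the question.
    if "-" in row:
        return row.split("-")
    else:
        return [row, row]

pairs = df["xRef"].apply(sep_ref)
print(pairs.dtype)  # object
print(pairs.shape)  # (4,)

# Iterating the Series yields four items (one list per row), so unpacking
# into two targets raises ValueError instead of splitting into columns.
unpack_failed = False
try:
    low, high = pairs
except ValueError:
    unpack_failed = True

# Star-unpacking into a list gives a plain list of lists: 4 rows x 2 columns.
rows = [*pairs]
print(np.array(rows).shape)  # (4, 2)

# Same assignment as in the answer, with the columns created first.
df["xlow"] = None
df["xhigh"] = None
df[["xlow", "xhigh"]] = rows
print(df)
```

`pairs.tolist()` or `list(pairs)` should give the same list of lists as `[*pairs]`; the star form is just a compact way to write it.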