
So I need to split a DataFrame column, take the first item, and put it in a new column using a lambda function. I can't figure out how to do that.

df['Reason'] = df['title'].apply(lambda x: x.split(':'))

I'm getting this for now:

df['Reason'].head()

0     [EMS,  BACK PAINS/INJURY]
1    [EMS,  DIABETIC EMERGENCY]
2        [Fire,  GAS-ODOR/LEAK]
3     [EMS,  CARDIAC EMERGENCY]
4             [EMS,  DIZZINESS]

and I'd like:

df['Reason'].head()

0     [EMS]
1     [EMS]
2     [Fire]
3     [EMS]
4     [EMS]
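For anyone who wants to reproduce this, here is a minimal setup. The title strings are an assumption, reconstructed from the split output above (the original data isn't shown), so the real column may contain more text.

```python
import pandas as pd

# Hypothetical sample: titles reconstructed from the split output above,
# assuming the original strings have the form "CATEGORY: DETAIL"
df = pd.DataFrame({'title': [
    'EMS: BACK PAINS/INJURY',
    'EMS: DIABETIC EMERGENCY',
    'Fire: GAS-ODOR/LEAK',
    'EMS: CARDIAC EMERGENCY',
    'EMS: DIZZINESS',
]})

# Same call as in the question: each row becomes a list of parts
df['Reason'] = df['title'].apply(lambda x: x.split(':'))
print(df['Reason'].head())
```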
Ruan_fer

4 Answers


I'm using str.findall with a regex here:

df.text.str.findall(r"^\w+").str[0]
0     abc
1     foo
2    test
3     NaN
Name: text, dtype: object
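Applied to the question's title column, the same regex stops at the colon, because \w does not match ':'. A sketch with a few hypothetical titles and one NaN row:

```python
import numpy as np
import pandas as pd

# Hypothetical titles in the question's "CATEGORY: DETAIL" form, plus a NaN row
titles = pd.Series(['EMS: BACK PAINS/INJURY', 'Fire: GAS-ODOR/LEAK', np.nan])

# ^\w+ matches the leading run of word characters, so it stops at the colon;
# .str[0] takes the single match out of each list (the NaN row stays NaN)
reason = titles.str.findall(r"^\w+").str[0]
print(reason)
```

One caveat: \w only matches letters, digits and underscores, so a category containing a hyphen or a space would be cut short at that character.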
BENY
import numpy as np
import pandas as pd

df = pd.DataFrame({'text': ['abc xyz', 'foo bar', 'test', np.nan]})
df

      text
0  abc xyz
1  foo bar
2     test
3      NaN

Use any of the vectorized str methods. For example, str.split:

df['text'].str.split(n=1).str[0]

0     abc
1     foo
2    test
3     NaN
Name: text, dtype: object
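For the question's data the separator is ':' rather than whitespace, so pass it explicitly. A sketch with hypothetical titles, also showing what the default whitespace split would return:

```python
import pandas as pd

# Hypothetical titles from the question; split on ':' instead of whitespace
titles = pd.Series(['EMS: BACK PAINS/INJURY', 'Fire: GAS-ODOR/LEAK'])

# n=1 stops after the first ':'; .str[0] keeps the part before it
reason = titles.str.split(':', n=1).str[0]

# Splitting on the default (whitespace) keeps the colon attached: 'EMS:'
with_colon = titles.str.split(n=1).str[0]
```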

Or str.partition:

df['text'].str.partition(' ')[0]

0     abc
1     foo
2    test
3     NaN
Name: text, dtype: object
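str.partition returns a DataFrame rather than a Series, which is why [0] selects a column above. A sketch with hypothetical titles showing all three columns, including a row with no ':':

```python
import pandas as pd

titles = pd.Series(['EMS: BACK PAINS/INJURY', 'Fire: GAS-ODOR/LEAK', 'NO SEPARATOR'])

# With the default expand=True, str.partition returns a DataFrame with
# columns 0 (before the separator), 1 (the separator) and 2 (after it)
parts = titles.str.partition(':')
reason = parts[0]
detail = parts[2].str.strip()  # drop the space that followed the colon
```

When the separator is missing, the whole string lands in column 0 and the other two columns are empty strings.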

The str methods above handle NaNs gracefully. apply, on the other hand, will fail here:

df['text'].apply(lambda x: x.split(':')[0])
# ---------------------------------------------------------------------------
# AttributeError                            Traceback (most recent call last)
# AttributeError: 'float' object has no attribute 'split'

An isinstance check fixes this:

df['text'].apply(lambda x: x.split(None, 1)[0] if isinstance(x, str) else np.nan)

0     abc
1     foo
2    test
3     NaN
Name: text, dtype: object
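Another option, swapped in here and not part of the answer above, is Series.map with na_action='ignore', which passes missing values through instead of calling the function on them. A sketch on the same toy frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'text': ['abc xyz', 'foo bar', 'test', np.nan]})

# na_action='ignore' propagates NaN without calling the lambda,
# so the lambda only ever sees the non-missing values
first = df['text'].map(lambda x: x.split(None, 1)[0], na_action='ignore')
```

Unlike the isinstance check, this only skips missing values; a non-string such as an integer would still reach .split and raise.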
cs95
  • Hi man, could you check this https://stackoverflow.com/questions/55616929/ffill-weird-behavior-when-have-the-duplicate-columns-names? I thought it was a bug, I just don't know why apply works here. – BENY Apr 11 '19 at 00:35

If you have a column filled with lists, you can index into it directly:

df['Reason'].str[0]

or

df['Reason'].str.get(0)

Output:

0     EMS
1     EMS
2    Fire
3     EMS
4     EMS
Name: Reason, dtype: object
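To connect this to the question: the list column produced by split(':') can be indexed element-wise. A sketch with a hypothetical list column, including an empty list to show the out-of-range case:

```python
import pandas as pd

# Hypothetical list column like the one the question's split produces
reason_lists = pd.Series([['EMS', ' BACK PAINS/INJURY'], ['Fire', ' GAS-ODOR/LEAK'], []])

first = reason_lists.str[0]          # element 0 of each list
first_get = reason_lists.str.get(0)  # same thing, spelled as a method

# An empty list has no element 0; both return NaN for it rather than raising
```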
rafaelc

Take the first item of the list returned by split():

df['Reason'] = df['title'].apply(lambda x: x.split(':')[0])

For extra credit, tell split() to split only once, so it doesn't split off extra items only to throw them away.

df['Reason'] = df['title'].apply(lambda x: x.split(':', 1)[0])
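What the maxsplit argument changes, shown on a hypothetical title that contains a second colon:

```python
# Hypothetical title with a second ':' in the detail part
title = 'EMS: BACK PAINS: LOWER'

all_parts = title.split(':')   # ['EMS', ' BACK PAINS', ' LOWER']
once = title.split(':', 1)     # ['EMS', ' BACK PAINS: LOWER']
```

Either way element 0 is 'EMS'; the limit only saves the work of splitting the rest.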

Or use partition() instead:

df['Reason'] = df['title'].apply(lambda x: x.partition(':')[0])
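A quick sanity check that the three apply variants agree, assuming hypothetical titles (one without any ':'):

```python
import pandas as pd

df = pd.DataFrame({'title': ['EMS: BACK PAINS/INJURY', 'Fire: GAS-ODOR/LEAK', 'NO SEPARATOR']})

split_all = df['title'].apply(lambda x: x.split(':')[0])
split_once = df['title'].apply(lambda x: x.split(':', 1)[0])
part = df['title'].apply(lambda x: x.partition(':')[0])

# All three agree, including on a title with no ':' (the whole string is kept)
```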
kindall
  • `df['title'].str.split(':', n=1).str[0]` – cs95 Apr 11 '19 at 00:07
  • P.S.: [Avoid the use of `apply` as much as possible.](https://stackoverflow.com/questions/54432583/when-should-i-ever-want-to-use-pandas-apply-in-my-code) – cs95 Apr 11 '19 at 00:08