How to extract substring from variable length column in pandas dataframe?

Question

Hi there, I am trying to do something similar to Excel's MID function on a column of a pandas DataFrame in Python. I have a column of medication names with strengths, etc., of variable length. I just want to pull out the first "part" of the name and put the result into another column of the DataFrame.
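For comparison, Excel's MID(text, start, num_chars) with fixed positions maps fairly directly onto pandas' vectorized slicing, but the start and length are then the same for every row. A minimal sketch; the mid helper is hypothetical and only mimics MID's 1-based start:

```python
import pandas as pd


def mid(s: pd.Series, start: int, num_chars: int) -> pd.Series:
    """Hypothetical helper mimicking Excel's MID: 1-based start, fixed length."""
    return s.str.slice(start - 1, start - 1 + num_chars)


df = pd.DataFrame({"MEDICATION_NAME": ["acetaminophen 325 mg", "a-hydrocort 100 mg/2 ml"]})

# The same fixed slice is applied to every row, so it cannot adapt to a
# generic name whose length changes from row to row.
print(mid(df["MEDICATION_NAME"], 1, 11).tolist())
# ['acetaminoph', 'a-hydrocort']
```

The 11-character slice happens to fit "a-hydrocort" but truncates "acetaminophen", which is why the answers below split on the space instead of using fixed positions.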

Example:

Dataframe column

MEDICATION_NAME
acetaminophen 325 mg
a-hydrocort 100 mg/2 ml

Desired Result

MEDICATION_NAME               GENERIC_NAME
acetaminophen 325 mg          acetaminophen     
a-hydrocort 100 mg/2 ml       a-hydrocort

What I have tried

df['GENERIC_NAME'] = df['MEDICATION_NAME'].str[:df['MEDICATION_NAME'].apply(lambda x: x.find(' '))]

Basically, I want to take the row-specific result of

df['MEDICATION_NAME'].apply(lambda x: x.find(' '))

and use it as the end index of the str[:] slice for each row. How can I do that?
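The .str[:] accessor only accepts scalar bounds, so the per-row positions have to be applied row by row. A minimal sketch of exactly what the question describes: compute the positions with Series.str.find, then slice each string at its own position. The extra "aspirin" row is made up to show the no-space case, where find returns -1 and a plain [:-1] slice would silently drop the last character:

```python
import pandas as pd

# "aspirin" is a hypothetical extra row with no space in it.
df = pd.DataFrame({"MEDICATION_NAME": ["acetaminophen 325 mg", "a-hydrocort 100 mg/2 ml", "aspirin"]})

# Position of the first space in each row (-1 if there is none).
positions = df["MEDICATION_NAME"].str.find(" ")

# Slice each string at its own position; keep the whole string when find returned -1.
df["GENERIC_NAME"] = [
    name if pos == -1 else name[:pos]
    for name, pos in zip(df["MEDICATION_NAME"], positions)
]
print(df["GENERIC_NAME"].tolist())
# ['acetaminophen', 'a-hydrocort', 'aspirin']
```

This works, but the vectorized string methods in the answers below do the same job more concisely.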

Thanks

Can you provide more examples? Is the name always followed by a space and numbers then mg? Are there some Generic names with spaces? — ALollz, Nov 09 '18 at 20:55
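If some generic names do contain spaces, as the comment asks, splitting on the first space is not enough. One possible sketch, assuming (only an assumption about this data) that the strength always starts with a digit: keep everything before the first whitespace that is followed by a digit. The "sodium chloride" and "aspirin" rows are made up for illustration:

```python
import pandas as pd

# Hypothetical sample: the last two rows are invented to show the edge cases.
df = pd.DataFrame({"MEDICATION_NAME": [
    "acetaminophen 325 mg",
    "a-hydrocort 100 mg/2 ml",
    "sodium chloride 0.9 %",
    "aspirin",
]})

# Lazy (.*?) stops at the first run of whitespace that is followed by a digit.
generic = df["MEDICATION_NAME"].str.extract(r"^(.*?)\s+\d", expand=False)

# Rows without any digit do not match and come back as NaN; keep them whole.
df["GENERIC_NAME"] = generic.fillna(df["MEDICATION_NAME"])
print(df["GENERIC_NAME"].tolist())
# ['acetaminophen', 'a-hydrocort', 'sodium chloride', 'aspirin']
```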

score 3 · Answer 1 · edited Nov 10 '18 at 08:36

You can use str.partition (see the pandas documentation) here:

df['GENERIC_NAME'] = df['MEDICATION_NAME'].str.partition(' ')[0]

For the given column this gives:

>>> df['MEDICATION_NAME'].str.partition(' ')[0]
0    acetaminophen
1      a-hydrocort
Name: 0, dtype: object

Applied to a Series, partition returns a DataFrame with three columns: the part before the separator, the separator itself, and the part after it:

>>> df['MEDICATION_NAME'].str.partition(' ')
               0  1            2
0  acetaminophen          325 mg
1    a-hydrocort     100 mg/2 ml
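Because partition splits only at the first separator, everything after the first space stays together in column 2, and a value with no space ends up whole in column 0 with empty strings in the other two columns. A short check, using a made-up "aspirin" row for the no-space case:

```python
import pandas as pd

# "aspirin" is a hypothetical row with no space.
s = pd.Series(["acetaminophen 325 mg", "a-hydrocort 100 mg/2 ml", "aspirin"])
parts = s.str.partition(" ")

print(parts[0].tolist())  # ['acetaminophen', 'a-hydrocort', 'aspirin']
print(parts[2].tolist())  # ['325 mg', '100 mg/2 ml', '']
```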

score 2 · Accepted Answer · answered Nov 09 '18 at 20:54

Do it with str.split:

df['MEDICATION_NAME'].str.split(n=1).str[0]
Out[345]: 
0    acetaminophen
1      a-hydrocort
Name: MEDICATION_NAME, dtype: object
# To store the result in a new column:
df['GENERIC_NAME'] = df['MEDICATION_NAME'].str.split(n=1).str[0]
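Here n=1 limits the split to the first break, and with no pattern given the split follows Python's str.split() rules: runs of whitespace count as one separator and leading whitespace is ignored. A small sketch showing the intermediate lists; the "  aspirin 81 mg" row with leading spaces is made up for illustration:

```python
import pandas as pd

# The last row is hypothetical and starts with two spaces.
s = pd.Series(["acetaminophen 325 mg", "a-hydrocort 100 mg/2 ml", "  aspirin 81 mg"])

# At most one split per row; the remainder stays in the second element.
pieces = s.str.split(n=1)
print(pieces.tolist())
# [['acetaminophen', '325 mg'], ['a-hydrocort', '100 mg/2 ml'], ['aspirin', '81 mg']]

# .str[0] takes the first element of each row's list.
print(pieces.str[0].tolist())
# ['acetaminophen', 'a-hydrocort', 'aspirin']
```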

answered Nov 09 '18 at 20:54

BENY

317,841
20
164
234

score 1 · Answer 3 · answered Nov 09 '18 at 20:54

Use str.extract to get full regex support:

df["GENERIC_NAME"] = df["MEDICATION_NAME"].str.extract(r'([^\s]+)')

This captures the first run of non-whitespace characters, so it also handles values that start with a space.
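With a single capture group, expand=False makes str.extract return a Series, which assigns cleanly to one column; the default expand=True returns a one-column DataFrame. Because extract searches anywhere in the string, a leading space is skipped. A quick check, with a made-up " aspirin 81 mg" row:

```python
import pandas as pd

# The last row is hypothetical and starts with a space.
s = pd.Series(["acetaminophen 325 mg", "a-hydrocort 100 mg/2 ml", " aspirin 81 mg"])

# expand=False: one capture group -> a Series.
first = s.str.extract(r"([^\s]+)", expand=False)
print(type(first).__name__)  # Series
print(first.tolist())  # ['acetaminophen', 'a-hydrocort', 'aspirin']

# Default expand=True: a DataFrame with a single column named 0.
print(type(s.str.extract(r"([^\s]+)")).__name__)  # DataFrame
```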

answered Nov 09 '18 at 20:54

Rocky Li

5,641
2
17
33

score 1 · Answer 4 · answered Nov 09 '18 at 20:54

Try this:

df['GENERIC_NAME'] = df['MEDICATION_NAME'].str.split(" ").str[0]
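Note that .str.split(" ") returns a Series of lists, so a plain [0] looks up the row labelled 0 and returns that row's whole list rather than the first word of every row; .str[0] is what indexes into each list. A quick check on the question's two rows:

```python
import pandas as pd

df = pd.DataFrame({"MEDICATION_NAME": ["acetaminophen 325 mg", "a-hydrocort 100 mg/2 ml"]})
parts = df["MEDICATION_NAME"].str.split(" ")

# Label-based lookup: the entire list for row 0, not a column of first words.
print(parts[0])  # ['acetaminophen', '325', 'mg']

# .str[0] takes the first element of each row's list.
df["GENERIC_NAME"] = parts.str[0]
print(df["GENERIC_NAME"].tolist())  # ['acetaminophen', 'a-hydrocort']
```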

answered Nov 09 '18 at 20:54

petezurich

9,280
9
43
57