Python: How to delete all text after 2nd comma to left of string

Question

I would like to delete all text after the 2nd comma to the left of strings in a dataframe that include "County, Texas". For example,

Before:

"Jack Smith, Bank, Wilber, Lincoln County, Texas"
"Jack Smith, Bank, Credit, Bank, Wilber, Lincoln County, Texas"
"Jack Smith, Bank, Union, Credit, Bank, Wilber, Lincoln County, Texas, Branch, Landing, Services"
"Jack Smith, Bank, Credit, Bank, Wilber, Branch, Landing, Services"

After:

"Jack Smith, Bank"
"Jack Smith, Bank"
"Jack Smith, Bank, Union"
"Jack Smith, Bank, Credit, Bank, Wilber, Branch, Landing, Services"

Thank you for your help!

Could you please provide more information regarding the dataframe, and make a MVE please? — Celius Stingher, Nov 06 '20 at 23:25
@CeliusStingher It's pretty clear from the Before/After, is there something else you have in mind? — user1717828, Nov 06 '20 at 23:37
I'm trying to understand why rows 2 and 3 show "Jack Smith, Bank" instead of "Jack Smith, Union". An MVE would help solve it. — Celius Stingher, Nov 06 '20 at 23:38
With regex, one idea is to search for [`^([^,]*,[^,]*),.*County, Texas.*`](https://regex101.com/r/V51por/2) and replace it with `\1`, the capture of *group(1)* — bobble bubble, Nov 07 '20 at 09:43
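
The regex idea in the last comment could be applied to a pandas column roughly like this. This is only a sketch: the column name `Col` and the two sample rows are assumptions made for illustration. It keeps the first two fields of matching rows, so it does not reproduce the "Union" case from the third After row.

```python
import pandas as pd

# Sample data (assumed for illustration): one matching row, one non-matching row.
df = pd.DataFrame({'Col': [
    'Jack Smith, Bank, Wilber, Lincoln County, Texas',
    'Jack Smith, Bank, Credit, Bank, Wilber, Branch, Landing, Services',
]})

# Group 1 captures everything before the second comma. The rest of the
# pattern only matches when "County, Texas" appears later in the string,
# so rows without it are returned unchanged.
pattern = r'^([^,]*,[^,]*),.*County, Texas.*'
result = df['Col'].str.replace(pattern, r'\1', regex=True)

print(result.tolist())
```

Passing `regex=True` explicitly matters here, because newer pandas versions treat the pattern as a literal string by default.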

David Erickson · Accepted Answer · 2020-11-07T00:28:49.103

Use mask with str.contains() to change only the rows that meet the condition, replacing each of them with the result of .str.split(', ').str[0:2].agg(', '.join):

df['Col'] = df['Col'].mask(df['Col'].str.contains('County, Texas'),
                           df['Col'].str.split(', ').str[0:2].agg(', '.join))

Full Code:

import pandas as pd
df = pd.DataFrame({'Col': {0: 'Jack Smith, Bank, Wilber, Lincoln County, Texas',
  1: 'Jack Smith, Union, Credit, Bank, Wilber, Lincoln County, Texas',
  2: 'Jack Smith, Union, Credit, Bank, Wilber, Lincoln County, Texas, Branch, Landing, Services',
  3: 'Jack Smith, Union, Credit, Bank, Wilber, Branch, Landing, Services'}})
df['Col'] = df['Col'].mask(df['Col'].str.contains('County, Texas'),
                           df['Col'].str.split(', ').str[0:2].agg(', '.join))                            
df
Out[1]: 
                                                 Col
0                                   Jack Smith, Bank
1                                  Jack Smith, Union
2                                  Jack Smith, Union
3  Jack Smith, Union, Credit, Bank, Wilber, Branc...
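
If the same truncation is needed in more than one place, the mask approach above could be wrapped in a small helper. The sketch below is not part of the original answer: the function name `keep_first_fields` and its parameters `marker` and `n` are hypothetical, and it uses `.str.join(', ')` in place of `.agg(', '.join)` to join each list back into a string.

```python
import pandas as pd

def keep_first_fields(col, marker='County, Texas', n=2):
    """Cut rows containing `marker` down to their first `n` comma-separated fields."""
    # regex=False treats the marker as a literal substring.
    has_marker = col.str.contains(marker, regex=False)
    # Split into fields, keep the first n, and join them back together.
    shortened = col.str.split(', ').str[:n].str.join(', ')
    # mask() replaces values where the condition is True and keeps the rest.
    return col.mask(has_marker, shortened)

df = pd.DataFrame({'Col': [
    'Jack Smith, Bank, Wilber, Lincoln County, Texas',
    'Jack Smith, Union, Credit, Bank, Wilber, Lincoln County, Texas',
    'Jack Smith, Union, Credit, Bank, Wilber, Branch, Landing, Services',
]})
out = keep_first_fields(df['Col'])
print(out.tolist())
```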

Per the updated question, you can use np.select:

import numpy as np
import pandas as pd
df = pd.DataFrame({'Col': {0: 'Jack Smith, Bank, Wilber, Lincoln County, Texas',
  1: 'Jack Smith, Bank, Credit, Bank, Wilber, Lincoln County, Texas',
  2: 'Jack Smith, Bank, Union, Credit, Bank, Wilber, Lincoln County, Texas, Branch, Landing, Services',
  3: 'Jack Smith, Bank, Credit, Bank, Wilber, Branch, Landing, Services'}})
df['Col'] = np.select([df['Col'].str.contains('County, Texas') & ~df['Col'].str.contains('Union'),
                       df['Col'].str.contains('County, Texas') & df['Col'].str.contains('Union')],
                      [df['Col'].str.split(', ').str[0:2].agg(', '.join),
                       df['Col'].str.split(', ').str[0:3].agg(', '.join)],
                       df['Col'])                            
df
Out[2]: 
                                                 Col
0                                   Jack Smith, Bank
1                                   Jack Smith, Bank
2                            Jack Smith, Bank, Union
3  Jack Smith, Bank, Credit, Bank, Wilber, Branch...

Thank you David, but how do I keep the text after the 2nd comma in the 3rd case I updated (i.e., so the word "Union" is kept)? — Andrew, Nov 07 '20 at 00:06
@Andrew see updated answer. In future, kindly post a new question referencing this one, as changing the initial question can entirely change the solution. Thank you! — David Erickson, Nov 07 '20 at 00:30

score 2 · Answer 2 · answered Nov 06 '20 at 23:31

You can simply use a combination of map with a lambda, split and join:

df['Example'] = df['Example'].map(lambda x: ','.join(x.split(',')[0:2]) if 'County, Texas' in x else x)

In this case:

import pandas as pd
df = pd.DataFrame({'Example':["Jack Smith, Bank, Wilber, Lincoln County, Texas","Jack Smith, Union, Credit, Bank, Wilber, Lincoln County, Texas",
                              "Jack Smith, Union, Credit, Bank, Wilber, Lincoln County, Texas, Branch, Landing, Services",
                              "Jack Smith, Union, Credit, Bank, Wilber, Branch, Landing, Services"]})
df['Example'] = df['Example'].map(lambda x: ','.join(x.split(',')[0:2]) if 'County, Texas' in x else x)

We get the following output:

                                             Example
0                                   Jack Smith, Bank
1                                  Jack Smith, Union
2                                  Jack Smith, Union
3  Jack Smith, Union, Credit, Bank, Wilber, Branc...
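
One caveat with the lambda above: if the column contains missing values, `'County, Texas' in x` raises a TypeError for them. A possible workaround, sketched here with a hypothetical helper name `truncate_if_texas`, is to pass non-string values through untouched. It also uses the `maxsplit` argument of `str.split` so the string is not split beyond the second comma.

```python
import pandas as pd

def truncate_if_texas(value, marker='County, Texas'):
    # Leave missing values and other non-string entries as they are.
    if not isinstance(value, str) or marker not in value:
        return value
    # maxsplit=2 yields at most three pieces; keep the first two.
    return ','.join(value.split(',', 2)[:2])

s = pd.Series([
    'Jack Smith, Bank, Wilber, Lincoln County, Texas',
    None,
    'Jack Smith, Union, Credit, Bank, Wilber, Branch, Landing, Services',
])
out = s.map(truncate_if_texas)
print(out.tolist())
```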

score 1 · Answer 3 · answered Nov 06 '20 at 23:49

Data

df = pd.DataFrame({'text':["Jack Smith, Bank, Wilber, Lincoln County, Texas","Jack Smith, Union, Credit, Bank, Wilber, Lincoln County, Texas",
                              "Jack Smith, Union, Credit, Bank, Wilber, Lincoln County, Texas, Branch, Landing, Services",
                              "Jack Smith, Union, Credit, Bank, Wilber, Branch, Landing, Services"]})

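Solution: use .str.extract. Note that this writes the first two fields to a new column for every row, including rows that do not contain "County, Texas".

df['newtext'] = df.text.str.extract(r'(^\w+\s\w+,\s\w+)')
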
                                           text            newtext
0    Jack Smith, Bank, Wilber, Lincoln County, Texas   Jack Smith, Bank
1  Jack Smith, Union, Credit, Bank, Wilber, Linco...  Jack Smith, Union
2  Jack Smith, Union, Credit, Bank, Wilber, Linco...  Jack Smith, Union
3  Jack Smith, Union, Credit, Bank, Wilber, Branc...  Jack Smith, Union
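
Because the extract call is applied to every row, row 3 is shortened as well, which differs from the After list in the question. One way to restrict it, sketched here as an assumption rather than as part of this answer, is to combine the extracted text with a `str.contains` mask. The pattern `^([^,]*,[^,]*)` is also a substitution: it captures everything before the second comma without assuming a two-word name.

```python
import pandas as pd

df = pd.DataFrame({'text': [
    'Jack Smith, Bank, Wilber, Lincoln County, Texas',
    'Jack Smith, Union, Credit, Bank, Wilber, Lincoln County, Texas, Branch, Landing, Services',
    'Jack Smith, Union, Credit, Bank, Wilber, Branch, Landing, Services',
]})

# True for rows that mention the marker text.
has_marker = df['text'].str.contains('County, Texas', regex=False)
# expand=False returns a Series when the pattern has a single group.
first_two = df['text'].str.extract(r'^([^,]*,[^,]*)', expand=False)
# Use the shortened text only where the marker is present.
df['newtext'] = df['text'].mask(has_marker, first_two)

print(df['newtext'].tolist())
```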
