3

I would like to delete all text after the 2nd comma (counting from the left) in dataframe strings that include "County, Texas". For example,

Before:

  1. "Jack Smith, Bank, Wilber, Lincoln County, Texas"
  2. "Jack Smith, Bank, Credit, Bank, Wilber, Lincoln County, Texas"
  3. "Jack Smith, Bank, Union, Credit, Bank, Wilber, Lincoln County, Texas, Branch, Landing, Services"
  4. "Jack Smith, Bank, Credit, Bank, Wilber, Branch, Landing, Services"

After:

  1. "Jack Smith, Bank"
  2. "Jack Smith, Bank"
  3. "Jack Smith, Bank, Union"
  4. "Jack Smith, Bank, Credit, Bank, Wilber, Branch, Landing, Services"

Thank you for your help!

Wiktor Stribiżew
Andrew

3 Answers

4

Use mask with str.contains() to apply the change only to rows that meet the condition, and build the replacement value with .str.split(', ').str[0:2].agg(', '.join):

df['Col'] = df['Col'].mask(df['Col'].str.contains('County, Texas'),
                           df['Col'].str.split(', ').str[0:2].agg(', '.join))

Full Code:

import pandas as pd
df = pd.DataFrame({'Col': {0: 'Jack Smith, Bank, Wilber, Lincoln County, Texas',
  1: 'Jack Smith, Union, Credit, Bank, Wilber, Lincoln County, Texas',
  2: 'Jack Smith, Union, Credit, Bank, Wilber, Lincoln County, Texas, Branch, Landing, Services',
  3: 'Jack Smith, Union, Credit, Bank, Wilber, Branch, Landing, Services'}})
df['Col'] = df['Col'].mask(df['Col'].str.contains('County, Texas'),
                           df['Col'].str.split(', ').str[0:2].agg(', '.join))                            
df
Out[1]: 
                                                 Col
0                                   Jack Smith, Bank
1                                  Jack Smith, Union
2                                  Jack Smith, Union
3  Jack Smith, Union, Credit, Bank, Wilber, Branc...
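The same mask idea can also be written with .str.join instead of .agg(', '.join). This is only a sketch of an alternative spelling, not part of the answer above: as far as I know, how agg treats a plain callable has varied between pandas versions, whereas .str.join always joins each list element-wise. The two-row frame and the names has_county and first_two are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Col': [
    'Jack Smith, Bank, Wilber, Lincoln County, Texas',
    'Jack Smith, Union, Credit, Bank, Wilber, Branch, Landing, Services',
]})

# Rows to shorten: those mentioning "County, Texas" (literal match, no regex).
has_county = df['Col'].str.contains('County, Texas', regex=False)

# First two comma-separated fields, rejoined element-wise with .str.join.
first_two = df['Col'].str.split(', ').str[:2].str.join(', ')

# mask replaces values where the condition is True and keeps the rest.
df['Col'] = df['Col'].mask(has_county, first_two)
print(df['Col'].tolist())
```

Rows without "County, Texas" pass through unchanged, which matches case 4 in the question.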

Per the updated question, you can use np.select:

import numpy as np
import pandas as pd
df = pd.DataFrame({'Col': {0: 'Jack Smith, Bank, Wilber, Lincoln County, Texas',
  1: 'Jack Smith, Bank, Credit, Bank, Wilber, Lincoln County, Texas',
  2: 'Jack Smith, Bank, Union, Credit, Bank, Wilber, Lincoln County, Texas, Branch, Landing, Services',
  3: 'Jack Smith, Bank, Credit, Bank, Wilber, Branch, Landing, Services'}})
df['Col'] = np.select([df['Col'].str.contains('County, Texas') & ~df['Col'].str.contains('Union'),
                       df['Col'].str.contains('County, Texas') & df['Col'].str.contains('Union')],
                      [df['Col'].str.split(', ').str[0:2].agg(', '.join),
                       df['Col'].str.split(', ').str[0:3].agg(', '.join)],
                       df['Col'])                            
df
Out[2]: 
                                                 Col
0                                   Jack Smith, Bank
1                                   Jack Smith, Bank
2                            Jack Smith, Bank, Union
3  Jack Smith, Bank, Credit, Bank, Wilber, Branch...
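A note on how np.select picks a value: conditions are checked in order and the first one that is True wins, with default used when none match. The sketch below relies on that ordering so the ~contains('Union') guard is not needed. It assumes, as the answer above does, that the presence of "Union" is what decides whether a third field is kept; the question does not state the rule explicitly. The series s and the helper names are illustrative only.

```python
import numpy as np
import pandas as pd

s = pd.Series([
    'Jack Smith, Bank, Wilber, Lincoln County, Texas',
    'Jack Smith, Bank, Union, Credit, Bank, Wilber, Lincoln County, Texas',
    'Jack Smith, Bank, Credit, Bank, Wilber, Branch',
])

in_county = s.str.contains('County, Texas', regex=False)
has_union = s.str.contains('Union', regex=False)  # assumed rule, see lead-in
parts = s.str.split(', ')

# The more specific condition comes first, so it takes priority
# over the plain "in_county" condition for rows matching both.
result = np.select(
    [in_county & has_union, in_county],
    [parts.str[:3].str.join(', '), parts.str[:2].str.join(', ')],
    default=s,
)
print(list(result))
```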
David Erickson
  • Thank you David, but how do I keep the text after the 2nd comma in the 3rd case I updated (i.e., so the word "Union" is kept)? – Andrew Nov 07 '20 at 00:06
  • @Andrew see updated answer. In future, kindly post a new question referencing this one, as changing the initial question can entirely change the solution. Thank you! – David Erickson Nov 07 '20 at 00:30
2

You can simply use a combination of map with a lambda, split and join:

df['Example'] = df['Example'].map(lambda x: ','.join(x.split(',')[0:2]) if 'County, Texas' in x else x)

In this case:

import pandas as pd
df = pd.DataFrame({'Example':["Jack Smith, Bank, Wilber, Lincoln County, Texas","Jack Smith, Union, Credit, Bank, Wilber, Lincoln County, Texas",
                              "Jack Smith, Union, Credit, Bank, Wilber, Lincoln County, Texas, Branch, Landing, Services",
                              "Jack Smith, Union, Credit, Bank, Wilber, Branch, Landing, Services"]})
df['Example'] = df['Example'].map(lambda x: ','.join(x.split(',')[0:2]) if 'County, Texas' in x else x)

We get the following output:

                                             Example
0                                   Jack Smith, Bank
1                                  Jack Smith, Union
2                                  Jack Smith, Union
3  Jack Smith, Union, Credit, Bank, Wilber, Branc...
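If the lambda gets hard to read, the same logic could be moved into a small named function. This is a sketch only: keep_first_fields and its parameters n, marker and sep are hypothetical names introduced here, not something from the answer above.

```python
import pandas as pd

def keep_first_fields(text, n=2, marker='County, Texas', sep=', '):
    """Return the first n sep-separated fields of text if marker occurs in it."""
    if marker not in text:
        return text  # leave non-matching rows untouched
    return sep.join(text.split(sep)[:n])

df = pd.DataFrame({'Example': [
    'Jack Smith, Bank, Wilber, Lincoln County, Texas',
    'Jack Smith, Union, Credit, Bank, Wilber, Lincoln County, Texas, Branch',
    'Jack Smith, Union, Credit, Bank, Wilber, Branch, Landing, Services',
]})

# map calls the function once per row with its default arguments.
df['Example'] = df['Example'].map(keep_first_fields)
print(df['Example'].tolist())
```

A different field count can be passed with, for example, df['Example'].map(lambda x: keep_first_fields(x, n=3)).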
Celius Stingher
1

Data

import pandas as pd
df = pd.DataFrame({'text':["Jack Smith, Bank, Wilber, Lincoln County, Texas","Jack Smith, Union, Credit, Bank, Wilber, Lincoln County, Texas",
                              "Jack Smith, Union, Credit, Bank, Wilber, Lincoln County, Texas, Branch, Landing, Services",
                              "Jack Smith, Union, Credit, Bank, Wilber, Branch, Landing, Services"]})

Solution: use .str.extract

df['newtext'] = df.text.str.extract(r'(^\w+\s\w+,\s\w+)')



                                                text            newtext
0    Jack Smith, Bank, Wilber, Lincoln County, Texas   Jack Smith, Bank
1  Jack Smith, Union, Credit, Bank, Wilber, Linco...  Jack Smith, Union
2  Jack Smith, Union, Credit, Bank, Wilber, Linco...  Jack Smith, Union
3  Jack Smith, Union, Credit, Bank, Wilber, Branc...  Jack Smith, Union
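Note that the extract above shortens every row, including row 3, which has no "County, Texas" and which the question wants left as is. One possible way to restrict it is to combine extract with where, sketched below. The pattern ^([^,]*,[^,]*) is swapped in here for illustration (it takes everything before the second comma rather than assuming a two-word name), and has_county and prefix are made-up names.

```python
import pandas as pd

df = pd.DataFrame({'text': [
    'Jack Smith, Bank, Wilber, Lincoln County, Texas',
    'Jack Smith, Union, Credit, Bank, Wilber, Branch, Landing, Services',
]})

# Everything up to, but not including, the second comma.
prefix = df['text'].str.extract(r'^([^,]*,[^,]*)', expand=False)

has_county = df['text'].str.contains('County, Texas', regex=False)

# where keeps prefix on matching rows and falls back to the original text elsewhere.
df['newtext'] = prefix.where(has_county, df['text'])
print(df['newtext'].tolist())
```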
wwnde