Replace string values of column if contained in parentheses

Question

I have the following dataframe as an example:

import pandas as pd

test = pd.DataFrame({'type':['fruit-of the-loom (sometimes-never)', 'yes', 'ok (not-possible) I will try', 'vegetable', 'poultry', 'poultry'],
                 'item':['apple', 'orange', 'spinach', 'potato', 'chicken', 'turkey']})

I found many posts from people wanting to remove parentheses from strings, or similar situations, but in my case I would like to keep the string exactly as is, except that I want to replace the hyphen inside the parentheses with a space.

Does anyone have a suggestion on how I could achieve this?

str.split() would take care of the hyphen if it were leading, and str.rsplit() if it were trailing, but I can't think of a way to apply either to a hyphen inside the parentheses.

In this case, the ideal values for this column would be:

'fruit-of the-loom (sometimes never)',
'yes',
'ok (not possible) I will try',
'vegetable',
'poultry',
'poultry'

score 2 · Accepted Answer · answered Apr 13 '20 at 17:14

One way is to use str.replace with a pattern that matches what is between parentheses, and pass as the repl parameter a lambda that calls replace on the matched text:

# regex=True is required for a callable repl (the default became False in pandas 2.0)
print(test['type'].str.replace(pat=r'\((.*?)\)',
                               repl=lambda m: m.group(0).replace('-', ' '),
                               regex=True))
0    fruit-of the-loom (sometimes never)
1                                    yes
2           ok (not possible) I will try
3                              vegetable
4                                poultry
5                                poultry
Name: type, dtype: object

Explanation of what is in pat= can be found here
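For reference, the pattern \((.*?)\) matches a literal "(", then as few characters as possible (the non-greedy .*?), then a literal ")". The standard library's re.sub accepts the same kind of callable replacement, so a minimal sketch with it, on a made-up string with two pairs of parentheses, shows why the non-greedy quantifier matters:

```python
import re

# Non-greedy: each "(...)" pair is matched on its own.
lazy = re.compile(r'\((.*?)\)')
# Greedy: matches from the first "(" all the way to the LAST ")".
greedy = re.compile(r'\((.*)\)')

def dehyphen(match):
    # match.group(0) is the whole matched text, parentheses included
    return match.group(0).replace('-', ' ')

s = 'a-b (c-d) e-f (g-h)'
lazy_result = lazy.sub(dehyphen, s)
greedy_result = greedy.sub(dehyphen, s)
print(lazy_result)    # a-b (c d) e-f (g h)
print(greedy_result)  # a-b (c d) e f (g h)
```

With the greedy version, the hyphen in "e-f", which sits between the two pairs and outside both, is replaced as well.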

Josh Friedlander · Answer 2 · 2020-04-13T19:22:40.810

test.type = (test.type.str.extract(r'(.*?\(.*?)-(.*?\))(.*)')
             .sum(axis=1, min_count=1)  # min_count=1 leaves unmatched (all-NaN) rows as NaN
             .combine_first(test.type))

Explanation:

Extract three regex groups: everything from the start through the opening parenthesis up to the hyphen; everything after the hyphen up to the closing parenthesis; and any optional trailing text
Concatenate them together again with sum (which drops the hyphen)
Where the result is NaN (no match), use the value from the original column (combine_first)
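To see what the sum step actually receives, here is a small sketch that only runs the extract step on three of the sample values. With three unnamed groups the result has columns 0, 1 and 2, and a row with no match is NaN in every column:

```python
import pandas as pd

s = pd.Series(['fruit-of the-loom (sometimes-never)', 'yes',
               'ok (not-possible) I will try'])

# One column per capture group; unmatched rows are all NaN
parts = s.str.extract(r'(.*?\(.*?)-(.*?\))(.*)')
print(parts)
```

The hyphen itself is not inside any group, which is why concatenating the three columns drops it.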

This way the hyphen is dropped, not replaced by a space. If you need a space, concatenate the groups explicitly instead of using sum (string concatenation propagates NaN, so unmatched rows still fall back to the original):

parts = test.type.str.extract(r'(.*?\(.*?)-(.*?\))(.*)')
test.type = (parts[0] + ' ' + parts[1] + parts[2]).combine_first(test.type)

Either way, this handles only one hyphen per string, and it won't work reliably for more than one set of parentheses.
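A quick sketch of that limitation, comparing the extract pattern with the accepted answer's str.replace callable on two made-up strings (one with two hyphens inside a single pair, one with two pairs). Both variants drop the hyphen here so the comparison is like for like:

```python
import pandas as pd

s = pd.Series(['a (b-c-d) e', 'x (p-q) y (r-s)'])

# Extract approach: only the first hyphen after the first "(" is removed
parts = s.str.extract(r'(.*?\(.*?)-(.*?\))(.*)')
extract_result = (parts[0] + parts[1] + parts[2]).tolist()

# str.replace approach: every "(...)" pair is processed
replace_result = s.str.replace(pat=r'\((.*?)\)',
                               repl=lambda m: m.group(0).replace('-', ''),
                               regex=True).tolist()

print(extract_result)  # ['a (bc-d) e', 'x (pq) y (r-s)']
print(replace_result)  # ['a (bcd) e', 'x (pq) y (rs)']
```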

score 0 · Answer 3 · answered Apr 13 '20 at 17:13

I should have taken a little longer to think about this one.

This is the solution I came up with:

Count the parentheses, and replace hyphens only while the count shows we are inside a pair:

def inside_parens(string):
    parens_count = 0      # current nesting depth
    return_string = ""
    for a in string:
        if a == "(":
            parens_count += 1
        elif a == ")":
            parens_count -= 1
        if parens_count > 0:
            # inside at least one pair of parentheses: swap hyphens for spaces
            return_string += a.replace('-', ' ')
        else:
            return_string += a
    return return_string
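Because the function tracks nesting depth instead of stopping at the nearest ")", it also handles nested parentheses, which the non-greedy regex \((.*?)\) does not. A sketch comparing the two on a made-up nested string (the function is repeated, written with a list and join, so the block runs on its own):

```python
import re

def inside_parens(string):
    # Same depth-counting logic as above
    depth = 0
    out = []
    for ch in string:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        out.append(ch.replace('-', ' ') if depth > 0 else ch)
    return ''.join(out)

s = 'x-(a-(b-c)-d)-y'
counter_result = inside_parens(s)
regex_result = re.sub(r'\((.*?)\)', lambda m: m.group(0).replace('-', ' '), s)
print(counter_result)  # x-(a (b c) d)-y
print(regex_result)    # x-(a (b c)-d)-y
```

The regex stops at the first ")", so the hyphen before "d", which is still inside the outer pair, is left alone.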

Then apply it to the intended column (here, the type column from the question):

test['type'] = test['type'].apply(inside_parens)

To generalize the function, you can pass the character to replace (and its replacement) as parameters, which makes it more versatile.
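A minimal sketch of such a generalization. The name replace_inside and its parameters old, new, open_ch and close_ch are hypothetical, not from the original answer, and old is assumed to be a single character because the text is scanned one character at a time:

```python
import pandas as pd

def replace_inside(text, old='-', new=' ', open_ch='(', close_ch=')'):
    # Hypothetical generalization of inside_parens; assumes `old` is one character
    depth = 0
    out = []
    for ch in text:
        if ch == open_ch:
            depth += 1
        elif ch == close_ch:
            depth -= 1
        out.append(ch.replace(old, new) if depth > 0 else ch)
    return ''.join(out)

test = pd.DataFrame({'type': ['fruit-of the-loom (sometimes-never)', 'yes',
                              'ok (not-possible) I will try', 'vegetable',
                              'poultry', 'poultry'],
                     'item': ['apple', 'orange', 'spinach', 'potato',
                              'chicken', 'turkey']})

# Defaults reproduce the original behaviour
test['type'] = test['type'].apply(replace_inside)

# Other delimiters and replacements via keyword arguments
bracketed = replace_inside('a-b [c-d]', old='-', new='_', open_ch='[', close_ch=']')
print(bracketed)  # a-b [c_d]
```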
