How to remove parentheses and all data within using Python3

Question

I'm trying to remove parentheses and all data within them using Python 3.

I've looked into several different threads, including here:

How to remove parentheses and all data within using Pandas/Python?

After finally getting:

re.sub(r"\(.*\)|\s-\s.*", r"", str1)

to run without errors, it didn't remove the content from the str1 string.

Then I tried this approach:

How to remove text within parentheses from Python string?

to remove the parentheses and their contents from the file before reading it in and storing it in str1, but I get this error:

Traceback (most recent call last):
  File "sum_all.py", line 27, in <module>
    data.append(line.replace(match.group(),'').strip())
AttributeError: 'NoneType' object has no attribute 'group'

Here is the code. I'm obviously new at this and appreciate any help!

# Python3 program to calculate the sum of
# all numbers present in a string
# containing alphanumeric characters
import re
import math
import pyperclip
import pandas
def find_sum(str1):
    # Regular expression that matches runs of digits anywhere in the string
    return sum(map(int, re.findall(r'\d+', str1)))

def find_sum2(str2):
    # Regular expression that matches digits followed by "hr" (short for hours)
    return sum(map(int, re.findall(r'(\d+)hr', str2)))

str2=0

# Regular Expression 
data=[]
pattern=r'\(.+\)|\s\-.+'
with open('project.txt','r') as f:
    for line in f:
        match=re.search(pattern,line)
        data.append(line.replace(match.group(),'').strip())

print(data)

# input alphanumeric string
with open ("project.txt", "r") as myfile:
    str1=myfile.read().replace('\n', '')


# Regular Expression that removes (*) and Normalizes White Space - didn't work
#re.sub(r"\(.*\)|\s-\s.*", r"", str1)

# Regular Expression that removes (*) - didn't work
#re.sub(r"\(.*\)", r"", str1)

Gabriel Avendaño · Accepted Answer · 2019-10-05T02:41:56.423

2

You can try this pattern: r"\((.*?)\)"

The \( and \) say that we want to match the literal parentheses.

The parentheses around the expression, (.*?), say that we want to capture what is inside as a group.

Finally, .*? means any character except a newline (.) repeated zero or more times, but as few times as possible (*?), so the match stops at the first closing parenthesis.

import re

s = "start (inside) this is in between (inside) end"
res = re.sub(r"\((.*?)\)", "", s)
print(res)

prints

start  this is in between  end
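To see why the lazy quantifier matters here, the following minimal sketch compares it with the greedy form on the same sample string. The variable names `greedy` and `lazy` are only illustrative.

```python
import re

s = "start (inside) this is in between (inside) end"

# Greedy: .* runs to the LAST closing parenthesis, so everything from
# the first "(" to the final ")" is removed as a single match.
greedy = re.sub(r"\(.*\)", "", s)

# Lazy: .*? stops at the FIRST closing parenthesis, so each pair of
# parentheses is removed separately.
lazy = re.sub(r"\(.*?\)", "", s)

print(repr(greedy))  # 'start  end'
print(repr(lazy))    # 'start  this is in between  end'
```

With only one parenthesized group per string the two patterns behave the same; the difference shows up as soon as a string contains two or more groups.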

Hope that helps.

edited Oct 05 '19 at 02:41

answered Oct 02 '19 at 20:56

Hi, I tried that and it's not working for me. I still get: "start (inside) this is in between (inside) end". I'm running the file from the Mac terminal. – Stephen Oct 03 '19 at 20:45
It works brilliantly when I do (thanks so much): s = "start (inside) this is in between (inside) end" s = re.sub(r"\((.*?)\)", "", s) – Stephen Oct 04 '19 at 01:01
Strings are immutable objects in Python [link](http://net-informations.com/python/iq/immutable.htm), so `re.sub(..., s)` can't change the value of `s` in place. That is why a function that produces a string usually hands it back as a return value, which you then need to assign, e.g. `s2 = re.sub(..., s)`. – Gabriel Avendaño Oct 05 '19 at 02:36
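The point about return values can be sketched as follows. The helper name `strip_parens` and the sample lines are hypothetical and stand in for the lines of a file like project.txt. Because `re.sub` simply returns the text unchanged when nothing matches, this approach also avoids calling `.group()` on the `None` that `re.search` returns for lines without a match.

```python
import re

def strip_parens(text):
    """Return a copy of text with each (...) group removed."""
    # re.sub never modifies its argument; it returns a new string.
    cleaned = re.sub(r"\(.*?\)", "", text)
    # Collapse the doubled spaces left behind and trim the ends.
    return re.sub(r"\s+", " ", cleaned).strip()

# Hypothetical sample lines standing in for the contents of a file.
lines = ["design (draft) 3hr", "no parentheses here", "review (v2) 1hr"]

# Keep the returned strings; the originals in `lines` are untouched.
cleaned_lines = [strip_parens(line) for line in lines]
print(cleaned_lines)
# ['design 3hr', 'no parentheses here', 'review 1hr']
```

The second line contains no parentheses and passes through unchanged, which is the case that raised the AttributeError in the question's loop.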
