I'm trying to remove parentheses and everything inside them using Python 3.

I've looked into several different threads, including here:

How to remove parentheses and all data within using Pandas/Python?

After finally getting:

re.sub(r"\(.*\)|\s-\s.*", r"", str1)

to run without errors, it didn't remove the content from the str1 string.

Then I tried this approach:

How to remove text within parentheses from Python string?

to remove the parentheses and their contents from the file before reading it in and storing it in str1, but I get this error:

Traceback (most recent call last):
  File "sum_all.py", line 27, in <module>
    data.append(line.replace(match.group(),'').strip())
AttributeError: 'NoneType' object has no attribute 'group'

Here is the code. I'm obviously new at this and appreciate any help!

# Python 3 program to calculate the sum of
# all numbers present in a string
# containing alphanumeric characters

# Functions to calculate the sum of all
# numbers present in a string
# containing alphanumeric characters
import re
import math
import pyperclip
import pandas
def find_sum(str1):
    # Regular expression that matches every run of digits in the string
    return sum(map(int, re.findall(r'\d+', str1)))

def find_sum2(str2):
    # Regular expression that matches digits followed by "hr" (short for hours)
    return sum(map(int, re.findall(r'(\d+)hr', str2)))

str2=0

# Regular Expression 
data=[]
pattern=r'\(.+\)|\s\-.+'
with open('project.txt','r') as f:
    for line in f:
        match=re.search(pattern,line)
        data.append(line.replace(match.group(),'').strip())

print(data)

# input alphanumeric string
with open ("project.txt", "r") as myfile:
    str1=myfile.read().replace('\n', '')


# Regular Expression that removes (*) and Normalizes White Space - didn't work
#re.sub(r"\(.*\)|\s-\s.*", r"", str1)

# Regular Expression that removes (*) - didn't work
#re.sub(r"\(.*\)", r"", str1)


1 Answer

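You can try this pattern: r"\((.*?)\)"

The \( and \) match the literal parentheses in the text.

The unescaped parentheses around .*? create a capturing group for whatever is inside. The group is not required for removal; r"\(.*?\)" behaves the same here.

Finally, .*? matches any character (.) repeated zero or more times, as few times as possible (*?), so each match stops at the first closing parenthesis instead of running on to the last one.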

s = "start (inside) this is in between (inside) end"
res = re.sub(r"\((.*?)\)", "", s)
print(res) 

prints

start  this is in between  end
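
To see why the ? matters, here is a small sketch comparing the greedy pattern from the question with the lazy one. The variable names greedy, lazy and tidy are just for illustration, and the last step (collapsing the leftover double spaces) is an optional extra, not something the question asked for.

```python
import re

s = "start (inside) this is in between (inside) end"

# Greedy: .* runs from the first "(" to the LAST ")" on the line,
# so the text between the two pairs is removed as well.
greedy = re.sub(r"\(.*\)", "", s)

# Lazy: .*? stops at the FIRST ")", so each pair is removed separately.
lazy = re.sub(r"\(.*?\)", "", s)

# Optional: collapse the runs of spaces left behind by the removal.
tidy = re.sub(r"\s{2,}", " ", lazy)

print(repr(greedy))  # 'start  end'
print(repr(lazy))    # 'start  this is in between  end'
print(repr(tidy))    # 'start this is in between end'
```

If a line can contain more than one pair of parentheses, the lazy form is probably the one you want.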

Hope that helps.
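
As for the AttributeError in the question: re.search returns None for any line that has no match, so match.group() fails on those lines. A possible way around it, sketched below, is to call re.sub on every line, since it simply returns the line unchanged when nothing matches. The helper name clean_lines and the sample text are made up for this example; io.StringIO only stands in for the open('project.txt') file object.

```python
import io
import re

# Lazy pattern: removes each "(...)" pair separately.
PATTERN = re.compile(r"\(.*?\)")

def clean_lines(lines):
    """Return every line with parenthesised text removed and whitespace trimmed."""
    # sub() leaves lines without a match untouched, so no None check is needed.
    return [PATTERN.sub("", line).strip() for line in lines]

# Hypothetical stand-in for: with open('project.txt', 'r') as f: ...
sample = io.StringIO("Design (draft) 3hr\nTesting 2hr\n")
data = clean_lines(sample)
print(data)  # ['Design  3hr', 'Testing 2hr']
```

The second sample line has no parentheses at all, which is exactly the case that raised the error with match.group().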

  • Hi, I tried that and it's not working for me. I get: "start (inside) this is in between (inside) end". I'm running a file in the Mac terminal. – Stephen Oct 03 '19 at 20:45
  • It works brilliantly when I do (thanks so much): s = "start (inside) this is in between (inside) end" s = re.sub(r"\((.*?)\)", "", s) – Stephen Oct 04 '19 at 01:01
  • Strings are immutable objects in Python [link](http://net-informations.com/python/iq/immutable.htm), so `re.sub(..., s)` can't change the value of `s` in place. That is why a function that produces a string normally gives it back as a return value, which you have to assign yourself, e.g. `s2 = re.sub(..., s)`. – Gabriel Avendaño Oct 05 '19 at 02:36
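
A minimal sketch of the point made in the last comment, reusing the string from the answer. The name unchanged is only there to show the value of s after the first call.

```python
import re

s = "start (inside) this is in between (inside) end"

# The return value is thrown away here, so s still holds the original text.
re.sub(r"\(.*?\)", "", s)
unchanged = s

# Assigning the return value rebinds s to the new, cleaned string.
s = re.sub(r"\(.*?\)", "", s)

print(unchanged)  # start (inside) this is in between (inside) end
print(s)          # start  this is in between  end
```

This would also explain why the commented-out re.sub lines in the question appeared to do nothing: their result was never stored.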