How to remove text within parentheses from Python string?

Question

I am trying to remove parentheses and the text that resides in these parentheses, as well as hyphen characters. Some string examples look like the following:
example = 'Year 1.2 Q4.1 (Section 1.5 Report (#222))'
example2 = 'Year 2-7 Q4.8 - Data markets and phases'  # there are two hyphens

I would like the results to be:

example = 'Year 1.2 Q4.1'  
example2 = 'Year 2-7 Q4.8'

How can I remove text residing within or following parentheses and special characters? I could only find the str.strip() method. I am new to Python, so any feedback is greatly appreciated!

There are many ways. You should have a look at doing it with regex. I tagged it with regex and soon the regex sharks will be here. — Anton vBR, Dec 27 '17 at 19:33
Possible duplicate of [Python: Split string by list of separators](https://stackoverflow.com/questions/4697006/python-split-string-by-list-of-separators) — splash58, Dec 27 '17 at 19:34

Moinuddin Quadri · Answer 1 · 2018-01-01T14:21:58.783

You can use the regex below to get the desired result:

r"\s*\(.*\)|\s-\s.*"
#           ^ Pattern 2: a space, a hyphen, a space, and everything after them
# ^ Pattern 1: optional whitespace, then everything from the first "(" to the last ")"

Sample run:

>>> import re
>>> my_regex = r"\s*\(.*\)|\s-\s.*"  # raw string, so the backslashes reach the regex engine intact

>>> example = 'Year 1.2 Q4.1 (Section 1.5 Report (#222))'
>>> example2 = 'Year 2-7 Q4.8 - Data markets and phases'

>>> re.sub(my_regex, "", example)
'Year 1.2 Q4.1'
>>> re.sub(my_regex, "", example2)
'Year 2-7 Q4.8'

Here I am using re.sub(pattern, repl, string, ...) which, as the documentation says, will:

Return the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string by the replacement repl. If the pattern isn’t found, string is returned unchanged. repl can be a string or a function; if it is a string, any backslash escapes in it are processed.
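
As a minimal sketch of how this could be wrapped up for reuse, the snippet below compiles a pattern of the same shape once and applies it through a small helper. The names ANNOTATION and strip_annotations are hypothetical, chosen only for this illustration. It also shows a caveat of the greedy .* in the parentheses branch: when a string contains several separate parenthesized groups, everything from the first "(" to the last ")" is removed, including the text in between.

```python
import re

# Hypothetical names, for illustration only.
# Branch 1: optional whitespace, then from the first "(" to the last ")".
# Branch 2: whitespace, a hyphen, whitespace, and the rest of the line.
ANNOTATION = re.compile(r"\s*\(.*\)|\s-\s.*")


def strip_annotations(text):
    """Return text with parenthesized parts and ' - ...' suffixes removed."""
    return ANNOTATION.sub("", text).strip()


example = 'Year 1.2 Q4.1 (Section 1.5 Report (#222))'
example2 = 'Year 2-7 Q4.8 - Data markets and phases'

print(strip_annotations(example))          # Year 1.2 Q4.1
print(strip_annotations(example2))         # Year 2-7 Q4.8
# Greedy matching also swallows the " c" between the two groups:
print(strip_annotations('a (b) c (d) e'))  # a e
```

If text between separate groups has to survive, a greedy pattern like this one is probably not the right tool.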

score 1 · Answer 2 · answered Dec 27 '17 at 20:09

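We can do this using extended unpacking (a starred name) and a throwaway variable.

example = 'Year 1.2 Q4.1 (Section 1.5 Report (#222))'
# Keep only the part before the first '('; the rest goes into the throwaway list _
display, *_ = example.split('(')
print(display.strip())  # strip() drops the space that preceded '('

example2 = 'Year 2-7 Q4.8 - Data markets and phases'  # there are two hyphens
# Keep the two pieces around the first hyphen and discard everything after the second
part_1, part_2, *_ = example2.split('-')
display = part_1 + '-' + part_2
print(display.strip())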
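
A related sketch, assuming the unwanted part always starts at a known separator: str.partition splits at the first occurrence of the separator only, so there is no need to count hyphens. The helper name text_before is hypothetical.

```python
def text_before(text, sep):
    """Return the part of text before the first occurrence of sep, right-stripped.

    If sep does not occur, str.partition returns the whole string as the head,
    so the text comes back unchanged apart from trailing whitespace.
    """
    head, _, _ = text.partition(sep)
    return head.rstrip()


example = 'Year 1.2 Q4.1 (Section 1.5 Report (#222))'
example2 = 'Year 2-7 Q4.8 - Data markets and phases'

print(text_before(example, '('))     # Year 1.2 Q4.1
print(text_before(example2, ' - '))  # Year 2-7 Q4.8
```

Using ' - ' (with the surrounding spaces) as the separator is what keeps the hyphen in '2-7' untouched.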

score 1 · Answer 3 · answered Dec 28 '17 at 14:21

You can try something like this; you may need a little data cleaning after you fetch the result to get your desired output:

import re

data = []
pattern = r'\(.+\)|\s\-.+'
with open('file.txt', 'r') as f:
    for line in f:
        match = re.search(pattern, line)
        # re.search returns None when nothing matches, so leave such lines as they are
        if match:
            line = line.replace(match.group(), '')
        data.append(line.strip())

print(data)

score 0 · Answer 4 · answered Dec 27 '17 at 19:45

Here is an example without regex (just to show how good regex can be):

The code yields words one at a time and stops after the first word that starts with Q:

example = 'Year 1.2 Q4.1 (Section 1.5 Report (#222))'

def clean_string(s):
    for item in s.split():
        yield item
        if item.startswith('Q'):
            break

print(' '.join(clean_string(example)))
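
Another regex-free sketch, for strings that do not follow the "stop at Q" shape: it walks the string once and tracks the parenthesis nesting depth, keeping only characters that sit outside all parentheses. The function names drop_parenthesized and clean are hypothetical, and treating ' - ' (with spaces) as the cut-off point is an assumption based on the examples in the question.

```python
def drop_parenthesized(text):
    """Remove every parenthesized span, including nested ones."""
    kept = []
    depth = 0  # how many "(" are currently open
    for ch in text:
        if ch == '(':
            depth += 1
        elif ch == ')' and depth > 0:
            depth -= 1
        elif depth == 0:
            # Outside all parentheses (a stray ")" also lands here and is kept).
            kept.append(ch)
    return ''.join(kept)


def clean(text):
    """Drop parenthesized spans, cut at the first ' - ', and tidy whitespace."""
    head = drop_parenthesized(text).split(' - ', 1)[0]
    return ' '.join(head.split())


example = 'Year 1.2 Q4.1 (Section 1.5 Report (#222))'
example2 = 'Year 2-7 Q4.8 - Data markets and phases'

print(clean(example))          # Year 1.2 Q4.1
print(clean(example2))         # Year 2-7 Q4.8
print(clean('a (b) c (d) e'))  # a c e
```

Unlike a greedy regex, this keeps the text between two separate parenthesized groups.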
