0

I am trying to remove parenthesized text (including the parentheses themselves) from a string, as well as a trailing part introduced by a hyphen. Some example strings look like the following:
example = 'Year 1.2 Q4.1 (Section 1.5 Report (#222))'
example2 = 'Year 2-7 Q4.8 - Data markets and phases'  # there are two hyphens

I would like the results to be:

example = 'Year 1.2 Q4.1'  
example2 = 'Year 2-7 Q4.8'  

How can I remove text residing within or following parentheses and special characters? I could only find the str.strip() method. I am new to Python, so any feedback is greatly appreciated!

Anton vBR
CPU
  • There are many ways. You should have a look at doing it with regex. I tagged it with regex and soon the regex sharks will be here. – Anton vBR Dec 27 '17 at 19:33
  • Possible duplicate of [Python: Split string by list of separators](https://stackoverflow.com/questions/4697006/python-split-string-by-list-of-separators) – splash58 Dec 27 '17 at 19:34
  • @AntonvBR lol. The regex sharks are circling the waters – Brad Solomon Dec 27 '17 at 19:39

4 Answers

6

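You can use the regex below to get the desired result:

r"\s*\(.*\)|\s-\s.*"
# ^         ^  Pattern 2: a space, a '-' hyphen, a space, and everything after them
# ^  Pattern 1: optional leading whitespace, then everything within brackets (....)

The pattern is written as a raw string so that the backslashes reach the regex engine unchanged, and the leading \s* removes the space in front of the opening bracket so that no trailing space is left behind.

Sample run:

>>> import re
>>> my_regex = r"\s*\(.*\)|\s-\s.*"

>>> example = 'Year 1.2 Q4.1 (Section 1.5 Report (#222))'
>>> example2 = 'Year 2-7 Q4.8 - Data markets and phases'

>>> re.sub(my_regex, "", example)
'Year 1.2 Q4.1'
>>> re.sub(my_regex, "", example2)
'Year 2-7 Q4.8'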

Here I am using re.sub(pattern, repl, string, ...). As the documentation says:

Return the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string by the replacement repl. If the pattern isn’t found, string is returned unchanged. repl can be a string or a function; if it is a string, any backslash escapes in it are processed.
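As a minimal sketch of how this could be packaged for reuse, the pattern can be compiled once and wrapped in a small function. The names `TRAILING_NOTE` and `strip_trailing_note` are made up for this illustration, and the leading `\s*` plus the final `.strip()` are there only so that no stray space is left in front of the removed part.

```python
import re

# Compiled once. Two alternatives: a parenthesised part (with any whitespace
# in front of it), or a ' - ...' suffix.
TRAILING_NOTE = re.compile(r"\s*\(.*\)|\s-\s.*")


def strip_trailing_note(text):
    """Remove a parenthesised part or a ' - ...' suffix (hypothetical helper)."""
    return TRAILING_NOTE.sub("", text).strip()


examples = [
    'Year 1.2 Q4.1 (Section 1.5 Report (#222))',
    'Year 2-7 Q4.8 - Data markets and phases',
]
cleaned = [strip_trailing_note(s) for s in examples]
print(cleaned)  # ['Year 1.2 Q4.1', 'Year 2-7 Q4.8']
```

A string that matches neither alternative is returned unchanged (apart from the whitespace trimming), which is the behaviour the quoted documentation describes.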

Moinuddin Quadri
1

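We can do this with extended unpacking: a starred target (*_) collects the leftover pieces into a throwaway variable.

example = 'Year 1.2 Q4.1 (Section 1.5 Report (#222))'
display, *_ = example.split('(')  # keep only the text before the first '('
print(display.strip())            # strip() drops the trailing space

example2 = 'Year 2-7 Q4.8 - Data markets and phases'  # there are two hyphens
part_1, part_2, *_ = example2.split('-')  # the wanted text spans the first two pieces
display = part_1 + '-' + part_2
print(display.strip())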
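A variation on the same idea, offered only as a sketch: `str.partition` splits at the first occurrence of a separator and always returns three parts, so no starred target is needed. Using `' - '` (with the surrounding spaces) as the separator leaves the hyphen in `2-7` alone. The helper name `before_separator` is hypothetical.

```python
def before_separator(text, separator):
    """Return the text before the first separator, without trailing spaces."""
    # partition() returns (head, separator, tail); if the separator is
    # missing, head is the whole string and the other two parts are empty.
    head, _sep, _tail = text.partition(separator)
    return head.rstrip()


example = 'Year 1.2 Q4.1 (Section 1.5 Report (#222))'
example2 = 'Year 2-7 Q4.8 - Data markets and phases'

print(before_separator(example, '('))     # Year 1.2 Q4.1
print(before_separator(example2, ' - '))  # Year 2-7 Q4.8
```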
theSekyi
1

You can try something like this. Depending on your data, you may still need a little cleaning afterwards to get exactly the output you want:

import re

data = []
pattern = r'\(.+\)|\s-.+'
with open('file.txt', 'r') as f:
    for line in f:
        # re.sub returns the line unchanged when nothing matches,
        # so lines without brackets or a hyphen do not raise an error
        data.append(re.sub(pattern, '', line).strip())

print(data)
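One detail worth seeing in isolation is that `.+` is greedy, which is what lets the pattern swallow the nested parentheses in the first example. The sketch below compares it with a lazy variant (`.+?`); the lazy pattern is not part of the answer and is shown only for contrast.

```python
import re

line = 'Year 1.2 Q4.1 (Section 1.5 Report (#222))'

# Greedy: runs from the first '(' to the LAST ')'.
greedy = re.sub(r'\(.+\)', '', line).strip()

# Lazy: stops at the FIRST ')', so the outer closing bracket is left behind.
lazy = re.sub(r'\(.+?\)', '', line).strip()

print(repr(greedy))  # 'Year 1.2 Q4.1'
print(repr(lazy))    # 'Year 1.2 Q4.1 )'
```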
Aaditya Ura
0

Here is an example without regex (just to show you how good regex can be):

The generator yields words one at a time and stops after the first word that starts with Q (that word is included):

example = 'Year 1.2 Q4.1 (Section 1.5 Report (#222))'

def clean_string(s):
    for item in s.split():
        yield item
        if item.startswith('Q'):
            break

print(' '.join(clean_string(example)))
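The generator above relies on the wanted part ending with a word that starts with Q. A sketch under a different assumption, namely that the unwanted part begins with a word that opens a parenthesis or with a lone hyphen, could use `itertools.takewhile`. The function name `keep_leading_words` is made up for this illustration.

```python
from itertools import takewhile


def keep_leading_words(s):
    """Keep words up to the first one that opens a parenthesis or is a lone '-'."""
    def is_wanted(word):
        return not word.startswith('(') and word != '-'

    # takewhile stops at the first word for which is_wanted() is False.
    return ' '.join(takewhile(is_wanted, s.split()))


example = 'Year 1.2 Q4.1 (Section 1.5 Report (#222))'
example2 = 'Year 2-7 Q4.8 - Data markets and phases'

print(keep_leading_words(example))   # Year 1.2 Q4.1
print(keep_leading_words(example2))  # Year 2-7 Q4.8
```

Because `'2-7'` is a single word after `split()`, its hyphen is not treated as a separator.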
Anton vBR