I am trying to remove some text from a string. What I want to remove could be any of the examples listed below: a keyword in any combination of uppercase and lowercase, followed by either a number or one or more letters, with or without a space in between.

  • (Disk 1)
  • (Disk 5)
  • (Disc2)
  • (Disk 10)
  • (Part A)
  • (Pt B)
  • (disk a)
  • (CD 7)
  • (cD X)

I already have a method to get the beginning, "(type":

multi_disk_search = [ '(disk', '(disc', '(part', '(pt', '(prt' ]
if any(mds in fileName.lower() for mds in multi_disk_search): #https://stackoverflow.com/a/3389611
  for mds in multi_disk_search:
    if mds in fileName.lower():
      print(mds)
      break

That prints (disc, for example.

I cannot just split on the parentheses because there could be other tags in other parentheses. There is also no specific order to the tags: the one I am searching for is typically last, but often it is not.

I think the solution will require regex, but I'm really lost when it comes to that.

I tried this, but it returns something that doesn't make any sense to me.

regex = re.compile(r"\s*\%s\s*" % (mds), flags=re.I) #https://stackoverflow.com/a/20782251/11214013
regex.split(fileName)
newName = regex
print(newName)

Which returns re.compile('\\s*\\(disc\\s*', re.IGNORECASE)

What are some ways to solve this?

2 Answers

Perhaps something like this:

rx = re.compile(r'''
    \(
     (?: dis[ck] | p(?:a?r)?t )
     [ ]?
     (?: [a-z]+ | [0-9]+ )
     \)''', re.I | re.X)

This pattern uses only basic regex syntax, except perhaps for the X flag (verbose mode), in which any whitespace character in the pattern is ignored unless it is escaped or inside a character class. Feel free to read the Python manual on the re module. Adding support for CD is left as an exercise.
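To show how the pattern might be applied, here is a minimal sketch that removes the tag with sub(). The "cd" alternative, the leading \s* (to swallow the space before the tag), and the names TAG_RX and strip_disk_tag are assumptions added for illustration; they are not part of the pattern above.

```python
import re

# Verbose-mode pattern based on the one above. The "cd" branch and the
# leading \s* are assumed additions, not part of the original pattern.
TAG_RX = re.compile(r'''
    \s*                               # optional whitespace before the tag
    \(
     (?: dis[ck] | cd | p(?:a?r)?t )  # disk, disc, cd, pt, prt, part
     [ ]?                             # optional single space
     (?: [a-z]+ | [0-9]+ )            # letters or digits, not a mix
    \)''', re.I | re.X)


def strip_disk_tag(file_name):
    """Return file_name with every disk/part tag removed (hypothetical helper)."""
    return TAG_RX.sub('', file_name)


print(strip_disk_tag('Album (Live) (Disc2) [FLAC]'))  # Album (Live) [FLAC]
print(strip_disk_tag('Game (cD X) (USA)'))            # Game (USA)
```

Other parenthesized tags such as (Live) or (USA) are left alone because they do not start with one of the listed keywords.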

Casimir et Hippolyte
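>>> import re
>>> def remove_parens(s, multi_disk_search):
...     # Escape each keyword and join them into one alternation.
...     mds = '|'.join(re.escape(x) for x in multi_disk_search)
...     # The raw f-string keeps \( and \) literal; flags are passed by keyword.
...     return re.sub(rf'\((?:{mds})[0-9A-Za-z ]*\)', '', s, flags=re.I)
...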

>>> multi_disk_search = ['disk','cd','disc','part','pt']
>>> remove_parens('this is a (disc a) string with (123xyz) parens removed',multi_disk_search)
'this is a  string with (123xyz) parens removed'
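The output above still has a double space where the tag used to be. One possible way to avoid that, sketched here under the assumption that the whitespace directly before the tag can always be dropped, is to let the pattern consume it as well. The name remove_parens_clean is hypothetical.

```python
import re


def remove_parens_clean(s, multi_disk_search):
    """Variant of remove_parens that also drops whitespace before the tag."""
    mds = '|'.join(re.escape(x) for x in multi_disk_search)
    # \s* consumes the whitespace before the tag, so only the space after it remains.
    pattern = rf'\s*\((?:{mds})[0-9A-Za-z ]*\)'
    # strip() tidies the result when the tag sat at the very start of the string.
    return re.sub(pattern, '', s, flags=re.I).strip()


keywords = ['disk', 'cd', 'disc', 'part', 'pt']
print(remove_parens_clean('this is a (disc a) string with (123xyz) parens removed', keywords))
# this is a string with (123xyz) parens removed
```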
Matt Miguel