-2

I'm trying to find a way to detect and remove all the characters between a pair of delimiters inside a string. I tried looping without much success, and I'm now experimenting with regex.

So I'm supposed to input a filename e.g. [1080p]Godzilla.subs.mp4 or JohnnyEnglish_720[EnglishSubs].mp4 or [x264]psa_recording[1270x720].mp4

I'm supposed to remove all characters within the [] (brackets included) and output Godzilla.subs.mp4, JohnnyEnglish_720.mp4, or psa_recording.mp4.

import re
loop = True
list = []
while loop:
    file_name = input("Filename?")
    if file_name == '':
        print(", ".join(list))
        loop = False
    else:
        file_name = re.sub(r'[\[\[].*[\]\]]', '', file_name)
        list.append(file_name)

It doesn't seem to work for inputs that contain more than one "[]" pair.

wjandrea
  • Welcome to SO! Check out the [tour]. What's your question? It seems like the problem is straightforward: just replace the parentheses `()` in the regex with brackets `[]` (which I just tried and it worked perfectly). So what do you need help with? Please [edit] to clarify. See [ask] if you want more tips. – wjandrea Apr 12 '21 at 13:53
  • Hi @wjandrea, thank you for your input. However, changing the () to [] only works for a single pair. I just found out that it doesn't work for file names that contain more than one "[]": for [x264]psa_recording[1270x720].mp4 I get .mp4 instead. Here's what I changed: file_name = re.sub(r'[\[\[].*[\]\]]', '', file_name) – Jonathan Chee Apr 12 '21 at 13:56

4 Answers

4

You could use a regular expression. The pattern could be:

\[.*?\]

  • \[ for an opening square bracket.
  • . to match any character...
  • * ... zero or more times.
  • ? to make the pattern non-greedy, so that .* stops at the first closing bracket it can. This prevents a string that includes multiple pairs of square brackets, like [1080p]Godzilla[hello], from being treated as one match.
  • \] for a closing square bracket.

We can use re.sub to replace all matches with an empty string, effectively removing them:

import re

file_names = [
    "[1080p]Godzilla.subs.mp4",
    "JohnnyEnglish_720[EnglishSubs].mp4"
]

pattern = r"\[.*?\]"

for file_name in file_names:
    new_file_name = re.sub(pattern, "", file_name)
    print(new_file_name)

Output:

Godzilla.subs.mp4
JohnnyEnglish_720.mp4
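
To see what the ? in the pattern changes, here is a small sketch that runs the greedy and the non-greedy pattern on the two-bracket file name from the question. The variable names greedy and lazy are only for illustration.

```python
import re

name = "[x264]psa_recording[1270x720].mp4"

# Greedy: .* runs from the first "[" to the LAST "]",
# so the text between the two bracket pairs is removed as well.
greedy = re.sub(r"\[.*\]", "", name)

# Non-greedy: .*? stops at the first "]" after each "[",
# so each bracket pair is removed on its own.
lazy = re.sub(r"\[.*?\]", "", name)

print(repr(greedy))  # '.mp4'
print(repr(lazy))    # 'psa_recording.mp4'
```

The greedy result matches the .mp4 output reported in the comments on the question.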
Paul M.
0

I think you're overcomplicating your regex. In this case a simple one like the following should work.

>>> import re
>>> fn = "hello[asdf].txt"
>>> re.sub(r'\[.*?\]', '', fn)
'hello.txt'

EDIT: This also works for more than one [square-bracketed-thing]

>>> fn = "[x264]psa_recording[1270x720].mp4"
>>> re.sub(r'\[.*?\]', '', fn)
'psa_recording.mp4'
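
As a possible alternative (not part of the answer above), the same result can be had without the non-greedy quantifier by using a negated character class: match "[", then any run of characters that are not "]", then "]". The names BRACKETED and strip_brackets below are hypothetical helpers for this sketch.

```python
import re

# "[" followed by any characters except "]", followed by "]".
# Because the class excludes "]", the match cannot run past the first closing bracket.
BRACKETED = re.compile(r"\[[^\]]*\]")


def strip_brackets(file_name):
    """Remove every [...] group from file_name (illustrative helper)."""
    return BRACKETED.sub("", file_name)


print(strip_brackets("[x264]psa_recording[1270x720].mp4"))  # psa_recording.mp4
print(strip_brackets("hello[asdf].txt"))                    # hello.txt
```

For file names like the ones in the question, both patterns should give the same output; which one reads better is mostly a matter of taste.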

Rory Browne
0

You can try something like this:

filenames = ["[1080p]Godzilla.subs.mp4",
             "JohnnyEnglish_720[EnglishSubs].mp4",
             "bla bla [test] blo blo [foo] bar.mp4"]
out = []
for name in filenames:
    while '[' in name and ']' in name:  # while loop for multiple occurrences
        name = name[:name.find('[')] + name[name.find(']')+1:]
    out.append(name)

print(out)

Outputs:

['Godzilla.subs.mp4', 'JohnnyEnglish_720.mp4', 'bla bla  blo blo  bar.mp4']
Martí
0

If you want the simplest solution, just find the indexes of [ and ] in your string and remove the part between them. Check the example below:

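s = "[x264]psa_recording[1270x720][abc].mp4"
for _ in range(s.count('[')):      # one pass per opening bracket
    start_index = s.index('[')     # first "[" still in the string
    end_index = s.index(']')       # first "]" still in the string
    # Cut out the bracketed part (replace removes every identical copy of it)
    s = s.replace(s[start_index:end_index+1], '')
print(s)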

The output will be:

psa_recording.mp4
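
A slightly more defensive sketch of the same index-based idea, assuming you also want to cope with unbalanced brackets: it uses str.find (which returns -1 instead of raising) and slices out one pair at a time. The function name remove_bracketed is hypothetical and not from the answer above.

```python
def remove_bracketed(s):
    """Drop each [...] span by slicing (illustrative helper)."""
    while True:
        start = s.find("[")
        # Look for the closing bracket only after the opening one.
        end = s.find("]", start + 1) if start != -1 else -1
        if start == -1 or end == -1:
            return s  # no complete pair left
        s = s[:start] + s[end + 1:]


print(remove_bracketed("[x264]psa_recording[1270x720][abc].mp4"))  # psa_recording.mp4
print(remove_bracketed("stray]name[tag].mp4"))                     # stray]name.mp4
```

A "]" with no "[" before it is left in place, and a "[" with no closing "]" ends the loop rather than raising an error.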
Amit Nanaware