
file1.txt contains:

Thailand,[a] officially the Kingdom of Thailand and formerly known as Siam,[b] is a country in Southeast Asia.

I want to delete everything enclosed in [] or (), including the brackets themselves. The expected output is:

Thailand, officially the Kingdom of Thailand and formerly known as Siam, is a country in Southeast Asia.

This is my code:

import re

with open('file1.txt') as f:
    text = f.read()
test = re.sub(r'[\(\[].*[\)\]]', '', text)

My code deletes everything from [a] through [b]. The actual output:

Thailand, is a country in Southeast Asia.
Tomerikoo
goodKarma

1 Answer


The .* in your pattern is greedy: it matches as many characters as possible, so the match starts at the opening [ of [a] and runs through the closing ] of [b]. Everything in between is replaced with the empty string ''.
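To see the greedy match directly, it can help to print what the pattern actually captures. This is a minimal sketch that uses the sentence from the question as an inline string instead of reading file1.txt; the variable names are illustrative.

```python
import re

# The sentence from the question, inlined so the snippet runs without file1.txt.
text = ("Thailand,[a] officially the Kingdom of Thailand and formerly "
        "known as Siam,[b] is a country in Southeast Asia.")

greedy_pattern = r'[\(\[].*[\)\]]'

# The single match spans from the first opening bracket to the last closing one.
greedy_match = re.search(greedy_pattern, text).group(0)
print(greedy_match)

# Substituting that one long match removes the text between the two markers too.
greedy_result = re.sub(greedy_pattern, '', text)
print(greedy_result)
```

The first print shows one match that begins with [a] and ends with [b], which is why the text between the two markers disappears.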

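If you use .? instead, the dot matches any single character and the ? makes it optional (zero or one occurrence), so only a bracket pair with at most one character inside it matches. That is why [a] and [b] are matched separately. Note that this only works for markers with at most one character between the brackets.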

import re

with open('file1.txt') as f:
    text = f.read()

# .? allows at most one character between the brackets
test = re.sub(r'[\(\[].?[\)\]]', '', text)

print(test)
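The .? pattern only removes markers with at most one character inside the brackets. As a sketch of a more general alternative (not part of the original answer), a negated character class or a non-greedy .*? stops at the first closing bracket, so longer markers are removed as well. The helper name strip_brackets and the sample string are hypothetical, and nested brackets are not handled.

```python
import re

# Opening ( or [, then any run of characters that are not ) or ],
# then a closing ) or ]. The match cannot run past the first closing bracket.
BRACKETED = re.compile(r'[\(\[][^\)\]]*[\)\]]')

# Non-greedy variant: .*? matches as few characters as possible.
BRACKETED_LAZY = re.compile(r'[\(\[].*?[\)\]]')


def strip_brackets(text):
    """Remove every (...) or [...] group from text (no nesting support)."""
    return BRACKETED.sub('', text)


# Hypothetical sample with one-character and multi-character markers.
sample = ("Thailand,[a] formerly Siam,(b) is a country[citation needed] "
          "in Southeast Asia.")

cleaned = strip_brackets(sample)
cleaned_lazy = BRACKETED_LAZY.sub('', sample)
print(cleaned)
```

Both patterns give the same result on this sample. Neither checks that the bracket types agree, so a mismatched pair such as [a) would also be removed.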
GMaster