
I'm interested in removing extra symbols from strings in Python.

What would be the most efficient and Pythonic way to do that? Is there some grammar module?

My first idea would be to locate the most deeply nested text and walk outward to the left and to the right, counting the opening and closing symbols. Then I would remove the last symbol of whichever kind the counters show there are too many of.

An example would be this string

text = "(This (is an example)"

You can clearly see that the first parenthesis is not balanced by another one, so I want to delete it:

text = "This (is an example)"

The solution has to be independent of the position of the parentheses.

Another example could be:

text = "(This (is another example) )) (to) explain) the question"

That would become:

text = "(This (is another example) ) (to) explain the question"
qdelettre

1 Answer


I had to post this as an answer for the formatting. Have a look at Python's regular expression module, re.

If I'm understanding what you are asking, look at re.sub. You can use a regular expression to find the characters you'd like to remove and replace them with an empty string.

Suppose we want to remove all instances of '.', '&', and '*'.

>>> import re
>>> s = "abc&def.ghi**jkl&"
>>> re.sub(r'[.&*]', '', s)
'abcdefghijkl'

If the pattern is longer or used repeatedly, you can precompile it with re.compile and call sub on the compiled pattern.

>>> pattern = re.compile(r'[.&*]')
>>> pattern.sub('', s)
'abcdefghijkl'
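
Note that re.sub removes every occurrence of the matched characters, so on its own it cannot tell a balanced parenthesis from an extra one. For the examples in the question, one possible approach is a single pass that keeps a stack of the indices of opening parentheses that are still unmatched. This is only a rough sketch: the function name remove_unbalanced and its parameters are my own, and it assumes that an unmatched closing symbol is dropped where it appears and that any opening symbols left over at the end are dropped too.

```python
def remove_unbalanced(text, open_ch="(", close_ch=")"):
    """Return text with unmatched open_ch / close_ch characters removed."""
    unmatched_open = []   # stack of indices of openers not yet closed
    to_drop = set()       # indices of characters to remove

    for i, ch in enumerate(text):
        if ch == open_ch:
            unmatched_open.append(i)
        elif ch == close_ch:
            if unmatched_open:
                # This closer matches the most recent unmatched opener.
                unmatched_open.pop()
            else:
                # No opener is available, so this closer is extra.
                to_drop.add(i)

    # Openers still on the stack were never closed.
    to_drop.update(unmatched_open)
    return "".join(ch for i, ch in enumerate(text) if i not in to_drop)


print(remove_unbalanced("(This (is an example)"))
# This (is an example)
print(remove_unbalanced("(This (is another example) )) (to) explain) the question"))
# (This (is another example) ) (to) explain the question
```

This happens to reproduce both expected results from the question, but other choices of which symbol to drop are equally valid, so adjust it to whatever rule you actually need.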

Hope this helps.

oz123
Jamie Duby