
I have this word: word = 'ROTONDE04)'. I would like to clean it and keep only letters.

Therefore I tried:

>>> re.search('[^a-zA-Z].', word)
<re.Match object; span=(7, 9), match='04'>

Why is the parenthesis not returned by re.search? It is not a letter, so I expected my class [^a-zA-Z] to match it too.

I would like to avoid using re.sub.

Basile
  • This regex searches from the beginning of the string for any character that is not `a-z` or `A-Z` and then any other character (`.`). It matches on `04` and then stops searching. Can you explain why you thought it would do something different? – mkrieger1 Jan 21 '20 at 16:57
  • Does this answer your question? [How can I find all matches to a regular expression in Python?](https://stackoverflow.com/questions/4697882/how-can-i-find-all-matches-to-a-regular-expression-in-python) – mkrieger1 Jan 21 '20 at 16:59
  • Or this? [Python, remove all non-alphabet chars from string](https://stackoverflow.com/questions/22520932/python-remove-all-non-alphabet-chars-from-string) – mkrieger1 Jan 21 '20 at 17:00
  • The second link seems better. Why does it stop searching after `04`? – Basile Jan 21 '20 at 17:10
  • Because that's what [`re.search` is supposed to do](https://docs.python.org/3/library/re.html#re.search). See the first link for alternatives. – mkrieger1 Jan 21 '20 at 17:19
  • Why are you willing to use `re.search` but not `re.sub`? – mkrieger1 Jan 21 '20 at 17:58

1 Answer

You've specified two characters to match:

  1. Any character that isn't a letter ([^a-zA-Z]), immediately followed by
  2. Any character at all (.)

The first place in the string where both criteria are met is 04.
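To make the two-character match visible, here is a minimal sketch using the `word` from the question. The variable names `m`, `first`, and `second` are only illustrative.

```python
import re

word = 'ROTONDE04)'

# [^a-zA-Z] consumes the first non-letter; '.' then consumes whatever
# single character follows it, letter or not.
m = re.search('[^a-zA-Z].', word)

first, second = m.group()  # unpack the two matched characters
print(m.span(), repr(first), repr(second))
```

The match covers positions 7 and 8 only, so the `)` at position 9 is never part of it.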

If you want to match a run of one or more non-letter characters instead, use + in place of .:

>>> re.search('[^a-zA-Z]+', word)
<re.Match object; span=(7, 10), match='04)'>

Use * instead of + if you want to match zero or more occurrences rather than one or more. Here, though, * produces an empty match: zero non-letters already satisfy the pattern at position 0, just before the R, so re.search returns that empty string and looks no further.
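The following sketch compares the two quantifiers side by side. It also shows one possible way to keep only the letters without `re.sub`, by joining the results of `re.findall`. That last step is an assumption about what the asker ultimately wants, not the only way to do it.

```python
import re

word = 'ROTONDE04)'

# '+' needs at least one non-letter, so the match starts at the '0'.
plus = re.search('[^a-zA-Z]+', word)

# '*' accepts zero non-letters, so it succeeds immediately at position 0.
star = re.search('[^a-zA-Z]*', word)

# Keep only the letters: collect every run of letters and join them.
letters_only = ''.join(re.findall('[a-zA-Z]+', word))

print(plus.span(), repr(plus.group()))
print(star.span(), repr(star.group()))
print(letters_only)
```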

Green Cloak Guy
  • Why does using `*` return an empty string? Is it because it only looks for a potential match at the beginning? – Basile Jan 21 '20 at 17:07
  • `re.search()` searches from the beginning of the string and finds the first satisfying occurrence it can. Since the empty string at the very beginning would satisfy `[^a-zA-Z]*`, it returns that. – Green Cloak Guy Jan 21 '20 at 17:59