
I thought `[^0-9a-zA-Z]*` excludes all alphanumeric characters but allows special characters, spaces, etc.

With the search string `[^0-9a-zA-Z]*ELL[^0-9A-Z]*`, I expect outputs such as:

ELL 
ELLs 
The ELL 
Which ELLs

However, I also get the following outputs:

Ellis Island
Bellis

How can I correct this?

Wiktor Stribiżew
Rhonda
  • [Enclose the pattern with word boundaries](https://regex101.com/r/9lAaTj/1)? What are the requirements? – Wiktor Stribiżew Sep 11 '17 at 17:51
  • @WiktorStribiżew Regex should capture 'ELL' and 'ELLs' – Rhonda Sep 11 '17 at 19:11
  • Why should it *capture* these substrings? Why not just match? What is expected output? – Wiktor Stribiżew Sep 11 '17 at 19:40
  • @WiktorStribiżew It's part of a Python program that reads a filename and assigns a category based on keywords. `ELL`, `ELLs`, `_ELL-`, `ELLs--`, etc. will get assigned a category. Sometimes there are non-alphanumeric characters around ELL and ELLs, which the regex should take into account. – Rhonda Sep 11 '17 at 19:44
  • @WiktorStribiżew Still struggling with this, i.e. it doesn't capture `ELLs` or `_ELLs` – Rhonda Sep 11 '17 at 19:49
  • Try [`(?:\b|_)ELLs?(?=\b|_)`](https://regex101.com/r/9lAaTj/3). It will find `ELL` or `ELLs` if it is surrounded with `_` or non-word chars, or at the start/end of the string. – Wiktor Stribiżew Sep 11 '17 at 21:30
  • @WiktorStribiżew It works! Thank you so much. If you put this in an answer I'll mark it as the solution. – Rhonda Sep 12 '17 at 12:49
  • Ok.............. – Wiktor Stribiżew Sep 12 '17 at 12:55

2 Answers


Change the `*` to `+`.

A `*` means any number of repetitions, including none. A `+` means one or more. What you probably want, though, is a word boundary:

\bELL\b

A word boundary is a position between `\w` (a word character) and `\W` (a non-word character), or the beginning or end of a string if it begins or ends (respectively) with a word character (`[0-9A-Za-z_]`). More about that here: What is a word boundary in regexes?
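
To make the difference concrete, here is a minimal Python sketch (not part of the original answer). It assumes the patterns are applied with `re.search` and that the question's pattern was run with `re.IGNORECASE`, which the question does not state but which would explain the `Ellis Island` match. The `samples` list and the `hits` helper are hypothetical names used only for illustration.

```python
import re

samples = ["ELL", "The ELL", "ELLs", "_ELL-", "Bellis", "Ellis Island"]

# Pattern from the question. IGNORECASE is an assumption, not stated there.
star = re.compile(r"[^0-9a-zA-Z]*ELL[^0-9A-Z]*", re.IGNORECASE)
# Same pattern with * changed to +, so at least one delimiter is required on each side.
plus = re.compile(r"[^0-9a-zA-Z]+ELL[^0-9A-Z]+", re.IGNORECASE)
# Word-boundary version from this answer (case-sensitive).
boundary = re.compile(r"\bELL\b")


def hits(pattern, texts):
    """Return the texts in which the pattern is found anywhere."""
    return [t for t in texts if pattern.search(t)]


for label, pattern in [("star", star), ("plus", plus), ("boundary", boundary)]:
    print(f"{label:9}", hits(pattern, samples))
```

Under these assumptions the `*` version matches every sample, because both character classes may match nothing. The `+` version only matches `_ELL-`, and `\bELL\b` matches `ELL` and `The ELL` but not `ELLs` or `_ELL-`, since `s` and `_` both count as word characters.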

sniperd

You may use

(?:\b|_)ELLs?(?=\b|_)

See the regex demo.

It will find `ELL` or `ELLs` if it is surrounded with `_` or non-word chars, or at the start/end of the string.

Details:

  • `(?:\b|_)` - a non-capturing alternation group matching a word boundary position (`\b`) or (`|`) a `_`
  • `ELLs?` - matches `ELL` or `ELLs`, since `s?` matches one or zero `s` characters
  • `(?=\b|_)` - a positive lookahead that requires the presence of a word boundary or `_` immediately to the right of the current location.
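
Since the comments mention a Python program that assigns a category from a filename, here is a rough sketch of how the pattern might be used there. The `categorize` function, the `"ELL"` category label, and the sample filenames are assumptions made for illustration; the actual program is not shown in the question.

```python
import re

# Pattern from this answer: ELL or ELLs delimited by "_", a non-word
# character, or the start/end of the string. Case-sensitive on purpose.
ELL_PATTERN = re.compile(r"(?:\b|_)ELLs?(?=\b|_)")


def categorize(filename):
    """Return the hypothetical category "ELL" if the keyword occurs, else None."""
    return "ELL" if ELL_PATTERN.search(filename) else None


filenames = [
    "ELL", "ELLs", "The ELL", "Which ELLs",
    "_ELL-report.pdf", "ELLs--2017.docx",
    "Ellis Island", "Bellis", "SPELLING.txt",
]
for name in filenames:
    print(f"{name!r} -> {categorize(name)}")
```

The first six names are categorized, while `Ellis Island`, `Bellis` and `SPELLING.txt` are not. Note that this relies on the search staying case-sensitive; with `re.IGNORECASE` the delimiters would still reject `Ellis` and `Bellis`, but lowercase `ell` as a standalone word would start matching.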
Wiktor Stribiżew