My regex so far is (?<=_)[a-zA-Z0-9]+\b

The input text is:

Calculate the _area of the _perfectRectangle object.

The _id and _age variables are both integers.

__invalidVariable _evenMoreInvalidVariable_ _validVariable

The output should be:

area,perfectRectangle

id,age

validVariable

But instead it is:

area,perfectRectangle

id,age

invalidVariable,validVariable

How can I match strings that start with exactly one underscore?

Wiktor Stribiżew
prsnr

2 Answers

You can assert an underscore to the left that is itself not preceded by a non-whitespace character:

(?<=_(?<!\S_))[a-zA-Z0-9]+\b

import re
 
regex = r"(?<=_(?<!\S_))[a-zA-Z0-9]+\b"
 
s = ("Calculate the _area of the _perfectRectangle object.\n\n"
    "The _id and _age variables are both integers.\n\n"
    "__invalidVariable _evenMoreInvalidVariable_ _validVariable\n\n"
    "_validVariable_test"
    )
 
print(re.findall(regex, s))

Output

['area', 'perfectRectangle', 'id', 'age', 'validVariable']
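
To see what the inner lookbehind adds, here is a minimal sketch comparing the pattern from the question with the one above on the third input line only. The variable names `loose` and `strict` are just labels chosen for this illustration.

```python
import re

line = "__invalidVariable _evenMoreInvalidVariable_ _validVariable"

# Pattern from the question: any underscore directly to the left is accepted,
# including the second underscore of "__".
loose = r"(?<=_)[a-zA-Z0-9]+\b"

# Pattern from this answer: the underscore to the left must not itself
# follow a non-whitespace character.
strict = r"(?<=_(?<!\S_))[a-zA-Z0-9]+\b"

print(re.findall(loose, line))   # ['invalidVariable', 'validVariable']
print(re.findall(strict, line))  # ['validVariable']
```

In both cases `_evenMoreInvalidVariable_` is rejected by the trailing `\b`, because the closing underscore is a word character and so no word boundary exists there.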

Or with a capture group:

(?<!\S)_([a-zA-Z0-9]+)\b
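
As a rough sketch, the capture-group variant can be run against the same sample text, including the `_validVariable_test` case. The variable name `pattern` is only a label for this example.

```python
import re

pattern = r"(?<!\S)_([a-zA-Z0-9]+)\b"

s = ("Calculate the _area of the _perfectRectangle object.\n\n"
     "The _id and _age variables are both integers.\n\n"
     "__invalidVariable _evenMoreInvalidVariable_ _validVariable\n\n"
     "_validVariable_test")

# With exactly one capture group, re.findall returns the group contents
# rather than the full match, so the leading underscore is not included.
result = re.findall(pattern, s)
print(result)
# ['area', 'perfectRectangle', 'id', 'age', 'validVariable']
```

Here the underscore is consumed by the match instead of being asserted in a lookbehind, which is why the capture group is needed to get the name alone.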

The fourth bird
  • Thank you for that, but I accidentally found that it's not working in every scenario. Example: _validVariable_test matches only _test and it must not match anything. Screenshot: https://i.ibb.co/LNZ16f9/image.png – prsnr Nov 14 '21 at 13:03
  • @prsnr Then you can use `(?<!\S)_([a-zA-Z0-9]+)\b` https://regex101.com/r/BpOKSv/1 – The fourth bird Nov 14 '21 at 13:18

You can try this:

\b_{1}([a-zA-Z0-9]+)\b

_{1} : exactly one _ (equivalent to a plain _)

([...]) : capture whatever the character class [...] matches

Code for checking:

import re
data= '''
Calculate the _area of the _perfectRectangle object.

The _id and _age variables are both integers.

__invalidVariable _evenMoreInvalidVariable_ _validVariable
'''

# findall returns the contents of the capture group, without the underscore
print(re.findall(r'\b_{1}([a-zA-Z0-9]+)\b', data))
# ['area', 'perfectRectangle', 'id', 'age', 'validVariable']
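
One difference from a whitespace-based check is worth seeing: `\b` before the underscore only requires a non-word character (or the start of the string) on the left, so punctuation also counts. A small sketch on a made-up string, where `text`, `word_boundary` and `whitespace_only` are names invented for this illustration:

```python
import re

# Hypothetical input, not from the question.
text = "obj._attr (_x) _ok"

# Same as \b_{1}([a-zA-Z0-9]+)\b; the {1} quantifier is redundant.
word_boundary = r"\b_([a-zA-Z0-9]+)\b"

# Requires whitespace or the start of the string before the underscore.
whitespace_only = r"(?<!\S)_([a-zA-Z0-9]+)\b"

print(re.findall(word_boundary, text))    # ['attr', 'x', 'ok']
print(re.findall(whitespace_only, text))  # ['ok']
```

Which behavior is preferable depends on whether names right after punctuation should count as valid.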
I'mahdi
  • Cool approach. I wonder how and why this works when the string (_test) is at the beginning of the sentence? – prsnr Nov 14 '21 at 12:53
  • @prsnr Thanks ;) I edited the answer; you can use `\b` instead of `\s`. – I'mahdi Nov 14 '21 at 12:56
  • It's working both ways but I wonder why it's working with \s? \s means that it must match one space, right? But there is no space at the beginning of a sentence (at least not a visible one). https://i.ibb.co/8Y3WH9t/image.png – prsnr Nov 14 '21 at 13:01
  • @prsnr Yes, exactly: the `?` in `\s?` makes the whitespace optional. – I'mahdi Nov 14 '21 at 13:03
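
The point about the optional whitespace can be checked with a short sketch. The two patterns below are simplified stand-ins written for this illustration, not the earlier version of the answer (which is not shown in the thread), and `text` is a made-up input.

```python
import re

# Hypothetical input where the name sits at the very start of the string.
text = "_test starts the line"

# \s must consume one whitespace character, so nothing matches at position 0.
required = re.findall(r"\s_([a-zA-Z0-9]+)", text)

# \s? may match zero characters, so the name at the start is found.
optional = re.findall(r"\s?_([a-zA-Z0-9]+)", text)

print(required)  # []
print(optional)  # ['test']
```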