
I would like to write a regex that matches the word hello, but only when it either starts a line or is preceded by whitespace. I don't want to match the whitespace if it's there; I just need to know that it (or the start of the line) is there.

So I've tried:

r = re.compile('hello(?<=\s|^)')

but this throws:

error: look-behind requires fixed-width pattern

For the sake of an example, if my string to be searched is:

s = 'hello world hello thello'

then I would like my regex to match twice, at the locations shown in uppercase below:

'HELLO world HELLO thello'

where the first would match because it is preceded by the start of the line, while the second match would be because it is preceded by a space. The last 5 characters would not match because they are preceded by a t.


1 Answer


(?:(?<=\s)|^)hello is what you want. The lookbehind needs to come at the beginning of the regular expression, and it must indeed be of fixed width: \s is 1 character wide, whereas ^ is 0 characters wide, so you cannot combine them with | inside a single lookbehind. In this case we do not need to; we simply alternate the lookbehind (?<=\s) with the anchor ^.
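As a quick check, here is a minimal sketch that runs this pattern against the example string from the question. The variable names (pattern, starts) are only illustrative.

```python
import re

# Either a whitespace character sits just before "hello" (lookbehind,
# so it is not consumed), or "hello" is at the start of the string.
pattern = re.compile(r'(?:(?<=\s)|^)hello')

s = 'hello world hello thello'

# Collect the start offset and text of every match.
starts = [m.start() for m in pattern.finditer(s)]
matched = [m.group() for m in pattern.finditer(s)]

print(starts)   # [0, 12]
print(matched)  # ['hello', 'hello']
```

The trailing thello is skipped because the character before its hello is a t, and the matched text never includes the preceding space. Note that without re.MULTILINE, ^ only matches at the very start of the string; a hello right after a newline is still found here because \s matches the newline.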

Note that both alternatives would still match the hello at the start of hellooo; if this is not acceptable, you have to add \b at the end.
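To illustrate the difference, a small sketch comparing the pattern with and without the trailing \b. The test string and the names loose and strict are made up for this example.

```python
import re

s = 'hello hellooo'

# Without a trailing word boundary, the prefix of "hellooo" also matches.
loose = re.compile(r'(?:(?<=\s)|^)hello')
# With \b, "hello" must be followed by a non-word character or the end.
strict = re.compile(r'(?:(?<=\s)|^)hello\b')

loose_starts = [m.start() for m in loose.finditer(s)]
strict_starts = [m.start() for m in strict.finditer(s)]

print(loose_starts)   # [0, 6]
print(strict_starts)  # [0]
```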