
I'm trying to extract the word 'Here' because it begins with a capital letter and occurs immediately before the word 'now'.

Here is my attempt, based on the regex from the question "regex match preceding word but not word itself":

import re
sentence = "this is now test Here now tester"
print(re.compile('\w+(?= +now\b)').match(sentence))

`None` is printed in the above example.

Have I implemented the regex correctly?

blue-sky
  • Possible duplicate of [What is the difference between re.search and re.match?](https://stackoverflow.com/questions/180986/what-is-the-difference-between-re-search-and-re-match) – Aran-Fey Mar 29 '18 at 16:37
  • Use `re.compile(r"\w+(?= +now\b)")`. The backslashes are being interpreted literally. Adding `r` before the string makes the string raw. Also, change `match` to `findall` or, to only match `Here`, change the regex to `[A-Z][a-z]*(?= +now\b)` and use `search` – ctwheels Mar 29 '18 at 16:38
  • Actually, there's another mistake: you have to use a raw string literal for the regex. `r'\w+(?= +now\b)'` – Aran-Fey Mar 29 '18 at 16:39
  • Also, that'll find "is", not just "Here". – Aran-Fey Mar 29 '18 at 16:39
  • And more, the `\w` will not differentiate the letter case. – Wiktor Stribiżew Mar 29 '18 at 16:41
  • Ok, I'm retracting my duplicate vote because there's too much wrong with this code... – Aran-Fey Mar 29 '18 at 16:41
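
The points raised in the comments above can be checked with a short sketch. It assumes the same `sentence` as in the question; the variable names (`pattern`, `at_start`, `first_hit`, `all_hits`, `non_raw_hit`) are chosen here for illustration only.

```python
import re

sentence = "this is now test Here now tester"

# In a normal string literal, "\b" is the backspace character (0x08),
# not the regex word-boundary token. A raw string keeps both characters.
assert "\b" == "\x08"
assert r"\b" == "\\b"

pattern = re.compile(r"\w+(?= +now\b)")

# match() only tries at position 0: "this" is followed by " is", not " now".
at_start = pattern.match(sentence)

# search() scans the whole string and returns the first hit.
first_hit = pattern.search(sentence).group()

# findall() returns every non-overlapping hit; \w+ ignores letter case,
# so "is" is reported as well as "Here".
all_hits = pattern.findall(sentence)

# The same characters as the question's non-raw pattern: the look-ahead
# now demands a literal backspace after "now", so nothing is found.
non_raw_hit = re.search("\\w+(?= +now\b)", sentence)

print(at_start, first_hit, all_hits, non_raw_hit)
```

So three separate things are going on: `match` versus `search`, the missing raw-string prefix, and `\w+` being too permissive to single out a capitalized word.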

1 Answer


The following works for the given example:

Regex:

re.search(r'\b[A-Z][a-z]+(?= now)', sentence).group()

Output:

'Here'

Explanation:

`\b` imposes a word boundary

`[A-Z]` requires that the word begin with a capital letter

`[a-z]+` matches the one or more lowercase letters that follow (modify as necessary)

`(?= now)` is a positive look-ahead assertion: it requires a single space and `now` to follow, without including them in the match
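
As a minimal sketch of how the pieces above fit together, the snippet below runs the pattern on the sentence from the question and guards against the no-match case, since `re.search` returns `None` when nothing is found. The `strict` variant is an assumption added here, not part of the answer: it allows several spaces and ends the look-ahead with `\b`, so that a word such as "nowhere" does not satisfy it.

```python
import re

sentence = "this is now test Here now tester"

# The answer's pattern: a capitalized word followed by " now".
pattern = re.compile(r"\b[A-Z][a-z]+(?= now)")

match = pattern.search(sentence)
# Guard against None instead of calling .group() unconditionally.
word = match.group() if match else None
print(word)

# Hypothetical stricter variant: one or more spaces, and "now" must be
# a whole word rather than the start of a longer one.
strict = re.compile(r"\b[A-Z][a-z]+(?= +now\b)")

loose_hit = pattern.search("Go nowhere")   # matches "Go"
strict_hit = strict.search("Go nowhere")   # no match
print(loose_hit.group(), strict_hit)
```

Whether the stricter form is wanted depends on the input; for the example sentence both patterns return the same word.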

rahlf23