2

For example, how do I match the second `_ab` in the sentence `_ab_ab is a test`? I tried `\>` to match the end of a word, but it does not work in Python 2.7. Note: I want to match the end of a single word, not the end of the string.

Other posts answer this implicitly, but I believe a question like this deserves a simple, direct answer. I am asking after reading the following posts and not finding a direct, concise solution in them.

  1. Python Regex to find whitespace, end of string, and/or word boundary

  2. Does Python re module support word boundaries (\b)?

DehengYe
  • I wonder what is so unclear in [this SO post](http://stackoverflow.com/a/3995061/3832970) that you need a specific question for it. Word boundary stands for word start and word end. Depending on where you place it, it will act as either word start marker or word end marker. I would close the question as duplicate, but I will let others decide. – Wiktor Stribiżew Aug 10 '15 at 06:48
  • @stribizhev Yes, that answer might cover this implicitly. My view is that SO is a knowledge community for everyone, especially novice developers, who need simple answers to basic programming questions. I searched SO for "python end of word matching" but found no **direct and concise answer** to this kind of question. I want our knowledge community to be better, so I asked this question, answered it myself first, and then invited discussion. – DehengYe Aug 10 '15 at 06:56
  • @stribizhev By "word" I meant any combination of characters. Your comments are very valid. Could you help edit the parts of the question that are ambiguous? – DehengYe Aug 10 '15 at 07:04
  • Given that the sample input is a generalized form of your problem, there is no need to modify anything. – Wiktor Stribiżew Aug 10 '15 at 07:09

4 Answers

9

You may use the word boundary `\b` at the end of the pattern. Note that adding `\b` before `_ab` won't work for the second occurrence, because the `b` that precedes the underscore and the underscore itself are both word characters. `\b` matches only between a word character and a non-word character (or vice versa).

r'_ab\b'
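To illustrate the point about where the boundary sits, here is a minimal sketch run against the sentence from the question. The variable names are only illustrative. It compares `\b` placed after `_ab` with `\b` placed before it and prints the match positions.

```python
import re

text = "_ab_ab is a test"

# \b after "_ab" requires the next character to be a non-word character
# (or the end of the string), so only the second "_ab" qualifies.
end_of_word = re.compile(r"_ab\b")
end_spans = [m.span() for m in end_of_word.finditer(text)]
print(end_spans)

# \b before "_ab" only holds at the start of the word, because "b" and "_"
# are both word characters and no boundary exists between them.
start_of_word = re.compile(r"\b_ab")
start_spans = [m.span() for m in start_of_word.finditer(text)]
print(start_spans)
```

The first pattern reports the span of the second `_ab`, while the second pattern reports only the first one.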
Avinash Raj
4
_ab(?!\w) # if `_` should count as a word character

or

_ab(?![a-zA-Z0-9]) # if `_` should not count as a word character

You can simply use a negative lookahead to indicate the end of a word.

import re
p = re.compile(r'_ab(?!\w)') #consider underscore also as a word character. 

or

p = re.compile(r'_ab(?![a-zA-Z0-9])')

test_str = "_ab_ab"

re.findall(p, test_str)
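The two lookaheads behave differently on the sample string, which the following sketch makes visible. It uses `finditer` instead of `findall` only so that the match positions can be inspected; the variable names are illustrative.

```python
import re

test_str = "_ab_ab"

# Underscore counts as a word character: the first "_ab" is followed by "_",
# so the lookahead rejects it and only the final "_ab" is reported.
underscore_is_word = re.compile(r"_ab(?!\w)")
strict_spans = [m.span() for m in underscore_is_word.finditer(test_str)]
print(strict_spans)

# Underscore does not count: neither "_ab" is followed by a letter or digit,
# so both occurrences are reported.
underscore_is_separator = re.compile(r"_ab(?![a-zA-Z0-9])")
loose_spans = [m.span() for m in underscore_is_separator.finditer(test_str)]
print(loose_spans)
```

Which variant is appropriate depends on whether `_ab_ab` should be treated as one word or as two.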
vks
2

Use r'\>' rather than just '\>'.

I found this solution after reading this post: https://stackoverflow.com/a/3995242/2728388

When using the re module in Python, remember Python's raw string notation: add an r prefix so that backslashes in your regular expressions are not interpreted as Python string escapes.

Are there any other solutions, such as using the word boundary \b?
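The effect of the raw string prefix is easiest to see with `\b`, where it changes the meaning of the pattern. The sketch below also checks how `re` treats `\>`. In Python's `re` module, that sequence is an escaped literal `>` and not an end-of-word anchor, so it is not equivalent to `\b`. Variable names are illustrative.

```python
import re

# In a normal string literal "\b" is the backspace character (0x08);
# in a raw string it stays as backslash + "b", which re reads as a word boundary.
plain = "_ab\b"
raw = r"_ab\b"
plain_len, raw_len = len(plain), len(raw)
print(plain_len, raw_len)

text = "_ab_ab is a test"
plain_matches = re.findall(plain, text)  # looks for "_ab" followed by a backspace
raw_matches = re.findall(raw, text)      # looks for "_ab" at the end of a word
print(plain_matches, raw_matches)

# r"\>" is an escaped literal ">" in re, so it only matches that character.
literal_gt = re.findall(r"\>", "a>b")
print(literal_gt)
```

Only the raw-string pattern finds the `_ab` at the end of the word; the non-raw one finds nothing because the text contains no backspace character.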

DehengYe
0
import re
string = '''ab_ab _ab_ab ab__ab abab_ ab_ababab_ '''
patt = re.compile(r'_ab\b')
# find every _ab that is immediately followed by a word boundary (end of a word)
# note: a compiled pattern's findall takes only the string to search
allmatches = patt.findall(string)
print(allmatches)

This matches every _ab that sits at the end of a word in the string.

Shreyansh Gupta