
If I knew the name of the Regex feature I needed, I'd have a better title.

As a validation task, I need to verify that a text stream only contains sections matching a Regex pattern I have established, regardless of how many times it occurs. For example, if my pattern is "foo" and I have the string

 "foo foofoo" 

the result should be three matches with no non-matching text left over. Conversely, the string

"foo foo fooo"  

should return three matches, but I need to detect that a remaining printable character 'o' was not matched. My first thought was to use the pipe character for 'or' logic, as in "(?:foo)|(\S)", and I thought I had it sorted, but the string

"-foo foo foo" 

only matches twice. It appears that the leading character causes the engine to skip to the right side of the expression and since it's broadly defined, it captures until the next word break. Clearly my mental representation does not reflect how the engine is operating. Where am I going wrong?
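One way to probe what the alternation does is to run it and record which branch produced each match. The following is a minimal sketch that assumes Python's `re` module (the question does not name an engine, and other engines or tools may behave differently); `scan` is a hypothetical helper name used only for illustration.

```python
import re

# The alternation from the question: match "foo", otherwise capture a
# single stray non-whitespace character in group 1.
PATTERN = re.compile(r"(?:foo)|(\S)")

def scan(text):
    """Return (number of foo matches, list of stray characters)."""
    foo_count = 0
    strays = []
    for m in PATTERN.finditer(text):
        if m.group(1) is None:
            foo_count += 1             # the left branch matched "foo"
        else:
            strays.append(m.group(1))  # the right branch matched one character
    return foo_count, strays

for sample in ("foo foofoo", "foo foo fooo", "-foo foo foo"):
    print(repr(sample), scan(sample))
```

In this sketch the left branch is tried first at every position, and `\S` only ever consumes one character, so an empty stray list would mean the text held nothing but matches and whitespace.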

JSacksteder
  • Do you need the actual number of matches, or just to detect that there was non-matching non-whitespace text? – Steve K Mar 05 '15 at 02:17
  • I presume this will need to be two distinct regex operations: first a test to verify that there's nothing left over after matching, then a second call to do the match and extract the data. This is sort of like set-based 'except' logic. – JSacksteder Mar 05 '15 at 14:53
  • This question is related, but not a solution for me yet. http://stackoverflow.com/questions/406230/regular-expression-to-match-string-not-containing-a-word?rq=1 – JSacksteder Mar 05 '15 at 14:54

3 Answers


You want to search for "foo", and any characters after it, up until the next foo. In order to do that you need a negative lookahead assertion (to check for 'something not followed by something else'). So your regex is going to be "foo.*?(?!foo)"

Steve K
  • That's a better pattern than my example, but I need the inverse of that - the set of text excluded by that pattern. That's why I think the pipe alternation character needs to be involved. – JSacksteder Mar 05 '15 at 14:33
  • I'm not aware of any way to find only the bits that don't match a subpattern using only a regex. Processing a string to do what you're requesting is easy in any programming language (you just do a global replace on `foo` with an empty string and are left with the remaining non-matching characters). – Steve K Mar 05 '15 at 21:44
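The global-replace idea from the last comment, combined with the two-step approach mentioned under the question, might look roughly like this. It is a sketch in Python (an assumption, since no language is named in the thread), and `leftover` and `validate` are hypothetical helper names.

```python
import re
from typing import Tuple

PATTERN = re.compile(r"foo")  # the example pattern from the question

def leftover(text: str) -> str:
    """Remove every match, then drop whitespace; what remains was not matched."""
    without_matches = PATTERN.sub("", text)
    return "".join(without_matches.split())

def validate(text: str) -> Tuple[int, str]:
    """Return (match count, unmatched non-whitespace characters)."""
    return len(PATTERN.findall(text)), leftover(text)

for sample in ("foo foofoo", "foo foo fooo", "-foo foo foo"):
    print(repr(sample), validate(sample))
```

An empty second element would indicate that the text consisted only of matches and whitespace; anything else is the residue the question wants to detect.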

Seems like you want something like this,

(?<![^\w\s])foo

DEMO
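To see what this lookbehind does on the strings from the question, here is a small sketch assuming Python's `re` module; `count_lookbehind_matches` is a hypothetical name. Note that the pattern only skips a `foo` that directly follows a punctuation-like character; it does not by itself report the stray character.

```python
import re

# Pattern from the answer: "foo" not immediately preceded by a character
# that is neither a word character nor whitespace (for example "-").
PATTERN = re.compile(r"(?<![^\w\s])foo")

def count_lookbehind_matches(text):
    """Count the occurrences of "foo" that the lookbehind allows."""
    return len(PATTERN.findall(text))

for sample in ("foo foofoo", "foo foo fooo", "-foo foo foo"):
    print(repr(sample), count_lookbehind_matches(sample))
```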

Avinash Raj

You can do:

\b(?:foo)\b|\b(?:foo)+\b

Demo

Perhaps better:

\b(?:foo)\b|\b(?:foo){2,}\b

Demo 2
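A quick way to compare the two patterns is to run both over the strings from the question. This is a sketch assuming Python's `re` module; the constant names and `whole_word_matches` are hypothetical.

```python
import re

# The two patterns from this answer. Both require word boundaries around
# a run of one or more "foo", so "fooo" is rejected as a whole.
SINGLE_OR_RUN = re.compile(r"\b(?:foo)\b|\b(?:foo)+\b")
SINGLE_OR_REPEATED = re.compile(r"\b(?:foo)\b|\b(?:foo){2,}\b")

def whole_word_matches(pattern, text):
    """Return every whole-word match of the compiled pattern in text."""
    return pattern.findall(text)

for sample in ("foo foofoo", "foo foo fooo", "-foo foo foo"):
    print(repr(sample),
          whole_word_matches(SINGLE_OR_RUN, sample),
          whole_word_matches(SINGLE_OR_REPEATED, sample))
```

With these patterns a run such as `foofoo` comes back as a single match rather than two, and a word like `fooo` yields no match at all, so the match count alone would not expose the stray `o`.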

dawg