3

I am attempting to construct a JavaScript-friendly regular expression that matches any string beginning with one of a certain group of words (A). If the string contains other words, at least one of them must belong to a second group of words (B), and none of them may belong to a third group of words (C).

So given the following word groups (A), (B) and (C):

(A) Test, Sample
(B) Good, Stuff
(C) Hello, World

and given the following example strings that begin with any words in (A):

Test
Test Good
Sample Stuff 
Test Hello 
Sample World 
Test Hello Stuff 
Sample Good World
Test Other
Test Other Stuff 
Sample Other World
Test Other Stuff Other

The following strings would be matched:

Test
Test Good
Sample Stuff
Test Other Stuff 
Test Other Stuff Other

Ideally, only the words in group A ("Test" and "Sample" in this case) would be consumed by the expression, and the rest would be handled by positive and negative lookaheads. However, I can also work with an expression that consumes all or part of a string that begins with (A), may contain (B), and does not contain (C).

I have been working on this problem for several days now, and the closest answer I have found on this website so far is:

Is there a regex to match a string that contains A but does not contain B

However, the solution suggested there does not cover the requirement that a starting word on its own must also match (as is the case in my example with the first match, "Test").

The closest I have come to a solution is the following expression:

^(Test|Sample).*(?=(Good|Stuff))(?!.*(Hello|World)).*

See here for a working example:

https://regex101.com/r/nL0iE3/1

However, this does not match single instances of words in (A) (e.g. "Test"), and it matches words in (C) when they occur before words in (B) (e.g. "Sample World Good").
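
The two failure cases can be reproduced with a short script. This is only a sketch: the pattern is the attempted expression copied verbatim, while the probe strings and the names `attempt`, `probes` and `attemptResults` are chosen here for illustration.

```javascript
// The attempted pattern from the question, copied verbatim.
const attempt = /^(Test|Sample).*(?=(Good|Stuff))(?!.*(Hello|World)).*/;

// Illustrative inputs that exercise the two failure cases described above.
const probes = ["Test", "Test Good", "Sample World Good", "Sample Good World"];

// Pair each input with whether the pattern accepts it.
const attemptResults = probes.map((s) => [s, attempt.test(s)]);

for (const [s, matched] of attemptResults) {
  console.log(JSON.stringify(s), matched);
}
```

"Test" is rejected because the positive lookahead demands a (B) word, and "Sample World Good" is accepted because the negative lookahead only inspects the text from the (B) word onwards.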

I hope that makes sense, but please let me know if I can clarify anything further. I would be very grateful for any help or pointers in the right direction.

Community

3 Answers

2

I hope I understood correctly, but I think you're looking for

^(Test|Sample)(?!.*(Hello|World))(?=$|.*(Stuff|Other)).*

Test it live on regex101.com.

Explanation:

^                     # Start of string
(Test|Sample)         # Match Test or Sample
(?!.*(Hello|World))   # Assert that neither Hello nor World appears in the rest of the string
(?=$|.*(Stuff|Other)) # Assert that the string ends here or that Stuff or Other follows
.*                    # Match rest of string
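
As a rough check, the expression can be run against the example strings from the question. This is a sketch only: the strings are written without trailing spaces, and the names `timPattern`, `samples` and `timMatches` are made up for this example.

```javascript
// The pattern from this answer, copied verbatim.
const timPattern = /^(Test|Sample)(?!.*(Hello|World))(?=$|.*(Stuff|Other)).*/;

// Example strings from the question (trailing spaces removed).
const samples = [
  "Test",
  "Test Good",
  "Sample Stuff",
  "Test Hello",
  "Sample World",
  "Test Hello Stuff",
  "Sample Good World",
  "Test Other",
  "Test Other Stuff",
  "Sample Other World",
  "Test Other Stuff Other",
];

// Keep only the strings the pattern accepts.
const timMatches = samples.filter((s) => timPattern.test(s));
console.log(timMatches);
```

Note that, because the second lookahead lists Stuff and Other rather than the question's group (B) of Good and Stuff, this accepts "Test Other" and rejects "Test Good".
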
Tim Pietzcker
  • This looks pretty good but it will match something like "Test Word Stuff". The requirements look a little confusing though. C sounds redundant with B. – NullUserException Aug 18 '16 at 16:44
  • Ahh, I thought other words were allowed (because as you wrote, why would we need a list of forbidden words then?), but perhaps they aren't. That would make the regex a lot easier. – Tim Pietzcker Aug 18 '16 at 16:46
1

Following on from Geo's great answer, I have managed to refactor his expression slightly, from:

(?=(^(?!.*(Hello|World)).*))(^(Test|Sample)$|^(Test|Sample).*(?=(Good|Stuff)).*$)

To:

(?=(^(?!.*(Hello|World)).*))^(Test|Sample)($|.*(?=(Good|Stuff)).*$)

See a working version here.

This version removes the need to repeat the starting words (group A) in the expression. Otherwise, the expression operates in the same way that Geo has explained in his answer.
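
One way to gain some confidence in the refactoring is to compare both expressions on the example strings from the question. The sketch below does that; the names `original`, `refactored`, `inputs` and `disagreements` are illustrative, and agreement on these samples is not a proof of equivalence.

```javascript
// Geo's expression and the refactored one, both copied verbatim.
const original = /(?=(^(?!.*(Hello|World)).*))(^(Test|Sample)$|^(Test|Sample).*(?=(Good|Stuff)).*$)/;
const refactored = /(?=(^(?!.*(Hello|World)).*))^(Test|Sample)($|.*(?=(Good|Stuff)).*$)/;

// Example strings from the question (trailing spaces removed).
const inputs = [
  "Test",
  "Test Good",
  "Sample Stuff",
  "Test Hello",
  "Sample World",
  "Test Hello Stuff",
  "Sample Good World",
  "Test Other",
  "Test Other Stuff",
  "Sample Other World",
  "Test Other Stuff Other",
];

// Collect every input on which the two expressions give different answers.
const disagreements = inputs.filter(
  (s) => original.test(s) !== refactored.test(s)
);

console.log(
  disagreements.length === 0 ? "same result on every sample" : disagreements
);
```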

Hopefully this will be of help to someone else.

Community
0

Try:

(?=(^(?!.*(Hello|World)).*))(^(Test|Sample)$|^(Test|Sample).*(?=(Good|Stuff)).*$)

See it working here: https://regex101.com/r/qX2xS6/2

A quick explanation:

First, exclude every string that contains Hello or World.
Then, on the strings that are still candidates, apply the remaining alternatives.

The remaining alternatives:
match lines consisting of a single word, Test or Sample
-- or --
match lines beginning with Test or Sample and containing Good or Stuff
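
If the word groups change often, the same structure could be assembled from arrays. The following is only a sketch: `escapeWord` and `buildMatcher` are hypothetical helper names, and it assumes every list is non-empty and that plain substring matching (no word boundaries) is acceptable, as in the expression above.

```javascript
// Escape regex metacharacters so that each word is matched literally.
function escapeWord(word) {
  return word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: builds an expression with the same shape as the one
// above from three word lists. Assumes all three lists are non-empty.
function buildMatcher(startWords, requiredWords, forbiddenWords) {
  const a = startWords.map(escapeWord).join("|");
  const b = requiredWords.map(escapeWord).join("|");
  const c = forbiddenWords.map(escapeWord).join("|");
  // 1) reject anything containing a forbidden word,
  // 2) accept a start word alone, or a start word followed by a required word.
  return new RegExp(
    `(?=(^(?!.*(${c})).*))(^(${a})$|^(${a}).*(?=(${b})).*$)`
  );
}

const matcher = buildMatcher(
  ["Test", "Sample"],
  ["Good", "Stuff"],
  ["Hello", "World"]
);

console.log(matcher.test("Test Other Stuff")); // accepted: starts with A, contains B
console.log(matcher.test("Sample Good World")); // rejected: contains a C word
```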

Geo Halkiadakis
  • Thank you so much, that is exactly what I was after... I didn't realise that negative lookaheads can be added at the beginning like that, and I hadn't considered splitting the expression into two sides of an OR statement. Locking down Test and Sample between the ^ and $ is also a great way of forcing the single occurrences of those words to match. Thank you to everyone else for your comments as well. I hope this is useful to someone else at some point! – Derek Σωκράτης Finch Aug 18 '16 at 18:25
  • Thank you too, Σωκράτη, be well :) – Geo Halkiadakis Aug 18 '16 at 18:51
  • And the same to you, my friend ;-) – Derek Σωκράτης Finch Aug 18 '16 at 19:13