-5

I am trying to write a regexp that will find strings that contain a combination of certain words but only those words and nothing else.

To give an example, let's say I want to find strings that contain any combination of red/green/blue separated by a comma, but I do not want to find anything that contains other colours.

VALID EXAMPLES

"red,green,blue"

"red"

"green,red"

INVALID EXAMPLES

"red,green,yellow"

"blue,pink"

I have gotten around this for now by using regexp_contains(string, 'red|green|blue') and then not regexp contains then a list of other colours, however this only works with a finite list of possibilities (ie, when I know that the only values that could possibly be in the comma separated list are a specific subset of colours). Is there a way to say find me a string that is exactly any combination of these words and nothing else?

I am doing this in bigquery but can easily do this using Python or something else if bigquery regex does not support what is required.

Inb4, I am fully aware I can just deaggregate the data and filter, this is honestly more just about curiosity in terms of regex.

aeb
  • 21
  • 6
  • you really should read https://stackoverflow.com/questions/3653462/is-storing-a-delimited-list-in-a-database-column-really-that-bad and normalize your datastructure – nbk Mar 02 '23 at 12:21
  • while I appreciate just delimiting and normalising the data would be fine, this was more an academic question about regex. I'm just curious if it is possible. – aeb Mar 02 '23 at 12:23
  • your question is still not clear whar should wrk when you don't know the colours you exclude? – nbk Mar 02 '23 at 12:33
  • I'm sorry but im not sure what is not clear? I want a regexp that contains only a combination of the words I specify and nothing else. The response below is the correct answer. – aeb Mar 02 '23 at 12:43
  • *(ie, when I know that it has to be other colours)* when you don't knwo how do oyu want to write a query that "finds" any row – nbk Mar 02 '23 at 12:48
  • So if I know the possible combinations of words, my point is I can just say give me a string that contains these words and not these other words. My question is, what happens if you want to exclude every single other word/character except the ones you do want. But as noted, the response below gave the answer and I have posted the solution. Thanks for your help :) – aeb Mar 02 '23 at 12:57

2 Answers2

0

If I understood it right, you want to have it match for e.g.: 'red,green,blue', 'red,green', 'blue,red', but not for e.g.: 'blue,red,yellow'. If so, then this is the JavaScript code for it, which you can test directly in the browser's developer console:

/^((red|green|blue),?)+$/i.test("put_your_string_here")

Here are some test cases:

/^((red|green|blue),?)+$/i.test("red,green,blue,yellow")
false
/^((red|green|blue),?)+$/i.test("red,green,blue")
true
/^((red|green|blue),?)+$/i.test("red,green")
true
/^((red|green|blue),?)+$/i.test("blue,red")
true
/^((red|green|blue),?)+$/i.test("blue,red,yellow")
false
/^((red|green|blue),?)+$/i.test("yellow")
false
/^((red|green|blue),?)+$/i.test("yellow,black")
false

The ^ stands for beginning of string, the $ stands for the end of string; the comma is optional after the word. If you may run into situations where input like 'redgreen' is wrongly flagged as matched, then the expression which also covers this case becomes:

/^(red|green|blue)(,(red|green|blue))*$/i.test("put_your_string_here")
  • Hey, thanks a lot, ye this looks like what I was aiming for. And you have understood the ask correctly yes. Thank you very much. – aeb Mar 02 '23 at 12:31
0

Thanks to cristi.

This is indeed the solution.

^(red|green|blue)(,(red|green|blue))*$

aeb
  • 21
  • 6