1

I am trying to query the stackoverflow data in BigQuery. I would like to use the REGEXP_CONTAINS function to pull all questions with the r tag. The tags column is a pipe delimited field that contains all of the tags associated with each question. A sample of what that looks like is below:

enter image description here

I can't seem get a regex that works.

I am almost there with \|?(r)\|?. This will work for:

r
sql|r|python
c
vba|r
r|java

But it will also return things like:

ruby
angular|c|vba

...which I don't want.

I need to somehow have a regex that will pull r all alone unless it has a pipe on either side.

Nick Criswell
  • 1,733
  • 2
  • 16
  • 32

1 Answers1

2

You can use

(?:^|\|)(r)(?:\||$)

See the regex demo

Details

  • (?:^|\|) - a non-capturing group matching start of string or | char
  • (r) - Group 1: r
  • (?:\||$) - a non-capturing group matching | char or end of string.
Wiktor Stribiżew
  • 607,720
  • 39
  • 448
  • 563
  • 1
    Thank you, this works and your explanation is very helpful. Thinking about this from a start of string OR pipe / pipe OR end of string is the cognitive jump which I couldn't make. Thanks! – Nick Criswell Sep 10 '20 at 19:01