1

I have a Redshift table column that contains 1 to many hashtags (e.g. #a, #b, etc.). I want to write a query that finds rows where all tags from a given set exist (e.g. #a and #b) while not picking up other rows that have some but not all of the tags (e.g. only #a or only #b).

I can see how to do this with multiple LIKE statements (e.g. LIKE '%#a %' AND LIKE '%#b%') but I would really like to do it with a single statement. I can see how to do this with SIMILAR TO but not in a way that ignores ordering. The following would work but only if I include all possible combinations of ordering.

SELECT * FROM table WHERE field SIMILAR TO '(%#a%)(%#b%)|(%#b%)(%#a%)'

This works but having to list all combinations of the tags I'm looking for would be a royal pain and prone to error. Is there a way to express 'AND' in SIMLAR TO (or another function) in Redshift that ignores order?

  • Do note: `LIKE` may yield [better performance](https://dba.stackexchange.com/a/10696) than a regex call. And as RedShift uses a variant dialect of Postgres, heed the PGSQL guru @Erwin who advises: [Never user `SIMILAR TO`. It's pointless.](https://stackoverflow.com/a/12459689/1422451). – Parfait Aug 15 '19 at 14:07

2 Answers2

1

Make sure to capture the whole tag in any position and not match on incomplete tags:

SELECT *
FROM table
WHERE (field LIKE '#a#%' OR field LIKE '%#a') AND
      (field LIKE '#b#%' OR field LIKE '%#b')

This avoids matching data such as #ac#b

MonkeyZeus
  • 20,375
  • 4
  • 36
  • 77
0

Use AND and LIKE:

SELECT t.*
FROM table t
WHERE field LIKE '%#a%' AND
      field LIKE '%#b%';
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786