Matching patterns between multiple columns

Question

I have two columns say Main and Sub. (they can be of same table or not).

Main is varchar of length 20 and Sub is varchar of length 8.
Sub is always subset of Main and it is last 8 characters of Main.

I could successfully design a query to match pattern using substr("Main",13,8)

Query:

select * from "MainTable"
 where substr("MainColumn",13,8) LIKE (
   select "SubColumn" From "SubTable" Where "SubId"=1043);

but I want to use Like, % , _ etc in my query so that I can loosely match the pattern (that is not all 8 characters).

Question is how can i do that.?!

I know that the query below is COMPLETELY WRONG but I want to achieve something like this,

Select * from "MainTable"
 Where "MainColumn" Like '%' Select "SubColumn" From "SubTable" Where "SubId"=2'

score 7 · Accepted Answer · edited May 23 '17 at 12:17

The answers so far fail to address your question:

but I want use Like, % , _ etc in my query so that I can loosely match the pattern (that is not all 8 characters).

It makes hardly any difference whether you use LIKE or = as long as you match the whole string (and there are no wildcard character in your string). To make the search fuzzy, you need to replace part of the pattern, not just add to it.

For instance, to match on the last 7 (instead of 8) characters of subcolumn:

SELECT *
FROM   maintable m
WHERE  left(maincolumn, 8) LIKE 
       ( '%' || left((SELECT subcolumn FROM subtable WHERE subid = 2), 7));

I use the simpler left() (introduced with Postgres 9.1).
You could simplify this to:

SELECT *
FROM   maintable m
WHERE  left(maincolumn, 7) =
       (SELECT left(subcolumn,7) FROM subtable WHERE subid = 2);

But you wouldn't if you use the special index I mention further down, because expressions in functional indexes have to matched precisely to be of use.

You may be interested in the extension pg_tgrm.

In PostgreSQL 9.1 run once per database:

CREATE EXTENSION pg_tgrm;

Two reasons:

It supplies the similarity operator %. With it you can build a smart similarity search:

--SELECT show_limit();
SELECT set_limit(0.5); -- adjust similarity limit for % operator

SELECT *
FROM maintable m
WHERE left(maincolumn, 8) %
      (SELECT subcolumn FROM subtable WHERE subid = 2);

It supplies index support for both LIKE and %

If read performance is more important than write performance, I suggest you create a functional GIN or GiST index like this:
```
CREATE INDEX maintable_maincol_tgrm_idx ON maintable
USING gist (left(maincolumn, 8) gist_trgm_ops);
```
This index supports either query. Be aware that it comes with some cost for write operations.
A quick benchmark for a similar case in this related answer.

If you use a GiST index instead of GIN, a similarity search ordered by similarity `DESC` and with a `LIMIT` will use KNN searching of the index -- the index will actually return rows in order of "best match". This can be much faster than alternatives. The value used in `set_limit()` can have a major impact on the performance of KNN searches, so play with that to make sure you're not including matches which are too "loose" to really be of any interest. (KNN is a notation meaning that you want "k" nearest neighbors, where "k" is a number you provide.) — kgrittn, Jul 01 '12 at 20:03
@indyaah: I added a link to a related answer that illustrates the performance of GiST and GIN indexes - just like Kevin describes it. Also amended the example here to display a GiST index, which is the better pick. — Erwin Brandstetter, Jul 01 '12 at 22:44

score 2 · Answer 2 · edited Jul 01 '12 at 10:42

2

Try

SELECT t1.* from "Main Table" AS t1, "SubTable" AS t2
 WHERE t2.SubId=1043
   AND substr(t1.MainColumn, 13, 8) LIKE "%" || CAST(t2.SubColumn as text);

edited Jul 01 '12 at 10:42

vyegorov

21,787
7
59
73

answered Jul 01 '12 at 10:32

bksi

1,606
1
23
45

actually i wanted to avoid using substr(). – Anuj Patel Jul 01 '12 at 14:26

score 2 · Answer 3 · edited May 23 '17 at 11:46

Argument to a LIKE is an ordinary string, so all string manipulations are valid here. In your case you need to concatenate wildchars with the target substring, like @bksi suggests:

... LIKE '%'||CAST("SubColumn" AS test) ...

Note, though, that such patterns (the ones starting with a % wildcard) are badly performing ones. Take a look at PostgreSQL LIKE query performance variations.

I would recommend:

sticking with the current substr("MainColumn", 13, 8) approach;
avoid LIKE and use equality comparison (=) instead (although they're equal if LIKE pattern contains no wildcards, it is easier to read the query);

build an expression index on the "MainTable" the following way:

CREATE INDEX i_maincolumn ON "MainTable" (substr("MainColumn", 13, 8));

This combination will perform better in my view.

And use lowercase names for the tables/columns, so that you can avoid doublequoting them.

Matching patterns between multiple columns

3 Answers3

Linked