Can I make a join condition that joins if the join key has a particular field?

Question

I have a table that has a column with strings formatted like this: {1,4,5}. They can be any length and I'd like to join an ID table against any value that has its ID in that string.

This is the first table

name     id         count 
apple    {1,3,6}    5
orange   {5,3,1}    3
potato   {8,1,9}    3

This is the second table -

id2     category
1      foo
2      foobar
3      candy
4      candybar
5      oreo
6      pistachio

I'd like a row for every ID listed in the first table that has the category from the second table. I'd like them to look like this -

id2 name     id         count 
1 apple    {1,3,6}    5
1 orange   {5,3,1}    3
1 potato   {8,1,9}    3
3 apple    {1,3,6}    5
3 orange   {5,3,1}    3
8 potato   {8,1,9}    3
9 potato   {8,1,9}    3

This is what I've got so far. Can I have a join filter that says join if the value is included?

select id2, name, id, count
from table2 as t2 
left join table1 as t1 
on t2.id2 %in% t1.id

String functions are very different from database to database. What specific database are you using? PostgreSQL, Oracle, DB2, etc. — The Impaler, Mar 19 '19 at 22:47
For what it's worth, the id field is created from an `array_agg()` function, but I figured I should cast it as text. — tadon11Aaa, Mar 19 '19 at 23:03

Maximilian C. · Accepted Answer · 2019-03-19T23:28:23.943

5

1) Unsolicited advice

I think it's worth considering whether your database design (i.e. the way you cut your tables) really serves your purpose. As currently set up, the tables violate Codd's first normal form. Consider changing the design to express an n:m relationship between the objects in FirstTable and SecondTable.
Use names that are valid in the context of the table. Instead of having id2 in one table and id in another, just name both id. In your queries you can refer to them as firsttable.id and secondtable.id to distinguish them.
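
To make the n:m suggestion concrete, here is a minimal sketch of what a normalized layout could look like in PostgreSQL. The table and column names (fruit, category, fruit_category) are hypothetical and not taken from the question; adapt them to your schema.

```sql
-- Hypothetical normalized layout; all names here are illustrative only.
CREATE TABLE fruit (
    id    integer PRIMARY KEY,
    name  text    NOT NULL,
    count integer NOT NULL
);

CREATE TABLE category (
    id       integer PRIMARY KEY,
    category text    NOT NULL
);

-- One row per (fruit, category) pair replaces a string such as '{1,3,6}'.
CREATE TABLE fruit_category (
    fruit_id    integer NOT NULL REFERENCES fruit (id),
    category_id integer NOT NULL REFERENCES category (id),
    PRIMARY KEY (fruit_id, category_id)
);

-- The join then needs no string or array handling at all.
SELECT c.id, f.name, f.count, c.category
FROM fruit AS f
JOIN fruit_category AS fc ON fc.fruit_id = f.id
JOIN category AS c ON c.id = fc.category_id;
```

With this layout both joins are plain equality joins on key columns that the primary keys already index.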

2) Actual answer

Yes, it is possible, but (as the commenters also pointed out) how you do it depends on the database system you use.

If firsttable.id is an array in PostgreSQL, the following query should work:

SELECT
    *
FROM
    first
JOIN
    second
ON
    second.id = ANY(first.ids);
    -- Took the liberty to change the column names

This SQLFiddle provides a working example.
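
For readers who cannot open the fiddle, the following is a self-contained sketch of the same idea. It builds both tables inline from the sample data in the question, assuming the id list is a real int[] column; the inline CTEs are only a stand-in for real tables.

```sql
-- Sample data from the question, with the id list stored as int[].
WITH first (name, ids, count) AS (
    VALUES ('apple',  ARRAY[1, 3, 6], 5),
           ('orange', ARRAY[5, 3, 1], 3),
           ('potato', ARRAY[8, 1, 9], 3)
),
second (id, category) AS (
    VALUES (1, 'foo'), (2, 'foobar'), (3, 'candy'),
           (4, 'candybar'), (5, 'oreo'), (6, 'pistachio')
)
SELECT second.id, first.name, first.ids, first.count, second.category
FROM first
JOIN second
  ON second.id = ANY (first.ids)   -- true when the id is one of the array elements
ORDER BY second.id, first.name;
```

Because this is an inner join, ids that have no row in the second table (8 and 9 in the sample data) do not appear in the result.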

If firsttable.id is a string then you can cast the string to an array using '{42, 23, 17}'::int[] as described here:

SELECT
    *
FROM
    first
JOIN
    second
ON
    second.id = ANY(first.ids::int[]);

This SQLFiddle gives a working example in case it's a string.
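
If you also need a row for ids that are listed in the first table but missing from the second (8 and 9 in the question), one possible variation is to expand the list with unnest() and left join the result. This is a sketch under the assumption that the column is text in exactly the {1,3,6} format; the alias e is my own and not part of the original schema.

```sql
-- Sample data from the question, with the id list stored as text.
WITH table1 (name, id, count) AS (
    VALUES ('apple',  '{1,3,6}', 5),
           ('orange', '{5,3,1}', 3),
           ('potato', '{8,1,9}', 3)
),
table2 (id2, category) AS (
    VALUES (1, 'foo'), (2, 'foobar'), (3, 'candy'),
           (4, 'candybar'), (5, 'oreo'), (6, 'pistachio')
)
SELECT e.id2, t1.name, t1.id, t1.count, t2.category
FROM table1 AS t1
CROSS JOIN LATERAL unnest(t1.id::int[]) AS e (id2)  -- one row per listed id
LEFT JOIN table2 AS t2
       ON t2.id2 = e.id2                            -- category is NULL when no match exists
ORDER BY e.id2, t1.name;
```

The cast raises an error if any value is not a valid integer array literal, so it only suits data that reliably follows that format.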

edited Mar 19 '19 at 23:28

answered Mar 19 '19 at 23:04

I would agree with you if the values like `{1,2,3}` were arrays of ints. However, it seems they are `VARCHAR`. – The Impaler Mar 19 '19 at 23:18
Wouldn't the second SQLFiddle then handle this properly? – Maximilian C. Mar 19 '19 at 23:20
I'll adapt answer to make that more explicit. – Maximilian C. Mar 19 '19 at 23:20
I think that would work. +1 for the effort. I would encourage you to use modern join syntax. – The Impaler Mar 19 '19 at 23:22
Thx! I was unsure if cross-join or left join would be more appropriate. What is the reason for your suggestion? Improved readability? :) – Maximilian C. Mar 19 '19 at 23:24
Modern join syntax separates join predicates (in the `JOIN` clause) from filtering predicates (in the `WHERE` clause). That greatly improves readability and debugging, and also makes it a lot easier to tackle query performance. – The Impaler Mar 19 '19 at 23:26
Not sure about the whole VARCHAR thing, but this did the trick for me. – tadon11Aaa Mar 20 '19 at 03:07

score 0 · Answer 2 · answered Mar 19 '19 at 23:12

I didn't notice that this was about PostgreSQL when I started working on it.

You can try the following, but it is written in SQL Server syntax (CROSS APPLY and the XML methods), so no guarantees that PostgreSQL has all of these functions.

SELECT *
FROM (
    SELECT
        Split.a.value('.', 'VARCHAR(100)') AS ID2,
        A.Name, A.ID, A.[Count]
    FROM (
        SELECT Name, [Count], ID,
            CAST('<M>' + REPLACE(REPLACE(REPLACE(ID, '{', ''), '}', ''), ',', '</M><M>') + '</M>' AS XML) AS Data
        FROM [StackOver].[dbo].[SplitKey]
    ) AS A
    CROSS APPLY Data.nodes('/M') AS Split(a)
) AS B
LEFT JOIN [StackOver].[dbo].[SplitKeyID2] AS C
    ON B.ID2 = C.ID2
WHERE C.Category > ''
ORDER BY B.ID2, B.name

The Impaler · Answer 3 · 2019-03-19T23:20:18.750

I'm pretty convinced there's a better solution that doesn't involve the GROUP BY and ARRAY_AGG(), but since you are already there, I think this query may help you:

select
  t2.id2,
  t2.category,
  t1.id,
  t1.count
from table1 t1
join table2 t2 on (
     position ('{' || t2.id2 || '}' in t1.id) <> 0
  or position ('{' || t2.id2 || ',' in t1.id) <> 0
  or position (',' || t2.id2 || ',' in t1.id) <> 0
  or position (',' || t2.id2 || '}' in t1.id) <> 0
)
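
The four delimiter cases are there to avoid partial matches such as 1 matching inside 11. A small sketch of the difference, using a hypothetical sample value that is not from the question; the explicit ::text cast is my addition to keep the concatenation unambiguous.

```sql
-- Hypothetical sample: the list contains 11 and 3, but not 1.
SELECT
    -- A bare substring test finds '1' inside '11': a false positive.
    position(s.id2::text in s.id) <> 0 AS bare_substring_match,
    -- Requiring a delimiter on both sides rejects it.
    (   position('{' || s.id2::text || '}' in s.id) <> 0
     OR position('{' || s.id2::text || ',' in s.id) <> 0
     OR position(',' || s.id2::text || ',' in s.id) <> 0
     OR position(',' || s.id2::text || '}' in s.id) <> 0
    ) AS delimited_match
FROM (VALUES ('{11,3}', 1)) AS s (id, id2);
```

Here bare_substring_match comes out true and delimited_match false, which is why the join condition spells out all four positions.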
