-1

I was wondering on how to JOIN on something that isn't an equal sign. For example, I have a few tables, all with IDs, and I can easily do the following (for equals):

LEFT JOIN ON ID1 = ID2

The above example works perfect when columns have an exact match.

But some columns, instead of having a single ID, have multiple IDs, and weird separator, for example:

Table A

ID
ID7523
ID8891
ID7463
ID5234
ID7562

As you can see, Table A has individual IDs only - works great for exact join matches (=). There are no "splits" in table A, all exact matches.

TableB

ID
ID5234 -- ID7562
ID7523
ID8891
ID7463
ID5234 -- ID7562
ID7562 -- ID5234

There's a space and two dashes and another space between some of these IDs, called 'splits', and to make matters worse, sometimes they list one ID first, sometimes they list it last (not sure if that matters yet).

I do not have the ability to edit any of the tables.

Is there any way to join the ones with the dashes also?

Thanks!

LEFT JOIN ID1 -- ID2

Received error: An expression of non-boolean type specified in a context where a condition is expected

At this point, I'm not worried about all of the logic, but just connecting the tables together.

Jonathan Hall
  • 75,165
  • 16
  • 143
  • 189
  • 5
    There is nothing built in to help you with this, and the reason why is because most database professional would consider the schema for the `TableB` column defintion to be **wrong** and **broken**: something that needs to be _fixed_. It's pretty much NEVER a good idea to put delimited values in a column – Joel Coehoorn Feb 17 '23 at 15:12
  • 2
    `[cont]` Which isn't to say what you want to do is impossible. It's just the code required will be more complex than you want, and will be database dependant (which is why these are just comments: we don't have enough info yet to provide an actual answer). Also, the performance will be **greatly** reduced, both because of the extra work to parse the column for _every row_ (even those you don't need) and because you've broken any chance of using an index with that column. – Joel Coehoorn Feb 17 '23 at 15:15
  • Hi Joel, thank you for the reply! Just curious, is it the spaces or the dashes (or both) that are bad? – Greg-learner Feb 17 '23 at 15:21
  • 1
    It's the fact you're stuffing multiple values into the same column at all. That should not happen, and what kind of delimiter you use to do the bad thing is irrelevant. Usually, it's a sign you need an additional table. – Joel Coehoorn Feb 17 '23 at 15:21
  • Alright! I do have the ability to create a new table, which I will do, and mess around with LIKE operators, or possible CONCAT to join them, and see how this fares. Thank you for the input! – Greg-learner Feb 17 '23 at 15:26
  • 1
    There's not enough info yet on what kind of structure you're modelling, but the goal is _probably_ to represent the TableB data so each ID gets it own row. – Joel Coehoorn Feb 17 '23 at 15:29
  • 1
    If you can create a new table you should fix the data, so that means splitting the multiple values in a single column into multiple rows; read up on how to normalise data. – Stu Feb 17 '23 at 16:35

2 Answers2

0

Fuzzymatch thinking... Is it possible to think outside the box and use something like a JOIN where the ON has a CONTAINS or LIKE statement?

UPDATE... it is possible. Ref: Using JOIN Statement with CONTAINS function

AAJ
  • 11
  • 5
0

That is 100% possible, although, inefficient from a querying perspective.

Firstly, a JOIN implies equality so using something like a JOIN ON ColumnA LIKE ColumnB is not going to be permissible (at least not with ANSI SQL - there may be some proprietary commands I'm not aware of). What you can do however create a brand new set for Table2 including a user-defined Column B within memory and use this new altered foreign key to JOIN your tables.

So for instance instead of:

SELECT TABLE1.*, TABLE2.* 
FROM TABLE1 
JOIN TABLE2 ON ID1 = ID2

Do something like:

SELECT TABLE1.*, TABLE2_MODIFIED.* 
FROM TABLE1
JOIN (SELECT TABLE2.*, LEFT(ID2, 6) AS new_id FROM TABLE2) TABLE2_MODIFIED ON ID1 = new_id

So what this does is create a temporary in-memory subset of TABLE2 (called a derived table) with a user-defined field that trims everything to the right of the first 6 characters of the ID2 field. At that point you have two keys that are ready for a typical JOIN.

If the RDBMS type you are using doesn't have a LEFT function, see if SUBSTRING, TRIM or even a CASE function will work for you. But, ultimately, if you need to join two sets and your foreign keys aren't equal, you want to redefine one of your sets to make them equal as needed.

E_net4
  • 27,810
  • 13
  • 101
  • 139
  • 1
    This sounds plain wrong. "A JOIN implies equality"??? No, not at all. You can join on any condition in standard SQL and every RDBMS I am aware of. – Thorsten Kettner Feb 17 '23 at 20:12
  • @ThorstenKettner Now that is an interesting topic but not wrong. If you want to get technical and talk mathematics, you are correct, an `INNER JOIN` is technically not the same thing as an `EQUIJOIN` and actually includes greater/lesser than operators that define an `OUTER` (or `LEFT`/`RIGHT`) `JOIN`. [Relational Algebra](https://en.wikipedia.org/wiki/Relational_algebra) defines a number of mathematical JOINs not implemented or implemented differently in a variety of RDBMS flavors. Pragmatically speaking, however, when working with a database implementation... –  Feb 17 '23 at 21:20
  • You want to treat an `INNER JOIN` as you would an `EQUIJOIN` and rely on your `OUTER JOIN`s for greater than/less than relationships. Implementing anything else is going to confuse the hell out of the next developer who has to maintain your code. –  Feb 17 '23 at 21:22
  • No, you are confusing things. You are correct that equijoins are the most common. But you cannot turn an inner join into an outer join by replacing the comparison operator from equal to lesser or greater. The comparison operator has nothing to do with inner vs. outer join. Non-equi-joins have been quite popuplar with self-joins and they are also needed when a table contains a range (e.g. valid from / until dates). Nowadays, with lateral joins and window functions, we see non-equi-joins less frequently. – Thorsten Kettner Feb 18 '23 at 09:15