Distinct with long columns

Question

I have here some database schema with tables having long fields (in MS-SQL-Server of type "text", in Sybase of type "text" too) and I need to retrieve distinct rows.

The tables looks like

create table node (id int primary key, … a few more fields … data text);
create table ref (id int primary key, node_id int, … a few more fields);

For one row in "node", there may be zero or more rows in "ref".

Now I have a query like

SELECT node.* FROM node, ref WHERE node.id = ref.node_id AND ... some more restrictions.

This query returns duples and triples when there is more than a single row in "ref" for some "node_id".

But I need unique rows!

Using SELECT DISTINCT node.* does not work because of the columns of type "text" :-(

In Sybase there is trick, just add "GROUP BY node.id" to the query, voila! You get unique rows returned.

Is there some similar simple Trick for MS-SQL-Server?

I have already a solution with temporary tables, but this seems to be a lot slower maybe the reason is just because of the larger number of statements transferred to the database?

[Bad habits to kick : using old-style JOINs](https://sqlblog.org/2009/10/08/bad-habits-to-kick-using-old-style-joins) — SqlZim, Mar 02 '17 at 16:48
[`text` is deprecated](http://stackoverflow.com/a/564823/2333499), has been for many years now — SqlZim, Mar 02 '17 at 16:50
cast the column to (n)varchar and group on that. It might be pretty slow though... — Jeremy, Mar 02 '17 at 16:50
I need to store there data which is pretty long. Varchar with its 255 char limit is to small. — Wurgl, Mar 02 '17 at 23:50

score 0 · Accepted Answer · answered Mar 02 '17 at 17:47

It looks like you are approaching this problem from the wrong direction. Joins are typically used to expand on keys where relevant data is stored in different tables. So it's no surprise you are getting more than one row per node_id.

In your query, you join the two tables together, but then you ignore everything from ref. It looks like you're just trying to filter out ids from node that are not referenced in ref. If that is the case, then you don't want to use a join. The following will work much better

select *
    from node
    where id in (
        select node_id
            from ref
            where [any restrictions placed on the ref table go here]
    )
        and [any restrictions placed on the node table go here]

Furthermore, at the risk of teaching you bad join practices, the same thing can be accomplished they way you were trying to do it originally, but it's more painful to write and it's not good practice

select node.col1, node.col2, ... , node.last_col
    FROM node
    inner join ref on node.id = ref.node_id
    where [some restrictions.]
    group by node.col1, node.col2, ... , node.last_col

Yes. This might be the answer. My problem is a little bit more complicated. It's really, really old software written from people I never met, parts of the software are from the late 80's and it has its own query language which is translated into SQL statements. I need to investigate if I can nest such select statements deep enough or if I need a mix of nested selects with joins. The easiest way with minimal side effects would be a tricky part like this GROUP BY in Sybase. — Wurgl, Mar 02 '17 at 23:56
Inheriting old software is always rough, especially when it's written in an outdated language. Sometimes I have to code in this old language called Visual FoxPro that stopped being supported 9 years ago; imagine trying to find helpful documentation for that, haha. Anyways, I hope this query works for you. — KindaTechy, Mar 03 '17 at 16:10

Distinct with long columns

1 Answers1