Need to UPPER SQL statement with INNER JOIN SELECT

Question

I'm using Pervasive SQL 10.3 (let's just call it MS SQL since almost everything is the same regarding syntax) and I have a query to find duplicate customers using their email address as the duplicate key:

SELECT arcus.idcust, arcus.email2
     FROM arcus 
          INNER JOIN (
               SELECT arcus.email2, COUNT(*) 
                    FROM arcus WHERE RTRIM(arcus.email2) != '' 
                         GROUP BY arcus.email2 HAVING COUNT(*)>1
          ) dt 
     ON arcus.email2=dt.email2
          ORDER BY arcus.email2";

My problem is that I need to do a case insensitive search on the email2 field. I'm required to have UPPER() for the conversion of those fields.

I'm a little stuck on how to do an UPPER() in this query. I've tried all sorts of combinations including one that I thought for sure would work:

 ... ON UPPER(arcus.email2)=UPPER(dt.email2) ...

... but that didn't work. It took it as a valid query, but it ran for so long I eventually gave up and stopped it.

Any idea of how to do the UPPER conversion on the email2 field?

Thanks!

score 3 · Accepted Answer · answered Aug 05 '13 at 22:11

If your database is set up to be case sensitive, then your inner query will have to take account of this to perform the grouping as you intended. If it is not case sensitive, then you won't require UPPER functions.

Assuming your database IS case sensitive, you could try the query below. Maybe this will run faster...

SELECT arcus.idcust, arcus.email2
 FROM arcus 
      INNER JOIN (
           SELECT UPPER(arcus.email2) as upperEmail2, COUNT(*) 
                FROM arcus WHERE RTRIM(arcus.email2) != '' 
                     GROUP BY UPPER(arcus.email2) HAVING COUNT(*)>1
      ) dt 
 ON UPPER(arcus.email2) = dt.upperEmail2

This did the trick and in reasonable time! For some reason I thought I had to do an UPPER() on the first SELECT on arcus.email2 but clearly that wasn't the case. Kudos! — Donavon Yelton, Aug 06 '13 at 02:20

score 0 · Answer 2 · edited May 23 '17 at 12:33

The collation of a character string will determine how SQL Server compares character strings. If you store your data using a case-insensitive format then when comparing the character string “AAAA” and “aaaa” they will be equal. You can place a collate Latin1_General_CI_AS for your email column in the where clause.

Check the link below for how to implement collation in a sql query.

How to do a case sensitive search in WHERE clause

score 0 · Answer 3 · answered Aug 05 '13 at 21:39

Check out this blog post which discusses case insensitive searches in SQL. In essence, the reason why it was so slow was that most likely none of the current table indexes could be used in the query, so the database engine had to perform a full table scan, likely multiple times.

An index on arcus.email2 is completely useless when wanting to compare between the uppercased versions (UPPER(arcus.email2)), because the database engine cannot look up the values in the index (because they're different values!).

To improve the performance, you can create an index specifically on the result of applying UPPER to the field.

CREATE INDEX IX_arcus_UPPER_email2
    ON arcus (UPPER(email2));

Need to UPPER SQL statement with INNER JOIN SELECT

3 Answers3