1

I have a set of queries that I am trying to run but I am having issues getting them to run together.

My set up is as follows with column names in parantheses:

  • Table 1 (Email / Date)
  • Table 2 (Email / Date_Submitted)

I have written 3 queries which each work perfectly, independent of each other, but I cannot seem to figure out how to connect them.

Query 1 - Distinct Emails from Table 1 (rfi_log)

SELECT DISTINCT email, date_submitted 
FROM rfi_log 
WHERE date_submitted BETWEEN '[start_date]' AND '[end_date]'

Query 2 - Distinct Emails from Table 2 (masterstudies)

SELECT DISTINCT email 
FROM orutrimdb.mastersstudies 
WHERE date BETWEEN '[start_date]' AND '[end_date]'

Query 3 - Join Query looking for duplicate emails from Table 1 & Table 2

SELECT rfi_log.email as emails, orutrimdb.mastersstudies.email
FROM rfi_log
    CROSS JOIN orutrimdb.mastersstudies
    ON orutrimdb.mastersstudies.email=rfi_log.email 
    WHERE date_submitted BETWEEN '[start_date]' AND '[end_date]';

My issue now is that I need to combine these queries by some fashion so that I can get a count of DISTINCT emails from both tables during the date range while EXCLUDING the emails identified from Query 3.

I need the following:

  • Query 3 = Count of Distinct Emails
  • Query 2 = Count of Distinct Emails (not identified in Query 3)
  • Query 1 = Count of Distinct Emails (not identified in Query 3)

Ultimately I need to get a total count of distinct emails during the date range that is "de-duplicated" since there are duplicates located in both tables.

How can this be accomplished?

agold
  • 6,140
  • 9
  • 38
  • 54
  • I'm not sure you can get all three into a single query, but you can certainly exclude the ones from query 3 by using a [NOT IN](http://stackoverflow.com/a/1001197/870729) clause. **As a clarification** you mention you're getting the "counts", but in reality you're getting ALL the emails. If you just want counts, why not `SELECT COUNT(distinct email) FROM ...` – random_user_name Jan 23 '16 at 23:04
  • LASTLY - be sure you actually NEED / WANT to use `CROSS JOIN` - it returns bigger results than needed usually, and could likely be replaced with `INNER JOIN` – random_user_name Jan 23 '16 at 23:07
  • You could do a [`UNION`](http://dev.mysql.com/doc/refman/5.7/en/union.html) query - `SELECT rfi_log.email as emails, orutrimdb.mastersstudies.email FROM rfi_log CROSS JOIN orutrimdb.mastersstudies ... UNION SELECT DISTINCT NULL, email FROM orutrimdb.mastersstudies ... UNION SELECT DISTINCT email, NULL FROM rfi_log ...` – Sean Jan 23 '16 at 23:10

1 Answers1

0

One method for doing this is union all with aggregation. The following gets duplication information about each email:

select email, sum(isrfi) as numrfi, sum(isms) as numms
from ((select email, 1 as isrfi, 0 as isms
       from rfilog
      ) union all
      (select email, 0, 1
       from orutrimdb.mastersstudies
      )
     ) e
group by email;

An aggregation on top gives you the information you are looking for:

select numrfi, numms, count(*), min(email), max(email)
from (select email, sum(isrfi) as numrfi, sum(isms) as numms
      from ((select email, 1 as isrfi, 0 as isms
             from rfilog
            ) union all
            (select email, 0, 1
             from orutrimdb.mastersstudies
            )
           ) e
      group by email
     ) e
group by numrfi, numms;

Note that this also finds duplicates within a single table.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786