0

What I'm trying to do is take my source data, which contains 29,268 records, and create six different, unique (by E-mail Address, which is a field in the data) sets of data from it. Here's my base query which grabs 4,878 records (and this query, conceptually, would execute 6 times, but what I need it to do is be able to get a new unique set of 4,878 records each time, by E-mail Address (where the Email Address in the consecutive query runs would not exist in the previous runs)). Off the top of my head, I'm thinking doing something with ranking, but I am unsure how to even proceed with doing what I need to do. I would classify myself as intermediate in SQL. This is a little over my head. Any ideas?

select top 1124 * from
Master_Subscribers_Score_GTE_5
where [E-mail Address] like '%YAHOO.COM%'

union all

select top 402 * from
Master_Subscribers_Score_GTE_5
where ([E-mail Address] like '%HOTMAIL.COM%' or [E-mail Address] like '%LIVE.COM%')

union all

select top 45 * from
Master_Subscribers_Score_GTE_5
where [E-mail Address] like '%AOL.COM%'

union all

select top 2353 * from
Master_Subscribers_Score_GTE_5
where [E-mail Address] like '%GMAIL.COM%'

union all

select top 164 * from
Master_Subscribers_Score_GTE_5
where ([E-mail Address] like '%ATT.COM%' or [E-mail Address] like '%SBCGLOBAL.NET%')

union all

select top 8 * from
Master_Subscribers_Score_GTE_5
where [E-mail Address] like '%COX.NET%'

union all

select top 3 * from
Master_Subscribers_Score_GTE_5
where [E-mail Address] like '%VERIZON.NET%'

union all

select top 70 * from
Master_Subscribers_Score_GTE_5
where [E-mail Address] like '%RR.COM%'

union all

select top 712 * from
Master_Subscribers_Score_GTE_5
where [E-mail Address] not like '%YAHOO.COM%' and
[E-mail Address] not like '%HOTMAIL.COM%' and
[E-mail Address] not like '%LIVE.COM%' and
[E-mail Address] not like '%AOL.COM%' and
[E-mail Address] not like '%GMAIL.COM%' and
[E-mail Address] not like '%ATT.COM%' and
[E-mail Address] not like '%SBCGLOBAL.NET%' and
[E-mail Address] not like '%COX.NET%' and
[E-mail Address] not like '%VERIZON.NET%' and
[E-mail Address] not like '%RR.COM%'
Mike Marks
  • 10,017
  • 17
  • 69
  • 128
  • 2
    What is the logic behind the values appearing in the various `TOP` clauses? – Tim Biegeleisen Sep 28 '18 at 02:45
  • So, this is specifically for "IP Warming". I'm creating a list of subscribers that agree with the "best practice" method of ramping up an IP address for email sending. So, the "TOP" is basically just saying that for "Email Send 1", I only want to send to 1,124 Yahoo email addresses, and so on. Does that make sense? I'm using Salesforce Marketing Cloud as the email sending platform, but that part is irrelevant. – Mike Marks Sep 28 '18 at 02:49
  • The customer wants to send 6 out of 7 days in the week, so I took the weekly threshold numbers and divided by 6 to get my "TOP" numbers. – Mike Marks Sep 28 '18 at 02:49
  • Yes, you might be able to use `ROW_NUMBER` to page out the top N records for each of the tables. – Tim Biegeleisen Sep 28 '18 at 02:50

2 Answers2

1

First off, using LIKE has its disadvantages. Take a look at this post.

You can get the email address provider(host) by using SUBSTRING and CHARINDEX

The following will get the email provider

SUBSTRING(Email, CHARINDEX('@', Email, 1)+1, LEN(EmailR) - CHARINDEX('@', Email, 1))

Now, since you get the part that you need to filter with, use it to filter the records, and then use ROW_NUMBER() to get the number of records for each provider, which will be used again for further filtering. The, you can use CASE to finalize the records.

Here is an example :

SELECT *
FROM (
    SELECT *
    ,   CASE
            WHEN  UPPER(EmailDomain) = 'YAHOO.COM' AND RN <= 1124 
            THEN 'Group 1'
            WHEN  UPPER(EmailDomain) = 'HOTMAIL.COM' AND RN <= 402
            THEN 'Group 2'
            WHEN  UPPER(EmailDomain) = 'AOL.COM' AND RN <= 45
            THEN 'Group 3'
            WHEN  UPPER(EmailDomain) = 'GMAIL.COM' AND RN <= 2353
            THEN 'Group 4'
            WHEN  (UPPER(EmailDomain) = 'ATT.COM' OR UPPER(EmailDomain) = 'SBCGLOBAL.NET') AND RN < 164
            THEN 'Group 5'
            WHEN  UPPER(EmailDomain) = 'COX.NET'  AND RN <= 8
            THEN 'Group 6'
            WHEN  UPPER(EmailDomain) = 'VERIZON.NET' AND RN <= 3
            THEN 'Group 7'
            WHEN  UPPER(EmailDomain) = 'RR.COM' AND RN <= 70
            THEN 'Group 8'
            WHEN  UPPER(EmailDomain) NOT IN('YAHOO.COM','HOTMAIL.COM','AOL.COM','GMAIL.COM','ATT.COM','SBCGLOBAL.NET','COX.NET','VERIZON.NET','RR.COM') AND RN <= 712
            THEN 'Group 9'
            ELSE NULL
        END EmailGroup
    FROM (
SELECT *, ROW_NUMBER() OVER(PARTITION BY EmailDomain ORDER BY EmailDomain) RN 
FROM (
SELECT 
    Email  
,   SUBSTRING(Email, CHARINDEX('@', Email, 1)+1, LEN(EmailR) - CHARINDEX('@', Email, 1)) EmailDomain 
FROM 
    Master_Subscribers_Score_GTE_5
) D 
) C
) E
WHERE 
    EmailGroup IS NOT NULL 

NOTICE that I've used ROW_NUMBER() as a replacement of SELECT TOP x. Then I just gave the records that do not fit under any condition a NULL, this gave me an easy way to only show what I need and fill the rest with NULL to be excluded from the results.

I've used UPPER() because I don't know your database collation - whether it is case sensitive or not. So I used it to overcome this. If your database is case insensitive, it won't be needed.

I hope this helps.

DatumPoint
  • 419
  • 4
  • 21
iSR5
  • 3,274
  • 2
  • 14
  • 13
0
with ranked as (
    select m.*, n = row_number() over (partition by b.bucket order by m.[E-mail Address])
    from Master_Subscribers_Score_GTE_5 m
    outer apply (select bucket from (values
        ('yahoo.com'), ('hotmail.com,live.com'),
        ('aol.com'), ('gmail.com'), ('att.com,sbcglobal.net'),
        ('cox.net'), ('verizon.net'), ('rr.com'))
        _(bucket) where exists (
            select * from string_split(bucket, ',')
            where m.[E-mail Address] like '%' + value + '%')) b)

select * from ranked where n % 6 = 0

..should give you 1124 for yahoo.com, 402 for hotmail.com and live.com, etc. then query where n % 6 = 1 for the next set n % 6 = 2 and so on.

gordy
  • 9,360
  • 1
  • 31
  • 43