0

I have the following table

create table bufScan
(
   msgid int,
   conversationid nvarchar(max),
   mailbox nvarchar(max)
);

INSERT dbo.bufScan VALUES 
(1,'person1@company.com','AAQkAGIwZjk4OTk4LTRkZGYtNDM5Yi04NGZlLTAzMDY1MjQ3ZjVlMgAQAPX8hFCq30h3robsMxenwt8='),
(2,'person1@company.com','AAQkAGIwZjk4OTk4LTRkZGYtNDM5Yi04NGZlLTAzMDY1MjQ3ZjVlMgAQAESPK731aUirpd0CyOIlR5I='),
(3,'person1@company.com','AAQkAGIwZjk4OTk4LTRkZGYtNDM5Yi04NGZlLTAzMDY1MjQ3ZjVlMgAQAESPK731aUirpd0CyOIlR5I='),
(4,'person1@company.com','AAQkAGIwZjk4OTk4LTRkZGYtNDM5Yi04NGZlLTAzMDY1MjQ3ZjVlMgAQAPX8hFCq30h3robsMxenwt8='),
(5,'person2@company.com','AAQkADhlYTk5MGY1LTJkOTUtNDVjNy1iNDg0LTljYjc5ODAzZTM3OQAQAPX8hFCq30h3robsMxenwt8='),
(6,'person2@company.com','AAQkADhlYTk5MGY1LTJkOTUtNDVjNy1iNDg0LTljYjc5ODAzZTM3OQAQAESPK731aUirpd0CyOIlR5I='),
(7,'person2@company.com','AAQkADhlYTk5MGY1LTJkOTUtNDVjNy1iNDg0LTljYjc5ODAzZTM3OQAQAESPK731aUirpd0CyOIlR5I='),
(8,'person2@company.com','AAQkADhlYTk5MGY1LTJkOTUtNDVjNy1iNDg0LTljYjc5ODAzZTM3OQAQAX8hFCq30h3robsMxenwt8=');

Let us assume that AAQkADhlYTk5MGY1LTJkOTUtNDVjNy1iNDg0LTljYjc5ODAzZTM3OQ and AAQkAGIwZjk4OTk4LTRkZGYtNDM5Yi04NGZlLTAzMDY1MjQ3ZjVlMg are entries that give us information for a specific mailbox and AX8hFCq30h3robsMxenwt8= and AESPK731aUirpd0CyOIlR5I= for a specific conversation.

In this example, person1 is having conversations with person2. I would like to keep all conversation from only one mailbox, otherwise, it will be like having double conversations in my data. That means that i would like to keep entries for AX8hFCq30h3robsMxenwt8= and AESPK731aUirpd0CyOIlR5I= either from AAQkADhlYTk5MGY1LTJkOTUtNDVjNy1iNDg0LTljYjc5ODAzZTM3OQ or AAQkAGIwZjk4OTk4LTRkZGYtNDM5Yi04NGZlLTAzMDY1MjQ3ZjVlMg. I hope I made it more specific. Since I want to keep also information about all messages that are part of the conversation I don't want to delete duplicates that have different message id in the same partition of the mailbox.

Cœur
  • 37,241
  • 25
  • 195
  • 267

2 Answers2

1

You can use dense_rank() function :

select top (1) with ties *
from table t
order by dense_rank() over (partition by substring(mailbox, 4, len(mailbox)) 
                                          order by conversationid);
Yogesh Sharma
  • 49,870
  • 5
  • 26
  • 52
  • You can't use DENSE_RANK() in an ORDER BY clause and there are other syntax problems with your query. Also - This not account for the fact that the attributes can be different lengths – Alan Burstein Jun 04 '18 at 16:39
  • 1
    @AlanBurstein. . . How can u think we can't use Analytical function in `order by` clause ? – Yogesh Sharma Jun 04 '18 at 16:45
  • I stand corrected - that just blows my mind. I've been using Window Functions in SQL Server since SQL 2005 and have NEVER seen them used in an order by clause in any blog, forum or book. Even BOL has no such examples. Downvote removed, I learnt something very cool today. Bravo sir! – Alan Burstein Jun 04 '18 at 17:22
  • Thank you @Yogesh Sharma, it was really helpful! – homeostasis Jun 05 '18 at 08:03
  • @Yogesh Sharma, Please take a look at my edited question – homeostasis Jun 05 '18 at 09:53
0

You're explanation is still a bit vague. I think this will work:

Using this sample data:

USE tempdb
GO

create table dbo.bufScan
(
 msgid int,
 conversationid nvarchar(max),
 mailbox nvarchar(max)
 );

INSERT dbo.bufScan VALUES 
(1,'mailbox1','xxxAQA111'),
(2,'mailbox1','xxxAQA111'),
(3,'mailbox1','xxxAQA222'),
(4,'mailbox1','xxxAQA222'),
(5,'mailbox2','yyyAQA111'),
(6,'mailbox2','yyyAQA111'),
(7,'mailbox2','yyyAQA222'),
(8,'mailbox2','yyyAQA222');

You could do this:

WITH uniquify AS
(
  SELECT *, rn = ROW_NUMBER() OVER (PARTITION BY f.conv ORDER BY (SELECT NULL))
  FROM dbo.bufScan
  CROSS APPLY 
    (VALUES (SUBSTRING(mailbox, PATINDEX('%[0-9]%', mailbox), LEN(mailbox)))) f(conv)
)
SELECT msgid, conversationid, mailbox
FROM uniquify
WHERE rn <= 2;

Which returns:

msgid       conversationid   mailbox
----------- ---------------- -----------
1           mailbox1         xxxAQA111
2           mailbox1         xxxAQA111
7           mailbox2         yyyAQA222
8           mailbox2         yyyAQA222

You can change rn <= 2 to rn = 1 to return this:

msgid       conversationid   mailbox
----------- ---------------- -----------
1           mailbox1         xxxAQA111
7           mailbox2         yyyAQA222
Alan Burstein
  • 7,770
  • 1
  • 15
  • 18