34

I would really like some advice here, to give some background info I am working with inserting Message Tracking logs from Exchange 2007 into SQL. As we have millions upon millions of rows per day I am using a Bulk Insert statement to insert the data into a SQL table.

In fact I actually Bulk Insert into a temp table and then from there I MERGE the data into the live table, this is for test parsing issues as certain fields otherwise have quotes and such around the values.

This works well, with the exception of the fact that the recipient-address column is a delimited field seperated by a ; character, and it can be incredibly long sometimes as there can be many email recipients.

I would like to take this column, and split the values into multiple rows which would then be inserted into another table. Problem is anything I am trying is either taking too long or not working the way I want.

Take this example data:

message-id                                              recipient-address
2D5E558D4B5A3D4F962DA5051EE364BE06CF37A3A5@Server.com   user1@domain1.com
E52F650C53A275488552FFD49F98E9A6BEA1262E@Server.com     user2@domain2.com
4fd70c47.4d600e0a.0a7b.ffff87e1@Server.com              user3@domain3.com;user4@domain4.com;user5@domain5.com

I would like this to be formatted as followed in my Recipients table:

message-id                                              recipient-address
2D5E558D4B5A3D4F962DA5051EE364BE06CF37A3A5@Server.com   user1@domain1.com
E52F650C53A275488552FFD49F98E9A6BEA1262E@Server.com     user2@domain2.com
4fd70c47.4d600e0a.0a7b.ffff87e1@Server.com              user3@domain3.com
4fd70c47.4d600e0a.0a7b.ffff87e1@Server.com              user4@domain4.com
4fd70c47.4d600e0a.0a7b.ffff87e1@Server.com              user5@domain5.com

Does anyone have any ideas about how I can go about doing this?

I know PowerShell pretty well, so I tried in that, but a foreach loop even on 28K records took forever to process, I need something that will run as quickly/efficiently as possible.

Thanks!

Aaron Bertrand
  • 272,866
  • 37
  • 466
  • 490
HungryHippos
  • 1,473
  • 5
  • 16
  • 24
  • I think you should put you three result in a table using a split function Look at this : http://stackoverflow.com/questions/314824/t-sql-opposite-to-string-concatenation-how-to-split-string-into-multiple-reco And after that you can manage to join your split data on you other table to get your result – GregM Jun 13 '12 at 15:22

4 Answers4

74

If you are on SQL Server 2016+

You can use the new STRING_SPLIT function, which I've blogged about here, and Brent Ozar has blogged about here.

SELECT s.[message-id], f.value
  FROM dbo.SourceData AS s
  CROSS APPLY STRING_SPLIT(s.[recipient-address], ';') as f;

If you are still on a version prior to SQL Server 2016

Create a split function. This is just one of many examples out there:

CREATE FUNCTION dbo.SplitStrings
(
    @List       NVARCHAR(MAX),
    @Delimiter  NVARCHAR(255)
)
RETURNS TABLE
AS
    RETURN (SELECT Number = ROW_NUMBER() OVER (ORDER BY Number),
        Item FROM (SELECT Number, Item = LTRIM(RTRIM(SUBSTRING(@List, Number, 
        CHARINDEX(@Delimiter, @List + @Delimiter, Number) - Number)))
    FROM (SELECT ROW_NUMBER() OVER (ORDER BY s1.[object_id])
        FROM sys.all_objects AS s1 CROSS APPLY sys.all_objects) AS n(Number)
    WHERE Number <= CONVERT(INT, LEN(@List))
        AND SUBSTRING(@Delimiter + @List, Number, 1) = @Delimiter
    ) AS y);
GO

I've discussed a few others here, here, and a better approach than splitting in the first place here.

Now you can extrapolate simply by:

SELECT s.[message-id], f.Item
  FROM dbo.SourceData AS s
  CROSS APPLY dbo.SplitStrings(s.[recipient-address], ';') as f;

Also I suggest not putting dashes in column names. It means you always have to put them in [square brackets].

Aaron Bertrand
  • 272,866
  • 37
  • 466
  • 490
  • 3
    You sir, deserve an internet cookie :) I had to make a couple of changes, I had to call the Item field Value instead as PowerShell did not like the name Item. I also had to add 'AS f' after the CROSS APPLY to alias that section as f so calling f.item/f.value worked. – HungryHippos Jun 13 '12 at 16:29
  • Also hear you about the column names, this was done just to keep parity with the tracking log column names themselves, i'm aware of the need for brackets and it's fine ok. – HungryHippos Jun 13 '12 at 16:36
  • briliant sample. my statement looks like this: SELECT s.item, f.Item FROM dbconfig AS s CROSS APPLY SplitStrings(s.setting, ';') AS f WHERE s.item = 'EXE_PATHS' – Christian Casutt Sep 24 '13 at 09:03
  • This is exactly what I needed. Thanks! – missscripty Dec 10 '19 at 19:09
  • This codes is really interesting, it works well but I didnt understant how it works, If it possible anyone could explain – Çağlar Can Sarıkaya Aug 31 '21 at 18:06
  • @ÇağlarCanSarıkaya On modern and supported versions of SQL Server you can ignore all the function part and just replace `f.Item` with `f.value` and `dbo.SplitStrings` with `STRING_SPLIT`. – Aaron Bertrand Aug 31 '21 at 18:30
  • I am using 2012, split_string is not supporting but I want to understand this for improve my skills, it is really interesting query – Çağlar Can Sarıkaya Aug 31 '21 at 21:15
  • @ÇağlarCanSarıkaya Can you be more specific about where you're looking for guidance? The function is just one of many ways to break a string apart before SQL Server 2016... I discuss a few others [here](https://sqlperformance.com/2012/07/t-sql-queries/split-strings), [here](https://sqlperformance.com/2012/08/t-sql-queries/splitting-strings-follow-up), and a better approach than splitting in the first place [here](https://sqlperformance.com/2012/08/t-sql-queries/splitting-strings-now-with-less-t-sql). – Aaron Bertrand Aug 31 '21 at 23:10
  • Thanks for your examples, it's clearer now for the other examples but In the example here, what I don't understand is the purpose of the sys_allobjects table. "SELECT ROW_NUMBER() OVER (ORDER BY s1.[object_id]) FROM sys.all_objects AS s1 CROSS APPLY sys.all_objects " this part returns 5.125.201 rows – Çağlar Can Sarıkaya Sep 01 '21 at 05:11
  • @ÇağlarCanSarıkaya it's just a source that we know exists and has a lot of rows - we need some source we can apply `ROW_NUMBER()` to in order to supply a sequence which is used for identifying all the locations (using `CHARINDEX/SUBSTRING()`) of the comma within the string. It's not an intuitive way to do it, and I used to bounce around between this, recursive CTEs, and building a numbers table if you didn't already have one (and a lot of people objected to that). – Aaron Bertrand Sep 01 '21 at 10:51
  • For some reason, I had an idea that you made a match from the object ids and split the lines accordingly. I couldn't understand how it happened that there was no such connection. I see now. Thank you so much – Çağlar Can Sarıkaya Sep 01 '21 at 13:16
  • @AaronBertrand if its possible could you review my question ? https://stackoverflow.com/questions/65896636/how-should-make-faster-sql-server-filtering-procedure-with-many-parameters – Çağlar Can Sarıkaya Sep 02 '21 at 10:08
  • @ÇağlarCanSarıkaya I will just say, you need to check the execution plan for each operation to determine which parts of the process are slow, and in all likelihood the execution plans will tell you where to focus. – Aaron Bertrand Sep 02 '21 at 11:10
  • Okay, Thanks again I will review execution plan with detailed – Çağlar Can Sarıkaya Sep 02 '21 at 13:21
5

SQL Server 2016 include a new table function string_split(), similar to the previous solution.

The only requirement is Set compatibility level to 130 (SQL Server 2016)

Oscar Perez
  • 111
  • 1
  • 5
5

You may use CROSS APPLY (available in SQL Server 2005 and above) and STRING_SPLIT function (available in SQL Server 2016 and above):

DECLARE @delimiter nvarchar(255) = ';';

-- create tables
CREATE TABLE MessageRecipients (MessageId int, Recipients nvarchar(max));
CREATE TABLE MessageRecipient (MessageId int, Recipient nvarchar(max));

-- insert data
INSERT INTO MessageRecipients VALUES (1, 'user1@domain.com; user2@domain.com; user3@domain.com');
INSERT INTO MessageRecipients VALUES (2, 'user@domain1.com; user@domain2.com');

-- insert into MessageRecipient
INSERT INTO MessageRecipient
SELECT MessageId, ltrim(rtrim(value))
FROM MessageRecipients 
CROSS APPLY STRING_SPLIT(Recipients, @delimiter)

-- output results
SELECT * FROM MessageRecipients;
SELECT * FROM MessageRecipient;

-- delete tables
DROP TABLE MessageRecipients;
DROP TABLE MessageRecipient;

Results:

MessageId   Recipients
----------- ----------------------------------------------------
1           user1@domain.com; user2@domain.com; user3@domain.com
2           user@domain1.com; user@domain2.com

and

MessageId   Recipient
----------- ----------------
1           user1@domain.com
1           user2@domain.com
1           user3@domain.com
2           user@domain1.com
2           user@domain2.com
Alex Vazhev
  • 1,363
  • 1
  • 18
  • 17
-1

for table = "yelp_business", split the column categories values separated by ; into rows and display as category column.

SELECT unnest(string_to_array(categories, ';')) AS category
   FROM yelp_business;
Simas Joneliunas
  • 2,890
  • 20
  • 28
  • 35