I need to find out how to write T-SQL that gets the 5th word of a string column in each row of a table
-
Welcome, please read [how to ask](https://stackoverflow.com/help/how-to-ask) – Dale K Jun 22 '20 at 02:09
-
And [how to ask a good SQL question](https://meta.stackoverflow.com/questions/271055/tips-for-asking-a-good-structured-query-language-sql-question) – Dale K Jun 22 '20 at 02:10
-
An entirely inappropriate use of T-SQL. But ignoring that, what were your most recent topics in the course you are taking? Generally, those topics relate directly to your homework. Perhaps a WHILE loop is a part of the intended solution. And just to be clear, there are "better" solutions than a WHILE loop. – SMor Jun 22 '20 at 02:12
-
Just looking for the simplest solution – abs Jun 22 '20 at 02:31
-
There's a much faster method of doing this in T-SQL than using a WHILE or XML. Dunno about the JSON solution because I've not tried it before. But, to demo the solution I'm thinking of, I need to know the name of the column and the datatype of the column, please. – Jeff Moden Jun 22 '20 at 18:27
-
@JeffModen I know your approach quite well (the famous *Moden Splitter* ). AFAIK Json is faster with one-time parsing. You might use my sample from my answer and add your method to the performance test. Would be curious to know your results... – Shnugo Jun 22 '20 at 20:13
-
@Shnugo - Thank you for your offer. I'll give it a shot. – Jeff Moden Jun 22 '20 at 20:16
-
@Shnugo - Okidoki! Thank you for getting me interested in using JSON as a string splitter. I'm not finished with my testing on the method by a long shot but wanted to let you know that I'm doing some pretty extreme performance testing and JSON is looking pretty encouraging even though the method relies on some concatenation. Thank you again for the suggestion and for the test code you posted. This is REALLY interesting to me because it doesn't look like MS is going to fix the STRING_SPLIT function any time in the near future. – Jeff Moden Jun 29 '20 at 18:37
-
@JeffModen I'm happy to read this. JSON, although just a hack, is my favourite *string engine* in T-SQL now ;-) In [this answer](https://stackoverflow.com/a/38274439/5089204) I explained why: **type-safe** splitting of CSV data and position safe cherry picking are just two of the reasons. And it is pretty fast... What MS developers thought, while working on `STRING_SPLIT()`, will stay a riddle for ever... – Shnugo Jun 30 '20 at 06:26
-
Do you have code to split, say, 30-element CSVs from an entire table? The reason I ask is because, although the method you used is great and very fast for finding things like "the fifth element", I've tested several renditions of JSON-driven CSV splitters with an unknown number of elements and they're all relatively terrible for performance. I'd like to test a good one but don't (yet) know enough about JSON in SQL Server to pull anything off myself. – Jeff Moden Jul 02 '20 at 01:27
-
@JeffModen, found this by chance... Without the `@shnugo`, there was no notification... Are you talking about CSV volumes with many rows (loaded from somewhere, e.g. the file system) or about lines of CSV data stored in a table one by one? In other words: is the splitting into lines part of the job? – Shnugo Jul 04 '20 at 12:32
-
@JeffModen, I lost this thread for a while... Any more input? Reading CSV data *generically* would need some specification of columns, types, delimiters, text markers and so on. The `WITH`-clause after `OPENJSON` would need to be a dynamically created statement... I really would appreciate working on this together with you. – Shnugo Jul 10 '20 at 10:01
-
@Shnugo - Apologies for the disappearing act. I found some decent examples of how to split a column of (for example) CSVs with an unknown number of elements per row and have been playing with it. There's no doubt that finding something like the 5th element is faster using JSON but I'm not finding JSON to be a speedy full column splitter. I'm still playing with it all, though. I've also found that these forums strongly discourage tangential discussions on subjects even in these rather tiny "comments". That and people who think they know best and edit people's answers. :D – Jeff Moden Jul 10 '20 at 20:11
-
@JeffModen no problem, and I fully agree that this is the wrong place for deeper discussions – Shnugo Jul 10 '20 at 20:28
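As a side note for readers following the splitter discussion in the comments above, here is a minimal sketch (my own illustration, not code from either commenter; the @csv table and its contents are hypothetical demo data) of a generic OPENJSON "full column" splitter. Without a WITH clause, OPENJSON returns one row per array element, and the [key] column holds the zero-based position. The usual caveat applies: embedded double quotes or backslashes in the data would first have to be escaped to form valid JSON.
-- Generic splitter sketch (SQL Server 2016+): one output row per element, with its position
DECLARE @csv TABLE (ID int, Txt varchar(200));
INSERT INTO @csv VALUES (1,'a,b,c'), (2,'w,x,y,z');
SELECT c.ID
      ,Position = CAST(j.[key] AS int) + 1   -- [key] is the zero-based array index
      ,Element  = j.[value]
FROM @csv c
CROSS APPLY OPENJSON(CONCAT('["', REPLACE(c.Txt, ',', '","'), '"]')) j;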
4 Answers
One option which will ensure sequence is to use a bit of XML
Example
Declare @YourTable table (ID int, SomeCol varchar(100))
Insert into @YourTable Values
(1,'Some text that needs to be parsed')
,(2,'Only three words')
Select A.ID
,Pos5 = cast('<x>' + replace(SomeCol,' ','</x><x>')+'</x>' as xml).value('/x[5]','varchar(50)')
From @YourTable A
Returns
ID Pos5
1 to
2 NULL
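One caveat worth adding: the plain CAST to XML above assumes the text contains no characters that are special in XML, such as & or <, which would make the cast fail. A minimal sketch of a common workaround, letting FOR XML PATH('') entitize the value first (same table as above; the extra subquery is the only change):
Select A.ID
,Pos5 = cast('<x>' + replace((Select A.SomeCol as [*] For XML Path('')),' ','</x><x>')+'</x>' as xml).value('/x[5]','varchar(50)')
From @YourTable A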

-
Hi John, I was curious about the actual performance difference. It is more than I thought... (see my answer's UPDATE section) – Shnugo Jun 22 '20 at 08:30
If you are on v2016 or higher there is a trick with JSON, which is faster than XML.
(Credits to John Cappelletti for the DDL/INSERT.)
Declare @YourTable table (ID int, SomeCol varchar(100))
Insert into @YourTable Values
(1,'Some text that needs to be parsed')
,(2,'Only three words');
SELECT JSON_VALUE(CONCAT('["',REPLACE(t.SomeCol,' ','","'),'"]'),'$[4]')
FROM @YourTable t
The idea is to transform the string into a JSON array and pick the item by its (zero-based!) position.
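To make the transformation visible, here is a small stand-alone illustration (added for clarity; @s is just a demo variable) of the string the CONCAT/REPLACE step builds and of the value JSON_VALUE then picks out:
DECLARE @s varchar(100) = 'Some text that needs to be parsed';
SELECT AsJsonArray = CONCAT('["', REPLACE(@s, ' ', '","'), '"]')   -- ["Some","text","that","needs","to","be","parsed"]
      ,FifthWord   = JSON_VALUE(CONCAT('["', REPLACE(@s, ' ', '","'), '"]'), '$[4]');   -- to (index 4 = fifth element)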
UPDATE Performance
There are three different answers now... Here is a comparison of their performance:
SET NOCOUNT ON;
DECLARE @YourTable TABLE (ID INT IDENTITY, SomeCol VARCHAR(100))
DECLARE @cnt INT=0;
--insert 100k rows
--important: each string must be different to avoid biased results due to caching...
WHILE @cnt<100000
BEGIN
INSERT INTO @YourTable SELECT CONCAT('Some text that needs to', @cnt,' be parsed');
SET @cnt += 1;
END
--we will measure each approach's duration
DECLARE @d DATETIME2=SYSUTCDATETIME();
--JSON_VALUE reads from the parsed value directly
SELECT t.ID,JSON_VALUE(CONCAT('["',REPLACE(t.SomeCol,' ','","'),'"]'),'$[4]') AS fifth
INTO #tbl1
FROM @YourTable t;
SELECT DATEDIFF(MILLISECOND,@d,SYSUTCDATETIME());
SET @d=SYSUTCDATETIME();
--In this approach we use OPENJSON and implicit pivoting with a WITH clause
SELECT t.ID,f.fifth
INTO #tbl2
FROM @YourTable t
CROSS APPLY OPENJSON(CONCAT('[["',REPLACE(SomeCol,' ','","'),'"]]'))
WITH (fifth VARCHAR(255) '$[4]') as f
SELECT DATEDIFF(MILLISECOND,@d,SYSUTCDATETIME());
SET @d=SYSUTCDATETIME();
--In this approach we use XML to get the 5th element
SELECT t.ID,cast('<x>' + replace(SomeCol,' ','</x><x>')+'</x>' as xml).value('/x[5]','varchar(50)') AS fifth
INTO #tbl3
FROM @YourTable t
SELECT DATEDIFF(MILLISECOND,@d,SYSUTCDATETIME());
GO
--You can check the result
SELECT TOP 100 * FROM #tbl1;
SELECT TOP 100 * FROM #tbl2;
SELECT TOP 100 * FROM #tbl3;
GO
--cleanup
DROP TABLE #tbl1;
GO
DROP TABLE #tbl2;
GO
DROP TABLE #tbl3;
GO
On my system JSON_VALUE (~450ms) needs about half the time of OPENJSON (~900ms) and only about a tenth(!) of the time of XML (~4800ms).
UPDATE 2
Reading more than one value from the string in one go tends to favour the solution with OPENJSON:
DECLARE @d DATETIME2=SYSUTCDATETIME();
SELECT t.ID,JSON_VALUE(CONCAT('["',REPLACE(t.SomeCol,' ','","'),'"]'),'$[4]') AS fifth
,JSON_VALUE(CONCAT('["',REPLACE(t.SomeCol,' ','","'),'"]'),'$[1]') AS [second]
,JSON_VALUE(CONCAT('["',REPLACE(t.SomeCol,' ','","'),'"]'),'$[0]') AS [first]
INTO #tbl1
FROM @YourTable t;
SELECT DATEDIFF(MILLISECOND,@d,SYSUTCDATETIME());
SET @d=SYSUTCDATETIME();
SELECT t.ID,f.fifth,f.[second],f.[first]
INTO #tbl2
FROM @YourTable t
CROSS APPLY OPENJSON(CONCAT('[["',REPLACE(SomeCol,' ','","'),'"]]'))
WITH (fifth VARCHAR(255) '$[4]'
,[second] VARCHAR(255) '$[1]'
,[first] VARCHAR(255) '$[0]') as f
SELECT DATEDIFF(MILLISECOND,@d,SYSUTCDATETIME());
SET @d=SYSUTCDATETIME();
SELECT t.ID,cast('<x>' + replace(SomeCol,' ','</x><x>')+'</x>' as xml).value('/x[5]','varchar(50)') AS fifth
,cast('<x>' + replace(SomeCol,' ','</x><x>')+'</x>' as xml).value('/x[2]','varchar(50)') AS [second]
,cast('<x>' + replace(SomeCol,' ','</x><x>')+'</x>' as xml).value('/x[1]','varchar(50)') AS [first]
INTO #tbl3
FROM @YourTable t
SELECT DATEDIFF(MILLISECOND,@d,SYSUTCDATETIME());
GO
In this case (reading 3 items at once) I get ~1800ms for JSON_VALUE, just ~1100ms for OPENJSON and, again roughly 10x slower, ~18200ms for the XML approach.
UPDATE 3 - just for fun ;-)
The test in section UPDATE 2 would work better for JSON_VALUE using this:
SELECT t.ID,JSON_VALUE(TheJsonString,'$[4]') AS fifth
,JSON_VALUE(TheJsonString,'$[1]') AS [second]
,JSON_VALUE(TheJsonString,'$[0]') AS [first]
INTO #tbl1
FROM @YourTable t
CROSS APPLY(VALUES(CONCAT('["',REPLACE(t.SomeCol,' ','","'),'"]')))A(TheJsonString);
... thus avoiding repeated string manipulations... In this case it is at the same speed as OPENJSON.
Conclusion
So the final suggestion: use JSON_VALUE for a single item, OPENJSON for more than one item, and XML if you are below v2016.

This is a good question, and it highlights a couple of useful features of SQL.
It can be done using a combination of a CTE, STRING_SPLIT and CROSS APPLY.
First create the table and insert a couple of rows of data:
create table mywords(r int, w varchar(100));
insert mywords values
(1, 'one two three four five six seven')
,(2, 'W1 W2 W3 W4 W5 W6 W7');
Then the query:
;with words as
(
SELECT r,
row_number() over (partition by r order by w) as n,
Value
FROM mywords
CROSS APPLY STRING_SPLIT(w, ' ')
)
select * from words where n=5
The result is:
r n Value
1 5 five
2 5 W5

-
There is no guarantee that string_split() will maintain the sequence. Tragic oversight on the part of MS. – John Cappelletti Jun 22 '20 at 02:37
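For what it's worth, newer versions address this: on SQL Server 2022 and Azure SQL Database, STRING_SPLIT accepts a third argument, enable_ordinal, which adds a guaranteed ordinal column. A minimal sketch against the mywords table from this answer (not applicable to the 2016-2019 versions discussed here):
-- SQL Server 2022+ / Azure SQL only: the third argument exposes a guaranteed "ordinal" column
SELECT r, value
FROM mywords
CROSS APPLY STRING_SPLIT(w, ' ', 1)
WHERE ordinal = 5;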
You can get it using OPENJSON. Thanks to @TomC for the create script.
create table #mywords(r int, w varchar(100));
insert #mywords values
(1, 'one two three four five six seven')
,(2, 'W1 W2 W3 W4 W5 W6 W7');
SELECT r, f.*
FROM #mywords
CROSS APPLY OPENJSON(CONCAT('[["',REPLACE(w,' ','","'),'"]]'))
WITH
(fifth VARCHAR(255) '$[4]') as f
+---+-------+
| r | fifth |
+---+-------+
| 1 | five  |
| 2 | W5    |
+---+-------+

-
Although this works, in this case using `APPLY` with `OPENJSON`, the doubled `[[...]]` (array within an array) to allow a `WITH` clause and the implicit pivoting of the result is quite some unnecessary overhead... See my answer for the update section... – Shnugo Jun 22 '20 at 08:29
-
@Shnugo, I agree with you. JSON_VALUE seems better choice for picking up a single value. I have upvoted your answer. – Venkataraman R Jun 22 '20 at 09:01