I need to find out how to write T-SQL that gets the 5th word of a string column in each row of a table
-
Welcome, please read [how to ask](https://stackoverflow.com/help/how-to-ask) – Dale K Jun 22 '20 at 02:09
-
And [how to ask a good SQL question](https://meta.stackoverflow.com/questions/271055/tips-for-asking-a-good-structured-query-language-sql-question) – Dale K Jun 22 '20 at 02:10
-
An entirely inappropriate use of T-SQL. But ignoring that, what were your most recent topics in the course you are taking? Generally, those topics relate directly to your homework. Perhaps a WHILE loop is a part of the intended solution. And just to be clear, there are "better" solutions than a WHILE loop. – SMor Jun 22 '20 at 02:12
-
Just looking for the simplest solution – abs Jun 22 '20 at 02:31
-
There's a much faster method of doing this in T-SQL than using a WHILE or XML. Dunno about the JSON solution because I've not tried it before. But, to demo the solution I'm thinking of, I need to know the name of the column and the datatype of the column, please. – Jeff Moden Jun 22 '20 at 18:27
-
@JeffModen I know your approach quite well (the famous *Moden Splitter* ). AFAIK Json is faster with one-time parsing. You might use my sample from my answer and add your method to the performance test. Would be curious to know your results... – Shnugo Jun 22 '20 at 20:13
-
@Shnugo - Thank you for your offer. I'll give it a shot. – Jeff Moden Jun 22 '20 at 20:16
-
@Shnugo - Okidoki! Thank you for getting me interested in using JSON as a string splitter. I'm not finished with my testing on the method by a long shot but wanted to let you know that I'm doing some pretty extreme performance testing and JSON is looking pretty encouraging even though the method relies on some concatenation. Thank you again for the suggestion and for the test code you posted. This is REALLY interesting to me because it doesn't look like MS is going to fix the STRING_SPLIT function any time in the near future. – Jeff Moden Jun 29 '20 at 18:37
-
@JeffModen I'm happy to read this. JSON, although just a hack, is my favourite *string engine* in T-SQL now ;-) In [this answer](https://stackoverflow.com/a/38274439/5089204) I explained why: **type-safe** splitting of CSV data and position safe cherry picking are just two of the reasons. And it is pretty fast... What MS developers thought, while working on `STRING_SPLIT()`, will stay a riddle for ever... – Shnugo Jun 30 '20 at 06:26
-
Do you have code to split, say, 30-element CSVs from an entire table? The reason I ask is because, although the method you used is great and very fast for finding things like "the fifth element", I've tested several renditions of JSON-driven CSV splitters with an unknown number of elements and they're all relatively terrible for performance. I'd like to test a good one but don't (yet) know enough about JSON in SQL Server to pull anything off myself. – Jeff Moden Jul 02 '20 at 01:27
-
@JeffModen, found this by chance... Without the `@shnugo`, there was no notification... Are you talking about CSV volumes with many rows (loaded from somewhere, e.g. the file system) or about lines of CSV data stored in a table one by one? In other words: is the splitting into lines part of the job? – Shnugo Jul 04 '20 at 12:32
-
@JeffModen, I lost this thread for a while... Any more input? Reading CSV data *generically* would need some specification of columns, types, delimiters, text markers and so on. The `WITH`-clause after `OPENJSON` would need to be a dynamically created statement... I really would appreciate working on this together with you. – Shnugo Jul 10 '20 at 10:01
-
@Shnugo - Apologies for the disappearing act. I found some decent examples of how to split a column of (for example) CSVs with an unknown number of elements per row and have been playing with it. There's no doubt that finding something like the 5th element is faster using JSON but I'm not finding JSON to be a speedy full column splitter. I'm still playing with it all, though. I've also found that these forums strongly discourage tangential discussions on subjects even in these rather tiny "comments". That and people who think they know best and edit people's answers. :D – Jeff Moden Jul 10 '20 at 20:11
-
@JeffModen no problem, and I fully agree that this is the wrong place for deeper discussions – Shnugo Jul 10 '20 at 20:28
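As a side note for readers following the splitter discussion in the comments above, here is a minimal sketch (my own illustration, not code from either commenter; the @csv table and its contents are hypothetical demo data) of a generic OPENJSON "full column" splitter. Without a WITH clause, OPENJSON returns one row per array element, and the [key] column holds the zero-based position. The usual caveat applies: embedded double quotes or backslashes in the data would first have to be escaped to form valid JSON.
-- Generic splitter sketch (SQL Server 2016+): one output row per element, with its position
DECLARE @csv TABLE (ID int, Txt varchar(200));
INSERT INTO @csv VALUES (1,'a,b,c'), (2,'w,x,y,z');
SELECT c.ID
      ,Position = CAST(j.[key] AS int) + 1   -- [key] is the zero-based array index
      ,Element  = j.[value]
FROM @csv c
CROSS APPLY OPENJSON(CONCAT('["', REPLACE(c.Txt, ',', '","'), '"]')) j;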
4 Answers
One option which will ensure sequence is to use a bit of XML
Example
Declare @YourTable table (ID int, SomeCol varchar(100))
Insert into @YourTable Values
(1,'Some text that needs to be parsed')
,(2,'Only three words')
Select A.ID
,Pos5 = cast('<x>' + replace(SomeCol,' ','</x><x>')+'</x>' as xml).value('/x[5]','varchar(50)')
From @YourTable A
Returns
ID Pos5
1 to
2 NULL
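One caveat worth adding: the plain CAST to XML above assumes the text contains no characters that are special in XML, such as & or <, which would make the cast fail. A minimal sketch of a common workaround, letting FOR XML PATH('') entitize the value first (same table as above; the extra subquery is the only change):
Select A.ID
,Pos5 = cast('<x>' + replace((Select A.SomeCol as [*] For XML Path('')),' ','</x><x>')+'</x>' as xml).value('/x[5]','varchar(50)')
From @YourTable A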

-
Hi John, I was curious about the actual performance difference. It is more than I thought... (see my answer's UPDATE section) – Shnugo Jun 22 '20 at 08:30
If you are on v2016 or higher there is a trick with JSON, which is faster than XML.
(Credits to John Cappelletti for the DDL/INSERT.)
Declare @YourTable table (ID int, SomeCol varchar(100))
Insert into @YourTable Values
(1,'Some text that needs to be parsed')
,(2,'Only three words');
SELECT JSON_VALUE(CONCAT('["',REPLACE(t.SomeCol,' ','","'),'"]'),'$[4]')
FROM @YourTable t
The idea is to transform the string into a JSON array and pick the item by its (zero-based!) position.
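To make the transformation visible, here is a small stand-alone illustration (added for clarity; @s is just a demo variable) of the string the CONCAT/REPLACE step builds and of the value JSON_VALUE then picks out:
DECLARE @s varchar(100) = 'Some text that needs to be parsed';
SELECT AsJsonArray = CONCAT('["', REPLACE(@s, ' ', '","'), '"]')   -- ["Some","text","that","needs","to","be","parsed"]
      ,FifthWord   = JSON_VALUE(CONCAT('["', REPLACE(@s, ' ', '","'), '"]'), '$[4]');   -- to (index 4 = fifth element)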
UPDATE Performance
There are three different answers now... Here is a comparison of their performance:
SET NOCOUNT ON;
DECLARE @YourTable TABLE (ID INT IDENTITY, SomeCol VARCHAR(100))
DECLARE @cnt INT=0;
--insert 100k rows
--important: each string must be different to avoid biased results due to caching...
WHILE @cnt<100000
BEGIN
INSERT INTO @YourTable SELECT CONCAT('Some text that needs to', @cnt,' be parsed');
SET @cnt += 1;
END
--we will measure each approach's duration
DECLARE @d DATETIME2=SYSUTCDATETIME();
--JSON_VALUE reads from the parsed value directly
SELECT t.ID,JSON_VALUE(CONCAT('["',REPLACE(t.SomeCol,' ','","'),'"]'),'$[4]') AS fifth
INTO #tbl1
FROM @YourTable t;
SELECT DATEDIFF(MILLISECOND,@d,SYSUTCDATETIME());
SET @d=SYSUTCDATETIME();
--In this approach we use OPENJSON and implicit pivoting with a WITH clause
SELECT t.ID,f.fifth
INTO #tbl2
FROM @YourTable t
CROSS APPLY OPENJSON(CONCAT('[["',REPLACE(SomeCol,' ','","'),'"]]'))
WITH (fifth VARCHAR(255) '$[4]') as f
SELECT DATEDIFF(MILLISECOND,@d,SYSUTCDATETIME());
SET @d=SYSUTCDATETIME();
--In this approach we use XML to get the 5th element
SELECT t.ID,cast('<x>' + replace(SomeCol,' ','</x><x>')+'</x>' as xml).value('/x[5]','varchar(50)') AS fifth
INTO #tbl3
FROM @YourTable t
SELECT DATEDIFF(MILLISECOND,@d,SYSUTCDATETIME());
GO
--You can check the result
SELECT TOP 100 * FROM #tbl1;
SELECT TOP 100 * FROM #tbl2;
SELECT TOP 100 * FROM #tbl3;
GO
--cleanup
DROP TABLE #tbl1;
GO
DROP TABLE #tbl2;
GO
DROP TABLE #tbl3;
GO
On my system JSON_VALUE (~450ms) needs about half the time of OPENJSON (~900ms) and only about a tenth(!) of the time of XML (~4800ms).
UPDATE 2
Reading more than one value from the string in one go tends to favour the solution with OPENJSON:
DECLARE @d DATETIME2=SYSUTCDATETIME();
SELECT t.ID,JSON_VALUE(CONCAT('["',REPLACE(t.SomeCol,' ','","'),'"]'),'$[4]') AS fifth
,JSON_VALUE(CONCAT('["',REPLACE(t.SomeCol,' ','","'),'"]'),'$[1]') AS [second]
,JSON_VALUE(CONCAT('["',REPLACE(t.SomeCol,' ','","'),'"]'),'$[0]') AS [first]
INTO #tbl1
FROM @YourTable t;
SELECT DATEDIFF(MILLISECOND,@d,SYSUTCDATETIME());
SET @d=SYSUTCDATETIME();
SELECT t.ID,f.fifth,f.[second],f.[first]
INTO #tbl2
FROM @YourTable t
CROSS APPLY OPENJSON(CONCAT('[["',REPLACE(SomeCol,' ','","'),'"]]'))
WITH (fifth VARCHAR(255) '$[4]'
,[second] VARCHAR(255) '$[1]'
,[first] VARCHAR(255) '$[0]') as f
SELECT DATEDIFF(MILLISECOND,@d,SYSUTCDATETIME());
SET @d=SYSUTCDATETIME();
SELECT t.ID,cast('<x>' + replace(SomeCol,' ','</x><x>')+'</x>' as xml).value('/x[5]','varchar(50)') AS fifth
,cast('<x>' + replace(SomeCol,' ','</x><x>')+'</x>' as xml).value('/x[2]','varchar(50)') AS [second]
,cast('<x>' + replace(SomeCol,' ','</x><x>')+'</x>' as xml).value('/x[1]','varchar(50)') AS [first]
INTO #tbl3
FROM @YourTable t
SELECT DATEDIFF(MILLISECOND,@d,SYSUTCDATETIME());
GO
In this case (reading 3 items at once) I get ~1800ms for JSON_VALUE, just ~1100ms for OPENJSON and, again roughly 10x slower, ~18200ms for the XML approach.
UPDATE 3 - just for fun ;-)
The test in section UPDATE 2 would work better for JSON_VALUE using this:
SELECT t.ID,JSON_VALUE(TheJsonString,'$[4]') AS fifth
,JSON_VALUE(TheJsonString,'$[1]') AS [second]
,JSON_VALUE(TheJsonString,'$[0]') AS [first]
INTO #tbl1
FROM @YourTable t
CROSS APPLY(VALUES(CONCAT('["',REPLACE(t.SomeCol,' ','","'),'"]')))A(TheJsonString);
... thus avoiding repeated string manipulations... In this case it is at the same speed as OPENJSON.
Conclusion
So the final suggestion: use JSON_VALUE for a single item, OPENJSON for more than one item, and XML if you are below v2016.

This is a good question, and it highlights a couple of useful features of SQL.
It can be done using a combination of a CTE, STRING_SPLIT and CROSS APPLY.
First create the table and insert a couple of rows of data:
create table mywords(r int, w varchar(100));
insert mywords values
(1, 'one two three four five six seven')
,(2, 'W1 W2 W3 W4 W5 W6 W7');
Then the query:
;with words as
(
SELECT r,
row_number() over (partition by r order by w) as n,
Value
FROM mywords
CROSS APPLY STRING_SPLIT(w, ' ')
)
select * from words where n=5
The result is:
r n Value
1 5 five
2 5 W5

-
There is no guarantee that string_split() will maintain the sequence. Tragic oversight on the part of MS. – John Cappelletti Jun 22 '20 at 02:37
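For what it's worth, newer versions address this: on SQL Server 2022 and Azure SQL Database, STRING_SPLIT accepts a third argument, enable_ordinal, which adds a guaranteed ordinal column. A minimal sketch against the mywords table from this answer (not applicable to the 2016-2019 versions discussed here):
-- SQL Server 2022+ / Azure SQL only: the third argument exposes a guaranteed "ordinal" column
SELECT r, value
FROM mywords
CROSS APPLY STRING_SPLIT(w, ' ', 1)
WHERE ordinal = 5;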
You can get it using OPENJSON. Thanks to @TomC for the create script.
create table #mywords(r int, w varchar(100));
insert #mywords values
(1, 'one two three four five six seven')
,(2, 'W1 W2 W3 W4 W5 W6 W7');
SELECT r, f.*
FROM #mywords
CROSS APPLY OPENJSON(CONCAT('[["',REPLACE(w,' ','","'),'"]]'))
WITH
(fifth VARCHAR(255) '$[4]') as f
+---+-------+
| r | fifth |
+---+-------+
| 1 | five  |
| 2 | W5    |
+---+-------+

-
Although this works, in this case using `APPLY` with `OPENJSON`, the doubled `[[...]]` (array within an array) to allow a `WITH` clause and the implicit pivoting of the result is quite some unnecessary overhead... See my answer for the update section... – Shnugo Jun 22 '20 at 08:29
-
@Shnugo, I agree with you. JSON_VALUE seems better choice for picking up a single value. I have upvoted your answer. – Venkataraman R Jun 22 '20 at 09:01