SELECT first 10 words from SQL Table

Question

I need select query for first 10 words from SQL Table Field.

I have field Name: Description, so i need first 10 words from description field.

EX: Description: I don't feel good about this answer; I don't like the GROUP BY and I would rather not do the table of numbers. Hopefully somebody can pick me up on this.

output: I don't feel good about this answer; I don't like

please add expected output,inputs and what have you tried so far — TheGameiswar, Mar 06 '17 at 07:57
look here to know more on how to ask a perfect question:https://spaghettidba.com/2015/04/24/how-to-post-a-t-sql-question-on-a-public-forum/ — TheGameiswar, Mar 06 '17 at 07:58
*What* answers are you talking about? In any case numbers tables make things *faster*. The problem is the question itself - it's extremely vague. In any case, SQL isn't a string manipulation language. It's not whether you like a query or not but whether it produces the results in the most efficient way — Panagiotis Kanavos, Mar 06 '17 at 08:55
The way this question is written, you may be asking how to split descriptions and pick the first 10 words as a result. Or you may be asking how to extract the first 10 words from each description instead of the full description. Why don't you do that on the *client* anyway? — Panagiotis Kanavos, Mar 06 '17 at 08:56

John Cappelletti · Answer 1 · 2017-03-06T09:29:44.747

3

Another option is to use a CROSS APPLY and a little XML. The logic in the CROSS APPLY could easily be ported into a UDF (scalar or table-valued).

Example

Declare @YourTable table (ID int,LongDesc varchar(max))
Insert into @YourTable values
(1,'I don''t feel good about this answer; I don''t like the GROUP BY and I would rather not do the table of numbers. Hopefully somebody can pick me up on this.')

Select A.ID
      ,ShortDesc = B.S
 From  @YourTable A
 Cross Apply (
                Select S = xDim.value('/x[1]','varchar(100)')+' '
                          +xDim.value('/x[2]','varchar(100)')+' '
                          +xDim.value('/x[3]','varchar(100)')+' '
                          +xDim.value('/x[4]','varchar(100)')+' '
                          +xDim.value('/x[5]','varchar(100)')+' '
                          +xDim.value('/x[6]','varchar(100)')+' '
                          +xDim.value('/x[7]','varchar(100)')+' '
                          +xDim.value('/x[8]','varchar(100)')+' '
                          +xDim.value('/x[9]','varchar(100)')+' '
                          +xDim.value('/x[10]','varchar(100)')
                From  (Select Cast('<x>' + replace((Select replace(A.LongDesc,' ','§§Split§§') as [*] For XML Path('')),'§§Split§§','</x><x>')+'</x>' as xml) as xDim) as A 
             ) B

Returns

ID  ShortDesc
1   I don't feel good about this answer; I don't like

edited Mar 06 '17 at 09:29

answered Mar 06 '17 at 09:14

John Cappelletti

79,615
7
44
66

@maSTArHiAn I'm all about benchmarks and willing to admit when I'm wrong. However, in this case my approach is faster. Ran a series benchmark test on 100K records. 5 runs each. My approach a average of 161 ms. Your approach 377 ms. – John Cappelletti Mar 07 '17 at 21:47
i should say what tool did you use? – RoMEoMusTDiE Mar 07 '17 at 21:54
Remote connection to a client's BIG ASS server. Similar results on my laptop. As far as the test, I import/refresh all of the FRED series definitions ... some 579,733 records with an avg note length of 544 characters or 74 words. My advice it to run your own test and decide what works best for your. BTW - I digg the debate :) – John Cappelletti Mar 07 '17 at 22:27
@JohnCappelletti , your script is hard coded,suppose in another example i want top 15 then i hv to modify your script.I cannot pass parameter – KumarHarsh Mar 09 '17 at 09:43
@KumarHarsh If you want variable, you would have to go vertical, or a UDF – John Cappelletti Mar 09 '17 at 14:12
@KumarHarsh Should have mentioned. If you want 15, just follow the pattern. If you want variable then you would have to go vertical – John Cappelletti Mar 09 '17 at 14:14

score 2 · Answer 2 · edited Sep 07 '22 at 09:23

2

Is the left() function ok?

select left(Description, 10) from your_table;

edited Sep 07 '22 at 09:23

RF1991

2,037
4
8
17

answered Mar 06 '17 at 09:06

Shang Gao

21
1

3

This will return the first 10 characters, not the first 10 words – Panagiotis Kanavos Mar 06 '17 at 09:16

score 1 · Answer 3 · edited May 23 '17 at 11:46

1

You can check this answer in order to learn how to implement and use .net functions in the context of SQL Server. Here, I am using SQLSCLR implementation of the of the .net Regex.Matchfunction.

DECLARE @DataSource TABLE
(
    [Description] NVARCHAR(MAX)
);

INSERT INTO @DataSource ([Description])
VALUES ('word01, word02,         word03, word04, word05, word06')
      ,('word01,word02, word03, word04, word05, word06')
      ,('word01!word02, word03: word04, word05, word06');

SELECT *
FROM @DataSource DS
CROSS APPLY [dbo].[fn_Utils_RegexMatches] ([Description], '^(\w+\b.*?){3}') RM;

This gives you the following output:

which is too detailed (and extraging only the first three words). Your final query can be something like this:

SELECT DS.[Description], RM.[CaptureValue]
FROM @DataSource DS
CROSS APPLY [dbo].[fn_Utils_RegexMatches] ([Description], '^(?n)(\w+\b.*?){3}') RM;

Anyway, using regex you can use the any separators you want, but the more important thing is you can execute .net code in the context of T-SQL which is huge.

You have a lot of to read and learn from here. The previous answer is more easy and faster to be implemented.

edited May 23 '17 at 11:46

Community

1
1

answered Mar 06 '17 at 08:23

gotqn

42,737
46
157
243

1

Huge in memory consumption and delays as well. Even with the most careful coding a regex will generate a match tree whether you need the results or not. The way these queries are written, it will do so for *all* rows. There is a reason SQLCLR string splitting methods *don't* use regex – Panagiotis Kanavos Mar 06 '17 at 08:50
Yes, you are right. You need to test each time your solution in order to be sure it will work as expected with your production data. This is demonstration of SQLCLR only. – gotqn Mar 06 '17 at 09:16
It's a link to a demonstration. If you search for `T-SQL string split` you'll find various SQLCLR implementations. The fastest enumerates over characters directly only as long as needed. It's speed is comparable to SQL Server 2016's STRING_SPLIT. If you only want 10 words out of 100, that's a huge improvement – Panagiotis Kanavos Mar 06 '17 at 09:19
A regex would help if it could return everything up to the 10th occurence of a space, or just the index of the 10th space. This wouild be a single match that could be used directly, or as and argument to `LEFT()`. It would probably be faster than `STRING_SPLIT` because it wouldn't generate any substrings – Panagiotis Kanavos Mar 06 '17 at 09:26
I think `^(?:[^\s]*\s){10}` would return everything up to and including the 10th whitespace as a single string result – Panagiotis Kanavos Mar 06 '17 at 09:35

score 1 · Answer 4 · answered Mar 07 '17 at 20:58

can you please try this

declare @message nvarchar(max) = 'I don''t feel good about this answer; I don''t like the GROUP BY and I would rather not do the table of numbers. Hopefully somebody can pick me up on this'

SELECT left(@message,
CHARINDEX(' ', @message,
  CHARINDEX(' ', @message,
    CHARINDEX(' ',@message,
     CHARINDEX(' ',@message,
      CHARINDEX(' ',@message,
       CHARINDEX(' ',@message,
        CHARINDEX(' ',@message,
         CHARINDEX(' ',@message,
          CHARINDEX(' ',@message,
            CHARINDEX(' ',@message) + 1 ) + 1 ) + 1 ) + 1) + 1) + 1) + 1 )+ 1 ) + 1))

Regards

score 0 · Answer 5 · answered Mar 06 '17 at 10:33

0

try this with big data.

DECLARE @T TABLE(COL1 NVARCHAR(MAX))
INSERT INTO @T VALUES
('I dont feel good about this answer; I dont like the GROUP BY and I would rather not do the table of numbers. Hopefully somebody can pick me up on this.')

;WITH CTE AS
(
select col1,substring(col1,1,CHARINDEX(' ',col1))Words
,substring(col1,CHARINDEX(' ',col1)+1,len(col1))Residue
, 1 rn from @t

union ALL
select col1,substring(Residue,1,CHARINDEX(' ',Residue))
,substring(Residue,CHARINDEX(' ',Residue)+1,len(Residue))
,rn+1
from cte
where len(Words)>0 and rn<10
)
SELECT Words FROM cte

answered Mar 06 '17 at 10:33

KumarHarsh

5,046
1
18
22

will take 60ms to run – RoMEoMusTDiE Mar 07 '17 at 21:09
@maSTArHiAn , i just tested,for 10,000 rows(that means output is 1,00,000 rows ) avg time is 6ms.can you throw sample data – KumarHarsh Mar 08 '17 at 04:45
Execution elapsed time not writing time. – RoMEoMusTDiE Mar 08 '17 at 04:50
@maSTArHiAn, you are right.my script take 6232 ms for 10000 rows.John Cappelletti script takes just 279 ms. I come with other solution soon.But in another example ,Recursive CTE take far less time than xml method.if i get that link,i post it here. – KumarHarsh Mar 08 '17 at 11:51

score 0 · Answer 6 · answered Mar 06 '17 at 10:51

I know you already have an answer, but I just wanted to do it in a different way.

DECLARE @WORD VARCHAR(8000)= 'I don''t feel good about this answer; I don''t like the GROUP BY and I would rather not do the table of numbers. Hopefully somebody can pick me up on this.'
DECLARE @LOOP INT = 1
DECLARE @STR_POS INT = 1
DECLARE @CI INT
DECLARE @RES_STR VARCHAR(8000) = ''

WHILE(@LOOP<=10)
    BEGIN
        SELECT @CI = CHARINDEX(' ',@WORD,@STR_POS)
        SELECT @RES_STR = @RES_STR +' '+ SUBSTRING(@WORD,@STR_POS,(CHARINDEX(' ',@WORD,@STR_POS)-@STR_POS))
        SET @STR_POS = @CI+1
        SET @LOOP=@LOOP+1

    END
PRINT @RES_STR

score 0 · Answer 7 · answered Mar 06 '17 at 21:12

This does not give you a row set solution, but another option to parse out the first ten words using a similar technique with less code, purely for illustration.. i suppose this can be converted to a user defined function, but we know they don't perform well. it can probably be combined using CrossApply to eliminate the need for a UDF. I must credit this for help with xml syntax: https://www.mssqltips.com/sqlservertip/1771/splitting-delimited-strings-using-xml-in-sql-server/

DECLARE 
@xml as xml,
@str as varchar(1000)= 'I don''t feel good about this answer; I don''t like the GROUP BY and I would rather not do the table of numbers. Hopefully somebody can pick me up on this.',
@delimiter as char(1) = char(32),
@xmlspace  as char(13) = '<y>&#160;</y>'

SET @xml = cast(('<x>'+replace(@str,@delimiter,@xmlspace)+'</x>') as xml)

SELECT TOP 10 N.value('.', 'varchar(50)') FROM @xml.nodes('x') as T(N)

SELECT first 10 words from SQL Table

7 Answers7