Retrieve all instances of strings beginning with certain char?

Question

I have column [Posts].[post_text] of type varchar(max), and the text in this column may have several instances of words that begin with #. So, for example, an example string would be:

#hoops #blog #sports The score is now 25-40. #NBA Please check our site for more
updates #Basket #Olympics #2016
Thanks for your support #FIBA #dunk #blog 
#NBA Basketball Blog #NBA

I would like to retrieve all instances of words that begin with #, and remove any repeated values. Once I get all the values in the string, I can easily group and remove repeats, but I have no idea how to list the instances.

Thanks.

http://stackoverflow.com/questions/10914576/t-sql-split-string may help. it splits on a comma but change it to # and you'll get `#word more words` which you could then drop everything after the first space in multiple rows. and then distinct the list... or you could just split on space, look for words starting with # and distinct it... http://stackoverflow.com/questions/314824/t-sql-opposite-to-string-concatenation-how-to-split-string-into-multiple-reco works too using a nice function and CTE. — xQbert, Aug 13 '15 at 16:00
have you looked at full text search? https://msdn.microsoft.com/en-us/library/ms142571.aspx I think it allows you to search for the individual components in a field — Jeremy, Aug 13 '15 at 18:06
Bring it down into a development environment like .NET. TSQL is not built for editing like that. — paparazzo, Aug 13 '15 at 19:40

score 1 · Answer 1 · answered Aug 14 '15 at 14:01

This operation should happen in a language better suited for complex string operations. This is a snippet from VB.Net just to get you started. The rest should be fairly easy ( reading rows from a table ( or flat file ), calling the parse function ( see code below ), then writing the row(s) of data to your destination table(s).

    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
    Dim sData As String
    Dim sArray(10) As String
    Dim sArray2(10) As String
    Dim iLoopCtr As Integer

    sData = "#hoops #blog #sports The score is now 25-40. #NBA Please check our site for more"

    sArray = Split(sData, "#")

    For iLoopCtr = 1 To 4
        If InStr(sArray(iLoopCtr), " ") Then
            sArray2 = Split(sArray(iLoopCtr), " ")
            sArray(iLoopCtr) = sArray2(0)
        End If
    Next

    MsgBox(sArray(1))
    MsgBox(sArray(2))
    MsgBox(sArray(3))
    MsgBox(sArray(4))

End Sub

Note, this could be done in TSQL, but its functions in the area of text search, substring, etc are not as functionally rich or easy to use as those in VB, C#, etc. Further, you would probably have to break it up into multiple SQL functions or have somewhat lengthy code. And, it's quite straight fwd to use a VB or other pgm to read a table, do the parsing, write to another table, etc. But, if you have some constraint forcing you to build this in TSQL, it could be done. — Juan-Carlos, Aug 14 '15 at 14:54

score 0 · Answer 2 · answered Aug 14 '15 at 15:13

I ended up using tsql for the parsing. Considering this is more of a good to have instead of a requirement, I didn't want to have to maintain code outside the SSIS project.

This is more or less what I'll be using, and combine it with a cursor. I don't think that, in this case, there's an alternative to a cursor:

declare @returnList TABLE ([Name] [nvarchar] (500))
--add space at the end, or tsql breaks  
declare @stringToSplit VARCHAR(MAX) = '#hoops #blog #sports The score is now 25-40. #NBA Please check our site for more updates #Basket #Olympics #2016 Thanks for your support #FIBA #dunk #blog  #NBA Basketball Blog #NBA '
DECLARE @posChar INT
DECLARE @posSpace INT
declare @Tag varchar(100)

WHILE CHARINDEX('#', @stringToSplit) > 0
BEGIN
    SELECT @posChar = CHARINDEX('#', @stringToSplit) 
    select @posSpace = CHARINDEX(' ', @stringToSplit, @posChar) 
    --select @posChar, @posSpace
    select @Tag = substring(@stringToSplit, @posChar, @posSpace-@posChar)
    insert into @returnList select @tag
    select @stringToSplit = substring(@stringToSplit, @posSpace, LEN(@stringToSplit))
    --select @stringToSplit 
END
select *from @returnList

Retrieve all instances of strings beginning with certain char?

2 Answers2