0

I have column [Posts].[post_text] of type varchar(max), and the text in this column may have several instances of words that begin with #. So, for example, an example string would be:

#hoops #blog #sports The score is now 25-40. #NBA Please check our site for more
updates #Basket #Olympics #2016
Thanks for your support #FIBA #dunk #blog 
#NBA Basketball Blog #NBA

I would like to retrieve all instances of words that begin with #, and remove any repeated values. Once I get all the values in the string, I can easily group and remove repeats, but I have no idea how to list the instances.

Thanks.

fdkgfosfskjdlsjdlkfsf
  • 3,165
  • 2
  • 43
  • 110
  • http://stackoverflow.com/questions/10914576/t-sql-split-string may help. it splits on a comma but change it to # and you'll get `#word more words` which you could then drop everything after the first space in multiple rows. and then distinct the list... or you could just split on space, look for words starting with # and distinct it... http://stackoverflow.com/questions/314824/t-sql-opposite-to-string-concatenation-how-to-split-string-into-multiple-reco works too using a nice function and CTE. – xQbert Aug 13 '15 at 16:00
  • have you looked at full text search? https://msdn.microsoft.com/en-us/library/ms142571.aspx I think it allows you to search for the individual components in a field – Jeremy Aug 13 '15 at 18:06
  • 1
    Bring it down into a development environment like .NET. TSQL is not built for editing like that. – paparazzo Aug 13 '15 at 19:40

2 Answers2

1

This operation should happen in a language better suited for complex string operations. This is a snippet from VB.Net just to get you started. The rest should be fairly easy ( reading rows from a table ( or flat file ), calling the parse function ( see code below ), then writing the row(s) of data to your destination table(s).

    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
    Dim sData As String
    Dim sArray(10) As String
    Dim sArray2(10) As String
    Dim iLoopCtr As Integer

    sData = "#hoops #blog #sports The score is now 25-40. #NBA Please check our site for more"

    sArray = Split(sData, "#")

    For iLoopCtr = 1 To 4
        If InStr(sArray(iLoopCtr), " ") Then
            sArray2 = Split(sArray(iLoopCtr), " ")
            sArray(iLoopCtr) = sArray2(0)
        End If
    Next

    MsgBox(sArray(1))
    MsgBox(sArray(2))
    MsgBox(sArray(3))
    MsgBox(sArray(4))

End Sub
Juan-Carlos
  • 377
  • 2
  • 8
  • Note, this could be done in TSQL, but its functions in the area of text search, substring, etc are not as functionally rich or easy to use as those in VB, C#, etc. Further, you would probably have to break it up into multiple SQL functions or have somewhat lengthy code. And, it's quite straight fwd to use a VB or other pgm to read a table, do the parsing, write to another table, etc. But, if you have some constraint forcing you to build this in TSQL, it could be done. – Juan-Carlos Aug 14 '15 at 14:54
0

I ended up using tsql for the parsing. Considering this is more of a good to have instead of a requirement, I didn't want to have to maintain code outside the SSIS project.

This is more or less what I'll be using, and combine it with a cursor. I don't think that, in this case, there's an alternative to a cursor:

declare @returnList TABLE ([Name] [nvarchar] (500))
--add space at the end, or tsql breaks  
declare @stringToSplit VARCHAR(MAX) = '#hoops #blog #sports The score is now 25-40. #NBA Please check our site for more updates #Basket #Olympics #2016 Thanks for your support #FIBA #dunk #blog  #NBA Basketball Blog #NBA '
DECLARE @posChar INT
DECLARE @posSpace INT
declare @Tag varchar(100)

WHILE CHARINDEX('#', @stringToSplit) > 0
BEGIN
    SELECT @posChar = CHARINDEX('#', @stringToSplit) 
    select @posSpace = CHARINDEX(' ', @stringToSplit, @posChar) 
    --select @posChar, @posSpace
    select @Tag = substring(@stringToSplit, @posChar, @posSpace-@posChar)
    insert into @returnList select @tag
    select @stringToSplit = substring(@stringToSplit, @posSpace, LEN(@stringToSplit))
    --select @stringToSplit 
END
select *from @returnList
fdkgfosfskjdlsjdlkfsf
  • 3,165
  • 2
  • 43
  • 110