0

I have issue with extracting text from string data using T-SQL. I have following data and text that I would like to extract (value between second and third underscore:

sss_ss_d_a -> d
aaa_dd_b -> b
aaa_aa -> NULL

I know that there is a lot of similar topics, but I have problem especially with handling NULLs and situation when there are no delimiter after desired value

Thank you very much for your help, regards

anders1990
  • 21
  • 5
  • you need multiple functions for this, take a look here to get the indexes https://dba.stackexchange.com/questions/41961/how-to-find-all-positions-of-a-string-within-another-string, and with this result you can create another function which returns the substring or null – Isparia Oct 01 '21 at 11:18
  • @Isparia Please avoid to point to "ancient" discussions. The accepted answer calls a `while` loop, which wasn't a good idea even about 8 years ago... This can be done much easier by now (btw: The XML based answer was working at those days already...) – Shnugo Oct 01 '21 at 14:34
  • @Shnugo i always thought helping someone learn something he doesnt know how to do, should start off with an older approach so he can learn from this, and question it to make it more performant in the end. thats how i learned to. Thats also the reason why i didnt post this as an answer. Because we are not here to write code for others, but to point them in the right direction. – Isparia Oct 04 '21 at 08:33
  • @Isparia, But pointing to *bad* solutions won't teach how to solve this... – Shnugo Oct 04 '21 at 08:39

1 Answers1

3

Try it like this:

Some sample data in a declared table

DECLARE @tbl TABLE(YourString VARCHAR(100));
INSERT INTO @tbl VALUES('sss_ss_d_a'),('aaa_dd_b'),('aaa_aa');

--The query

SELECT t.YourString
      ,CAST(CONCAT('<x>',REPLACE(t.YourString,'_','</x><x>'),'</x>') AS XML).value('/x[3]/text()[1]','nvarchar(max)')
FROM @tbl t;

The idea in short:

  • We replace the underscores with XML tags thus transforming the string to castable XML.
  • We use XQuery within .value to pick the third element.

Starting with v2016 the recommended approach uses JSON instead
(hint: we use "2" instead of "3" as JSON is zero-based).

      ,JSON_VALUE(CONCAT('["',REPLACE(t.YourString,'_','","'),'"]'),'$[2]')

The idea is roughly the same.

General hint: Both approaches might need escaping forbidden characters, such as < or & in XML or " in JSON. But this is easy...

Shnugo
  • 66,100
  • 9
  • 53
  • 114