How should I deal with escapes when splitting a delimited nvarchar(max) in SQL

Question

I am in the process of rolling over a bunch of old stored procedures that take NVARCHAR(MAX) strings of comma and/or semicolon separated values (never mind about one value per variable etc.). The code is currently using the CHARINDEX approach described in this question, though in principle any of the approaches would work (I'm tempted to replace it with the XML one, because neatness).

The question, though, is what is the most efficient may of handing escaped delimiters? Obviously the lowest level approach is a character by character parser, but I can't shake the feeling that (1) that's going to be horrible when executed a million times in close succession and (2) it'll be overcomplicated for the situation.

Basically, I want to handle 3 possible escapes:

"\\", "\,", and "\;" somewhere in my string. What's the best way to do it? I should add that, ideally, I don't want to make any assumptions about what characters are included in the string.

Sample data would look something like the below.

Value1,Value\,2,ValueWithSlashAtTheEnd\\,ValueWithSlashAndCommaAtTheEnd\\\,

I'm actually splitting to rows rather than columns, but the principle is the same; I'd expect the below output typically:

SomeName
^^^^^^^^
Value1
Value,2
ValueWithSlashAtTheEnd\
ValueWithSlashAndCommaAtTheEnd\,

Needless to say, the escapes could occur anywhere in a value, and ideally I'd like to handle for semicolons as well, but I'll probably be able to infer that from the comma behaviour.

score 0 · Accepted Answer · answered Dec 10 '15 at 13:05

0

Just provide your function edited string:

replace(replace(@yourstring, '\\', '^'), '\,', '#')

Then replace back:

replace(replace(@returnedstring, '#', ','), '^', '\')

Replace ^ and # with any characters that are not on the string.

answered Dec 10 '15 at 13:05

Giorgi Nakeuri

35,155
8
47
75

Any suggestions for robust identification of characters not in the string? Would rather not need to throw an error in the case of every character in ASCII being used. And whilst I don't expect to support every character in Unicode, I might well need to support at least some (unknown instances) of them. – tobriand Dec 10 '15 at 13:21
@tobriand, instead of `^` you can use `^$#@`. I can't believe you could have such string :) – Giorgi Nakeuri Dec 10 '15 at 13:26
I don't *expect* to, it's true. I just get the impression that languages with proper escaping somehow make themselves more robust (against that kind of thing), which is mainly what I'd like to find out how to accomplish. – tobriand Dec 10 '15 at 15:09
@tobriand, unfortunately Sql Server lacks everything that is related to regex. Only basic regex is supported and in limited number of functions. – Giorgi Nakeuri Dec 10 '15 at 15:13
Right, worked out a fairly (fairly) neat way to make this work. Not sure if it's the *nicest* approach, but it does the job. Basically you while loop using `CHARINDEX(NChar(SomeNumber),@Target)` until you find N unused characters, one for each of required escape. Then loop through your list of possible escapes and replace each one (assigning in select is awesome for this, just be careful to put `\\` first), split values and undo replacements. – tobriand Dec 10 '15 at 15:44
Only issue is I'm not certain if there's some Unicode special characters that'll mess me up. Hopefully not though. – tobriand Dec 10 '15 at 15:45

How should I deal with escapes when splitting a delimited nvarchar(max) in SQL

1 Answers1