Sanitising strings in R

Question

This is related to a previous question, here: Converting a \u escaped Unicode string to ASCII

I proposed a solution involving eval(parse(text=x)), which, for non-R users, does what it says: it parses the text string, then evaluates it. The aim was not to allow arbitrary code to be executed, but only to un-escape escaped Unicode text. Hence the solution:

eval(parse(text=paste0("'", x, "'")))
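
For readers who don't use R, here is a minimal sketch of what that snippet does on a harmless input. The sample value of x is an assumption chosen for illustration; it is not from the linked question.

```r
# x holds the literal characters c a f \ u 0 0 e 9 (the backslash is doubled
# only because this is R source code).
x <- "caf\\u00e9"

# Wrap x in single quotes so the parser sees a string literal, then evaluate it.
# The parser decodes the \u00e9 escape while reading the literal.
y <- eval(parse(text = paste0("'", x, "'")))

nchar(x)  # 9: the escape is still spelled out
nchar(y)  # 4: the escape has become a single character
```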

While this should be fairly safe given the restricted objective, I'd be interested to know: how much sanitisation is required to keep things safe?

At a minimum, I guess any embedded single and double quotes have to be escaped. For example, suppose we have

x <- "this is a '; print(dir()); 'string"

Then eval'ing this per the snippet above would execute the code in the middle. So we have to escape the quotes:

eval(parse(text=paste0("'",
                       gsub("'", "\\\\'", x),
                       "'")))

And similarly for double quotes. I don't think the unescaped Unicode equivalents \u0022 and \u0027 are a problem, since to the parser they'll be identical to plain " and '.
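
To make the difference concrete, here is a small sketch comparing the unescaped and quote-escaped versions. The helper names and the harmless payload (setting a flag instead of calling print(dir())) are assumptions made for illustration.

```r
# Hypothetical helpers; the names are not from the original post.
unescape_naive <- function(x) {
  eval(parse(text = paste0("'", x, "'")))
}
unescape_quoted <- function(x) {
  eval(parse(text = paste0("'", gsub("'", "\\\\'", x), "'")))
}

# Harmless stand-in for injected code: it only flips a flag.
x <- "this is a '; injected <<- TRUE; 'string"

injected <- FALSE
res_naive <- unescape_naive(x)     # parsed as three expressions; the middle one runs
injected_after_naive <- injected   # TRUE: the payload was executed

injected <- FALSE
res_quoted <- unescape_quoted(x)   # parsed as a single string literal
injected_after_quoted <- injected  # FALSE: nothing was executed
```

With the quotes escaped, res_quoted is identical to x, while the naive version returns only the value of the last expression.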

Are there any holes in this approach that I've missed?

It would be good if there was a way to parse escaped strings without the machinery (and risk) of `eval`! — seancarmody, Jul 21 '13 at 08:10
I have come up with an alternative without `eval`: http://stackoverflow.com/a/17771985/1543437 — seancarmody, Jul 21 '13 at 11:48

score 5 · Accepted Answer · answered Jul 21 '13 at 11:19

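Consider an input that already contains a backslash before the quote:

this is a \'; print(dir()); 'string

After the quote escaping above, it becomes:

'this is a \\'; print(dir()); \'string'

The double backslash is evaluated as a literal backslash, so the quote that follows it is active again: it closes the string early, and the rest is read as code rather than as string content.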

Also, I don't know R well, but you could probably at least cause an error using raw control characters such as a newline, or invalid escapes.

eval is a mug's game in general. Normal string handling (search string for the sequence you want, replacing it) is the better approach, and using an existing library for a particular properly-specified format is best of all. For example if you have JSON, use a JSON parser. There are many possible string literal formats that use \u escapes, all with slightly different rules, so you will want to choose the exact format correctly.
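
As a rough sketch of the "normal string handling" route in base R, the function below finds \uXXXX sequences with a regular expression and replaces each with its character, without eval. The function name is hypothetical, and the sketch assumes exactly four hex digits per escape; it does not handle surrogate pairs, escaped backslashes, or any other escape types. It is also not the alternative linked in the comments above.

```r
# Hypothetical helper: decode \uXXXX escapes without parse/eval.
unescape_unicode <- function(x) {
  # Locate every backslash-u followed by exactly four hex digits.
  m <- gregexpr("\\\\u[0-9A-Fa-f]{4}", x)
  # Replace each match with the character for that code point.
  regmatches(x, m) <- lapply(regmatches(x, m), function(esc) {
    vapply(
      esc,
      function(e) intToUtf8(strtoi(substring(e, 3), base = 16L)),
      character(1),
      USE.NAMES = FALSE
    )
  })
  x
}

decoded <- unescape_unicode("\\u0041BC")                       # "ABC"
untouched <- unescape_unicode("a '; print(dir()); 'string")    # returned unchanged
```

Because the input is never handed to the parser, quotes and semicolons in it are just characters.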

Actually, the parser will give an error on that input, so the entire expression (including the embedded code) will never be evaluated. But that's not very good either.... — Hong Ooi, Jul 21 '13 at 13:48
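
A minimal sketch of both sides of this point: the input from the answer fails to parse once every quote has been escaped, but a variation that ends in a comment character does parse and run. The helper names and the second input are assumptions for illustration and do not come from the thread.

```r
# Hypothetical helpers reproducing the quote escaping from the question.
escape_quotes <- function(x) paste0("'", gsub("'", "\\\\'", x), "'")
run_escaped <- function(x) eval(parse(text = escape_quotes(x)))

# Input from the answer: a backslash already precedes the first quote.
x1 <- "this is a \\'; print(dir()); 'string"
# The escaped text ends in \'string' outside any string, which is a syntax
# error, so parse() fails before anything is evaluated.
result1 <- tryCatch(run_escaped(x1), error = function(e) "parse error")

# Assumed variation: no second quote, and a trailing # that comments out the
# closing quote added by escape_quotes().
hit <- FALSE
x2 <- "\\'; hit <<- TRUE #"
run_escaped(x2)   # parses as a string, then the assignment, then a comment
```

After this runs, result1 is "parse error" and hit is TRUE, so escaping quotes alone does not close the hole; backslashes would need escaping as well.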

score 0 · Answer 2 · answered Apr 23 '14 at 20:53

There is the shQuote function, which could work for you:

eval(parse(text=shQuote(x)))
# [1] "this is a '; print(dir()); 'string"

answered by Marek
