I have a Java program that takes in a string and escapes it so that it can be safely passed to a program in bash. The strategy is basically to escape any of the special characters mentioned here and wrap the result in double quotes.
The algorithm is pretty simple: just loop over the input string and use input.charAt(i) to check whether the current character needs to be escaped.
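For concreteness, here is a minimal sketch of that loop. The escape set is an assumption: it covers only the four characters that stay special inside bash double quotes ($, `, " and \), because the list linked in the post is not reproduced here. The names BashEscape, needsEscape and escapeForBash are hypothetical.

```java
public class BashEscape {
    // Assumed escape set: characters that keep a special meaning inside
    // bash double quotes. '!' is deliberately left out (see note below).
    static boolean needsEscape(char c) {
        return c == '$' || c == '`' || c == '"' || c == '\\';
    }

    // Hypothetical helper: walks the string one UTF-16 code unit at a time
    // with charAt(i), prefixes special characters with a backslash, and
    // wraps the result in double quotes.
    public static String escapeForBash(String input) {
        StringBuilder sb = new StringBuilder(input.length() + 2);
        sb.append('"');
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (needsEscape(c)) {
                sb.append('\\');
            }
            sb.append(c);
        }
        sb.append('"');
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints "cost: \$5 \"now\""
        System.out.println(escapeForBash("cost: $5 \"now\""));
    }
}
```

The sketch omits '!' because in an interactive shell with history expansion enabled, a backslash before '!' inside double quotes suppresses the expansion but is itself left in the string, so it cannot be escaped cleanly that way.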
This strategy works quite well for characters that aren't represented by surrogate pairs, but I have some concerns if non-Latin characters or something like an emoji is embedded in the string. For example, if an emoji were the first character in my input string, input.charAt(0) would give me its first code unit and input.charAt(1) would return its second code unit. My concern is that one of these code units might be interpreted as one of the special characters that need to be escaped. If that happened, I'd escape a lone code unit, which would irrevocably garble the input.
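A quick way to probe this concern is to print the code units charAt returns for an emoji and then compare every possible surrogate code unit against a list of shell-special characters. The class name SurrogateCheck, the method noSurrogateIsSpecial and the SHELL_SPECIALS list are illustrative assumptions, not an exhaustive definition of bash metacharacters.

```java
public class SurrogateCheck {
    // Assumed, non-exhaustive list of characters that are special to bash.
    static final String SHELL_SPECIALS = "$`\"\\!|&;<>()*?[]#~' \t\n";

    // Returns true if no UTF-16 surrogate code unit (U+D800..U+DFFF)
    // equals any character in SHELL_SPECIALS.
    public static boolean noSurrogateIsSpecial() {
        for (int c = Character.MIN_SURROGATE; c <= Character.MAX_SURROGATE; c++) {
            if (SHELL_SPECIALS.indexOf(c) >= 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String s = "\uD83D\uDE00"; // U+1F600 GRINNING FACE as two code units
        char high = s.charAt(0);
        char low = s.charAt(1);
        System.out.printf("charAt(0) = U+%04X, high surrogate: %b%n",
                (int) high, Character.isHighSurrogate(high));
        System.out.printf("charAt(1) = U+%04X, low surrogate: %b%n",
                (int) low, Character.isLowSurrogate(low));
        System.out.println("no surrogate is special: " + noSurrogateIsSpecial());
    }
}
```

Both halves of a surrogate pair always fall in the reserved range U+D800..U+DFFF, which is disjoint from the ASCII range (U+0000..U+007F) where all of the characters above live, so this check should report true.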
Is such a thing possible? Or is it safe to use input.charAt(i) for something like this?