
I write out a script on my page that looks like this:

<script type="text/javascript">
var test = 'Some user input';
</script>

'Some user input' comes from my C# code and was previously saved by a user. The problem is that from time to time I get the following error in the browser:

Uncaught SyntaxError: Unexpected token ILLEGAL

I need some generic way of catching these illegal characters and removing them. For example, the following user input actually contains an illegal character:

<p><span>.</span></p>

I actually had to remove the illegal character here in the Stack Overflow question because it also breaks the editor. The unspoiled example is in this jsfiddle: https://jsfiddle.net/p8suow9m/

It's hard to see, but it's there. Looking at it in Google Chrome makes it a little easier to see:

[Screenshot: the character as displayed in Google Chrome]

Can someone help me with the right approach in C# to filter away these illegal characters?

Note that I cannot use an approach that keeps a list of valid characters and removes everything else, because I need to support a lot of special characters that are legal in a JavaScript string.

Niels Brinch
  • I think the character in this case may be [U+2028 (LINE SEPARATOR)](http://www.fileformat.info/info/unicode/char/2028/index.htm). JavaScript quoted strings cannot contain unescaped line breaks. – Phylogenesis Apr 12 '15 at 09:40
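
To illustrate the comment above, here is a minimal JavaScript sketch that finds line terminators in a string and replaces them with escape sequences instead of deleting them. It assumes the culprit really is U+2028 (or its sibling U+2029), which the comment only suggests; the function names are hypothetical and not part of the original question.

```javascript
// Hypothetical helpers, not part of the original question.
// U+2028 (LINE SEPARATOR) and U+2029 (PARAGRAPH SEPARATOR) count as line
// terminators in JavaScript source; engines that predate ES2019 reject them
// inside a quoted string literal.
const LINE_TERMINATORS = /[\n\r\u2028\u2029]/g;

// Returns the code point (as "U+XXXX") of every line terminator in the text.
function findLineTerminators(text) {
  const matches = text.match(LINE_TERMINATORS) || [];
  return matches.map(
    (ch) => 'U+' + ch.charCodeAt(0).toString(16).toUpperCase().padStart(4, '0')
  );
}

// Replaces each line terminator with its \uXXXX escape so the text can sit
// inside a quoted string literal without ending the line.
function escapeLineTerminators(text) {
  return text.replace(
    LINE_TERMINATORS,
    (ch) => '\\u' + ch.charCodeAt(0).toString(16).padStart(4, '0')
  );
}

// Assumed sample: the markup from the question with U+2028 after the period.
const sample = '<p><span>.\u2028</span></p>';
console.log(findLineTerminators(sample));
console.log(escapeLineTerminators(sample));
```

This only deals with line terminators. Quotes and backslashes in the user input would need escaping as well before the value is written into a script block.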

1 Answer

Try the following C# code; it worked for me. P.S.: I couldn't post it as code because the editor bugs out, so I misused the code snippet feature.

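// Requires: using System.Text.RegularExpressions;
// userInput holds the saved text, e.g. the <p><span>test.</span></p> sample with the invisible character after "test."
string cleaned = Regex.Replace(userInput, @"[^\u0000-\u007F]", string.Empty);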
Legends
  • That worked for the specific example. The question is how general it is. Does it target the specific illegal token? – Niels Brinch Apr 12 '15 at 10:07
  • hehe that's what I asked myself right now. This code strips all ASCII characters out. You have to test it always – Legends Apr 12 '15 at 10:10
  • I don't think it strips all ASCII characters out. What do you mean exactly? – Niels Brinch Apr 12 '15 at 10:14
  • http://stackoverflow.com/questions/123336/how-can-you-strip-non-ascii-characters-from-a-string-in-c but also check this: http://stackoverflow.com/questions/140422/how-do-i-translate-8bit-characters-into-7bit-characters-i-e-%C3%9C-to-u/10036907#10036907 – Legends Apr 12 '15 at 10:15
  • Ahh, so it strips **non** ASCII from the string – Niels Brinch Apr 12 '15 at 10:16
  • Note: I ended up using the range 0020-007E instead. It also removes the non-printable character, but doesn't remove the other special characters that are quite valid. – Niels Brinch Apr 12 '15 at 14:23
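
The two character classes discussed in this thread can be compared with a short JavaScript sketch; the `\uXXXX` range syntax is the same one the C# pattern in the answer uses. It assumes the final pattern was `[^\u0020-\u007E]`, which the last comment implies but does not spell out. The function names and the sample string are hypothetical.

```javascript
// Pattern from the answer: removes everything outside the 7-bit ASCII range.
const NON_ASCII = /[^\u0000-\u007F]/g;

// Assumed pattern from the last comment: removes everything outside
// printable ASCII (space through tilde), so control characters go as well.
const NON_PRINTABLE_ASCII = /[^\u0020-\u007E]/g;

function stripNonAscii(text) {
  return text.replace(NON_ASCII, '');
}

function stripNonPrintableAscii(text) {
  return text.replace(NON_PRINTABLE_ASCII, '');
}

// Sample with an accented letter, a tab, and U+2028 (LINE SEPARATOR).
const sample = 'caf\u00E9\t<p>.\u2028</p>';
console.log(JSON.stringify(stripNonAscii(sample)));
console.log(JSON.stringify(stripNonPrintableAscii(sample)));
```

Under these assumptions both patterns remove U+2028, and the narrower range also removes the tab. Both also remove legitimate non-ASCII text such as the accented letter, so they are worth testing against real input before relying on them.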