The description field is a text area field, and somehow a user ended up with a strange little symbol in it (see image).

enter image description here

When I fetch this from the server, I assemble my data from the objects I retrieve, including this object's description, turn it into a JSON string, and send it to my JavaScript.

In JavaScript, I JSON.parse it, but that weird little symbol causes the parse to fail. When you look at the string, there is no visible character there, yet JSON.parse reports an invalid character at that spot.

My response from the server has the description like this:

"blahblahtesttext\r\nslkdjf",

There is nothing but the expected \r\n......

But it has an unexpected token where that symbol is.

{"value":"blah blah test text//Symbol should be here, but there is nothing and it forces it to the next line
\r\nslkdjf","fieldType":"TEXTAREA","field":"Description"}

The symbol forces the string onto the next line, which causes the issue.

Because I can't see what the actual character is... I do not know how to handle this.

Is there something that can strip out invalid characters in a JSON string so the parse works? I don't want to just try/catch this, as that would toss out everything; I just want that weird invalid symbol to be stripped out.

Or is there a way to see what the actual character is that JSON.parse does not like?


 <-- here is that symbol for copy pasting into a string if you want to try parsing it.

EDIT:

I found that it was doing this in Notepad++

enter image description here

You can see that, where the line separator was, there is an actual carriage return and line feed, which breaks the string. The string already has \r\n\r\n for the two returns that were entered in the text area after that line separator character.

But I'm still unsure how to deal with this: that carriage return and line feed do not appear in the string as an escaped '\r\n'. There is no character representation of them; instead, a real line break is there and breaks the string.

NEW EDIT:

Finally found something to get this working. I couldn't do a replace on the line separator character itself: when I pulled the value from my database, it came through as a raw, unescaped carriage return. When someone manually pressed 'Enter' in the text area, the string I got from the database contained an escaped '\r\n' there, but the line separator was not escaped.

So I added these three lines before parsing to make sure any raw newlines/carriage returns are escaped.

result = result.replace(/\r\n/g, '\\r\\n');
result = result.replace(/\r/g, '\\r');
result = result.replace(/\n/g, '\\n');

The '\r\n' sequences that were typed into the text area were already correctly escaped, which tripped me up: I didn't have to worry about escaping anything until someone introduced this line separator.
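
A more general version of the same idea, as a minimal sketch: JSON forbids every raw control character (U+0000 through U+001F) inside a string, so this escapes all of them rather than only CR and LF. The helper name `escapeRawControlChars` is hypothetical, and it assumes the JSON arrives compact (no pretty-printing line breaks between tokens), since it cannot tell structural whitespace from whitespace inside a string value.

```javascript
// Hypothetical helper: escapes raw control characters so JSON.parse accepts the text.
// Assumption: the JSON is compact, so any raw control character sits inside a string value.
function escapeRawControlChars(jsonText) {
  return jsonText.replace(/[\u0000-\u001f]/g, (ch) => {
    switch (ch) {
      case "\n": return "\\n";
      case "\r": return "\\r";
      case "\t": return "\\t";
      default:
        // Any other control character becomes a \uXXXX escape.
        return "\\u" + ch.charCodeAt(0).toString(16).padStart(4, "0");
    }
  });
}

// A raw CR+LF sits inside the string value, followed by a properly escaped \r\n,
// mirroring the broken response shown above.
const broken = '{"value":"blah blah test text\r\n\\r\\nslkdjf","field":"Description"}';
const parsed = JSON.parse(escapeRawControlChars(broken));
console.log(JSON.stringify(parsed.value));
```

Fixing the serialization on the server, so that the raw line break never reaches the client, would still be the cleaner option if that is possible.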

Tyler Dahle
  • That appears to be `"\u2028"`, Line Separator (the small box says “L SEP”). See the [MDN docs on `JSON.stringify` which mentions this unicode character](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/JSON/stringify#Issue_with_plain_JSON.stringify_for_use_as_JavaScript). – Sebastian Simon Aug 10 '18 at 17:57
  • characters saved in different text formats (ascii, latin-1, utf-8, utf-16, etc) may not be loaded / rendered correctly if you try to read the data in a different text file. I myself just recently had an issue with this – ControlAltDel Aug 10 '18 at 17:59
  • And some editors will show it as a red dot which you can delete. Only run into it a few times and always forget how i got rid of it – charlietfl Aug 10 '18 at 18:01
  • @Xufox - Fascinating. According to [the Unicode character tool](http://unicode.org/cldr/utility/character.jsp?a=2028), its general category is "Line_Separator" (it's the only thing in that category) but its Grapheme_Cluster_Break is "Control" (which it shares with various control characters). I'm failing to see how a line separator can not be classed as a control character, and thus be required to be encoded in JSON, but... Chrome's `JSON.stringify` doesn't encode it. (And `JSON.parse` is perfectly happy with it.) Wow. – T.J. Crowder Aug 10 '18 at 18:03
  • @T.J.Crowder There’s a [stage 4 proposal](https://github.com/tc39/proposal-json-superset) scheduled to land in ECMAScript 2019 to change this behavior. – Sebastian Simon Aug 10 '18 at 18:07
  • @Xufox - Thanks. I'd forgotten that (been about four months since I reviewed the outstanding proposals, I do recall seeing it). I suggest you take your two comments, and any part of [my answer below](https://stackoverflow.com/a/51791923/157247) that you think is useful, and turn them into an answer. (I'll delete my answer if you do; the only reason I don't is that you're *not quite* at 10k yet and wouldn't be able to see it.) – T.J. Crowder Aug 10 '18 at 18:10

1 Answer

As Xufox says, that appears to be U+2028. JSON.parse shouldn't fail on it, since U+2028 doesn't require escaping in JSON, and Chrome's doesn't. (The stage 4 proposal Xufox pointed out concerns U+2028 in JavaScript string literals rather than JSON.parse itself.) A quick round trip:

const o = {prop: "testing\u2028one two three"};
console.log(JSON.parse(JSON.stringify(o)));

If you need to work around a JSON.parse implementation that doesn't handle it, you could do this:

str = str.replace(/\u2028/g, "\\u2028");

...before running JSON.parse on str.
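
If you go that route, it may be worth covering U+2029 (Paragraph Separator) as well, since it tends to cause the same kind of trouble. A minimal sketch, where `escapeLineSeparators` is a hypothetical helper name and the sample string is made up for illustration:

```javascript
// Hypothetical helper: turns raw U+2028 / U+2029 into their JSON escape sequences.
function escapeLineSeparators(jsonText) {
  return jsonText
    .replace(/\u2028/g, "\\u2028")
    .replace(/\u2029/g, "\\u2029");
}

// Sample JSON text containing a raw U+2028 inside a string value.
const str = '{"prop":"testing\u2028one two three"}';
const obj = JSON.parse(escapeLineSeparators(str));

// The parser decodes the escape again, so the character survives the round trip.
console.log(obj.prop === "testing\u2028one two three");
```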

T.J. Crowder
  • So I have a sneaking suspicion that this was copy/pasted into their description, and that line separator was in there. I can't even replace the line separator characters because the actual string I get in my javascript does not contain it, but it still finds an invalid character. I am unsure if I can even do anything about this. When I set the value in my server side controller, it does not contain the \u2028 character, but there is obviously something there... – Tyler Dahle Aug 10 '18 at 18:39
  • @TylerDahle - Well, it contains *something*, otherwise `JSON.parse` would work. :-) So you'll need to find out what that is. Perhaps it's getting converted to a newline, for instance, in which case the fix would be `str = str.replace(/\n/g, "\\u2028");`. Or some zero-width character. But *something* is there, and that means you can find and fix it. Look at the string in a debugger, or log it out as char codes (`Array.from(str).forEach(ch => console.log(ch.charCodeAt(0).toString(16)));`), etc. – T.J. Crowder Aug 10 '18 at 18:42
  • I added an edit to my question. I did find that it was adding an actual return+line feed, instead of an \n\r. Which will make it hard to replace.... – Tyler Dahle Aug 10 '18 at 18:52
  • @TylerDahle - I can't quite follow the edit, but it sounds like `replace` **did** solve it, or...? Although really you want to fix it earlier than that, ideally before the U+2028 gets turned into a carriage return or newline or whatever it is. – T.J. Crowder Aug 11 '18 at 08:43
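
The char-code logging suggested in the comments can be sketched a little more fully, so that it reports where each suspicious character sits. This is only an illustration: `findSuspectChars` is a hypothetical helper name, and "suspect" is assumed here to mean anything outside printable ASCII.

```javascript
// Hypothetical helper: lists every character outside printable ASCII
// together with its string index and Unicode code point.
function findSuspectChars(text) {
  const found = [];
  let index = 0;
  for (const ch of text) { // iterates by code point, so surrogate pairs stay whole
    const cp = ch.codePointAt(0);
    if (cp < 0x20 || cp > 0x7e) {
      found.push({
        index,
        codePoint: "U+" + cp.toString(16).toUpperCase().padStart(4, "0"),
      });
    }
    index += ch.length; // advance by UTF-16 code units to match string indices
  }
  return found;
}

// Example input with a Line Separator and a raw CR+LF.
console.log(findSuspectChars("blah\u2028text\r\n"));
```

Running this on the raw response text before calling JSON.parse should show whether the offending character is U+2028, a raw CR/LF, or something else entirely.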