I have a JSON file generated by Clojure's data.json library.
The data came from Twitter, where people seem to smile a lot. When I pipe it through jq:
$ cat /tmp/myfile | jq .
I get:
parse error: Invalid \uXXXX\uXXXX surrogate pair escape at line 1, column 14862268
The offending section is:
$ cut -c 14862258-14862269 /tmp/2017-02-23-2
79-7\ud83d",
So this escape code appears in a real JSON file, and jq can't read it. The same escape on its own:
echo '"\ud83d"' | jq .
Fileformat.info seems to suggest that it should come in a pair:
SMILING FACE WITH OPEN MOUTH
"\uD83D\uDE03"
Is this truly an invalid character to find in a JSON file? Is my JSON technically invalid?
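As far as I understand it, the two escapes in the Fileformat.info example are the high and low halves of a UTF-16 surrogate pair, which together encode one code point. Here is a small sketch of that arithmetic in Python (the function name `combine_surrogates` is my own, purely for illustration):

```python
def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 high/low surrogate pair into one Unicode code point."""
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError("not a high surrogate")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError("not a low surrogate")
    # Each half carries 10 bits; the result is offset past the 16-bit range.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


code_point = combine_surrogates(0xD83D, 0xDE03)
print(hex(code_point))  # 0x1f603
```

If that is right, the \ud83d in my file is only the first half of a pair, with nothing following it.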
Is there a simple utility I can pipe the data through to strip out these characters before it reaches jq? Or can I make jq relax its interpretation?
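To frame what I'm after, here is a rough sketch of the kind of pre-filter I have in mind. It assumes that swapping each unpaired surrogate escape for \ufffd (the replacement character) is acceptable, and the name `replace_lone_surrogates` is hypothetical. It works on the raw JSON text, so it is not a validated tool:

```python
import re

# Alternatives are tried left to right at each position:
#   group 1: a well-formed pair (high surrogate followed by low surrogate)
#   group 2: any surrogate escape that is not part of such a pair
#   group 3: any other escape (e.g. \\ or \n), consumed so that an escaped
#            backslash followed by "ud83d" is not mistaken for an escape
_ESCAPES = re.compile(
    r"(\\u[dD][89abAB][0-9a-fA-F]{2}\\u[dD][c-fC-F][0-9a-fA-F]{2})"
    r"|(\\u[dD][89a-fA-F][0-9a-fA-F]{2})"
    r"|(\\.)",
    re.DOTALL,
)


def replace_lone_surrogates(text: str) -> str:
    """Replace unpaired \\uD800-\\uDFFF escapes in JSON text with \\ufffd."""
    def fix(match):
        if match.group(2) is not None:
            return r"\ufffd"
        return match.group(0)  # valid pairs and other escapes pass through

    return _ESCAPES.sub(fix, text)


sample = r'{"text": "79-7\ud83d", "ok": "\ud83d\ude03", "path": "C:\\ud83d"}'
print(replace_lone_surrogates(sample))
# {"text": "79-7\ufffd", "ok": "\ud83d\ude03", "path": "C:\\ud83d"}
```

To use it as a filter, one could call `replace_lone_surrogates(sys.stdin.read())`, write the result to stdout, and pipe that into jq. Replacing rather than deleting is a choice made here so the damaged spots stay visible. I would still prefer an existing tool if one exists.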