I'm trying to decode the JSON you get from Facebook when you download your data. I'm using Node.js. The data has lots of weird Unicode escapes that don't really make sense. Example:
"messages": [
{
"sender_name": "Emily Chadwick",
"timestamp_ms": 1480314292125,
"content": "So sorry that was in my pocket \u00f0\u009f\u0098\u0082\u00f0\u009f\u0098\u0082\u00f0\u009f\u0098\u0082",
"type": "Generic"
}
]
Which should decode as "So sorry that was in my pocket 😂😂😂". Using fs.readFileSync(filename, "utf8") gets me "So sorry that was in my pocket ððð" instead, which is mojibake.
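To narrow it down, I ran a minimal check of what JSON.parse does with one of those escaped emoji (the escape sequence is copied from the export above):

// What JSON.parse produces for a single escaped emoji from the export.
const parsed = JSON.parse('"\\u00f0\\u009f\\u0098\\u0082"');
console.log(parsed.length);                                        // 4 -- four separate code points
console.log([...parsed].map(c => c.codePointAt(0).toString(16)));  // [ 'f0', '9f', '98', '82' ]
// f0 9f 98 82 happen to be the UTF-8 bytes of U+1F602 (😂),
// but they end up in the string as individual latin1-range code points.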
This question mentions that it's screwed-up latin1 encoding, and that you can encode to latin1 and then decode to utf8. I tried to do that with:
import fs from 'fs';
import iconv from 'iconv-lite';

// Read the raw bytes, round-trip them through latin1 -> utf8,
// then parse the result as JSON.
function readFileSync_fixed(filename) {
    var content = fs.readFileSync(filename, "binary");
    return iconv.decode(iconv.encode(content, "latin1"), "utf-8");
}

console.log(JSON.parse(readFileSync_fixed(filename)));
But I still get the mojibake version. Can anyone point me in the right direction? I'm not sure how iconv is supposed to work here.
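Update: as a sanity check, the same round trip does seem to work when I apply it to a single already-parsed string with Node's built-in Buffer instead of iconv-lite (a sketch, using the mojibake literal from above):

// Sketch: latin1 -> utf8 round trip on one already-parsed string,
// using Node's built-in Buffer rather than iconv-lite.
const mojibake = "\u00f0\u009f\u0098\u0082";    // what JSON.parse gives me for one emoji
const bytes = Buffer.from(mojibake, "latin1");  // code points 0xf0 0x9f 0x98 0x82 -> raw bytes
console.log(bytes.toString("utf8"));            // 😂

So the conversion itself looks right on a single string; I just can't see why it has no effect when I run it over the whole file before JSON.parse.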