Nodejs: How to preserve "\u00a0" in JSON string during a merge with other JSON

Question

I have to merge two JSON files and need to preserve a string in one of them:

"weeks": "weeks\u00a0days"

After the merge, the \u00a0 always changes to a space: "weeks": "weeks days".

I want it to stay as: "weeks": "weeks\u00a0days"

Some code:

//merge.js
const fs = require('fs');
const test1 = fs.readFileSync('./test1.json', 'utf8');
const test2 = fs.readFileSync('./test2.json', 'utf8');

const merge = (obj1, obj2) =>
  JSON.stringify({ ...JSON.parse(obj1), ...JSON.parse(obj2) }, null, 4);

const saveFile = (fileName, obj1, obj2) => {
  fs.writeFile(`${__dirname}/${fileName}`, merge(obj1, obj2), err => {
    if (err) throw err;
    console.log(`The file ${fileName} has been saved!`);
  });
};

saveFile('testFinal.json', test1, test2);

test1.json

{
  "link": {
    "about": "About",
    "version": "version"
  },
  "items": {
    "siteId": "Site ID",
    "siteName": "Site name",
    "siteType": "Site type",
    "weeks": "weeks\u00a0days"
  }
}

test2.json

{
  "features": {
    "activateFeatures": "Activate features",
    "confirmation": "Confirmation",
    "hardware": "Hardware",
    "existingHardware": "Existing hardware",
    "emailLicense": "Email license",
    "downloadLicense": "Select quantity"
  }
}

please help

why not just call `const abc = require('./abc.json')` instead of using `readFileSync()`? It can save you the `JSON.parse()` call and all the trouble caused by extra parsing — William Chong, Sep 04 '19 at 08:04
Your problem is not with `merge` specifically, it's a general flaw in `JSON.stringify`, see https://stackoverflow.com/questions/31649362/json-stringify-and-unicode-characters , https://stackoverflow.com/questions/12271547/shouldnt-json-stringify-escape-unicode-characters/27252001 — georg, Sep 04 '19 at 08:10
@georg - It's not a flaw not to bloat the resulting string for no reason. — T.J. Crowder, Sep 04 '19 at 08:11
@T.J.Crowder: they could have added an option for this, cf. python `json.dump` — georg, Sep 04 '19 at 08:49
@georg - TC39 often prefers minimalism, particularly when an option serves basically no purpose and is easily provided in userland. The only reason for enabling escapes for everything is if you aren't handling encoding correctly, which will **always** bite you in the end. — T.J. Crowder, Sep 04 '19 at 09:05
@T.J.Crowder: well, they do have an option for "indent", which is even less essential. `ensure_ascii` is vital when you're about to send json over the wire and cannot be sure every middleman handles utf8 correctly. — georg, Sep 04 '19 at 09:09
@georg - I have to disagree, formatting is routinely necessary in development, and very difficult to implement after-the-fact. If there's a problem with middlemen mangling your stream, you have a problem with broken middlemen, not `JSON.stringify`, which (again) **will** bite you in other ways. :-) — T.J. Crowder, Sep 04 '19 at 09:13

score 2 · Answer 1 · answered Sep 04 '19 at 08:00

The \u00a0 after the merge always changes to a space: "weeks": "weeks days".

I think you'll find that it converts to a literal no-break space character (U+00A0, a "hard space"), not an ordinary space. Example:

const json = '{"foo": "Testing one\\u00a0two\\u00a0three"}';
console.log(json);
const parsed = JSON.parse(json);
console.log(parsed.foo.includes("\u00a0")); // true
const json2 = JSON.stringify(parsed);
console.log(json2);
console.log(json2.includes("\u00a0")); // true

That's perfectly valid JSON and means exactly the same thing as \u00a0. If you really need hard spaces to be written as Unicode escapes when your JSON serializer doesn't do that, you'll need to post-process the string, e.g.:

const result = JSON.stringify(stuff).replace(/\u00a0/g, "\\u00a0");
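Applied to the question's merge.js, the post-processing could look roughly like the sketch below. It keeps the question's shallow spread merge and 4-space indentation; the two input strings are trimmed, hypothetical stand-ins for the file contents so the snippet runs on its own. In the real script, the result of merge() would be passed to fs.writeFile as before.

```javascript
// Hypothetical stand-ins for the contents of test1.json and test2.json
// (trimmed), so the sketch runs without touching the file system.
const test1 = '{"items": {"siteId": "Site ID", "weeks": "weeks\\u00a0days"}}';
const test2 = '{"features": {"hardware": "Hardware"}}';

// Same shallow merge and 4-space indentation as in the question, plus one
// post-processing step: each raw U+00A0 is rewritten as the escape \u00a0.
// With a numeric indent, JSON.stringify only emits U+00A0 inside string
// values, so the replacement cannot touch the JSON structure itself.
const merge = (json1, json2) =>
  JSON.stringify({ ...JSON.parse(json1), ...JSON.parse(json2) }, null, 4)
    .replace(/\u00a0/g, '\\u00a0');

const merged = merge(test1, test2);
console.log(merged);
// The "weeks" line now reads: "weeks": "weeks\u00a0days"
```

Reading the written file back with JSON.parse still yields the no-break space character, so anything consuming the JSON sees exactly the same data as without the post-processing.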

score 1 · Answer 2 · answered Sep 04 '19 at 08:08

By default, JSON.stringify preserves the original characters in strings and only escapes those that must be escaped for the output to be valid JSON, such as double quotes, backslashes and control characters (e.g. newlines).
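A small check of that default behavior might look like this (the sample string is made up for illustration): characters that JSON syntax requires to be escaped come out escaped, while the no-break space is written as a raw character.

```javascript
// Which characters does JSON.stringify escape on its own?
const sample = 'quote " backslash \\ newline \n tab \t nbsp \u00a0 e-acute \u00e9';

const out = JSON.stringify(sample);
console.log(out);

// Characters that JSON syntax requires to be escaped come out escaped:
console.log(out.includes('\\"'));  // true  (double quote -> \")
console.log(out.includes('\\\\')); // true  (backslash -> \\)
console.log(out.includes('\\n'));  // true  (newline -> \n)
console.log(out.includes('\\t'));  // true  (tab -> \t)

// Other characters, including U+00A0, are written out as-is:
console.log(out.includes('\u00a0'));  // true  (raw no-break space)
console.log(out.includes('\\u00a0')); // false (no \u00a0 escape)
console.log(out.includes('\u00e9'));  // true  (raw e-acute)
```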

If you want to escape custom ranges of Unicode characters in the target string, use the following transform. You can adjust the character range in the regexp to control which characters are encoded as \uXXXX; see the character categories here: http://www.unicode.org/notes/tn36/Categories.txt

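// Escape selected characters in the JSON.stringify output as \uXXXX.
// \xA0 (no-break space) is included so the question's "\u00a0" is escaped too.
// Keep the output unindented: indentation newlines would match \0-\x1F.
const jsonUtfStringify = inputObject => JSON.stringify(inputObject).replace(
  /[\0-\x1F\x7F-\xA0\xAD\u0378-\uFFFF]/g,
  match => `\\u${match.codePointAt(0).toString(16).padStart(4, '0')}`
);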
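The comments on the question bring up Python's ensure_ascii option. A userland equivalent is a small variation on the transform above. The sketch below (stringifyAscii is a hypothetical name, not a Node.js API) escapes every UTF-16 code unit from U+007F upward and deliberately skips the \0-\x1F range, so it can be combined with an indent argument, as the question's merge does.

```javascript
// Hypothetical helper: escape every non-ASCII UTF-16 code unit (and DEL) in the
// output of JSON.stringify, roughly what Python's ensure_ascii=True does.
const stringifyAscii = (value, indent) =>
  JSON.stringify(value, null, indent).replace(
    // U+0000-U+001F are deliberately left out: JSON.stringify already escapes
    // them inside strings, so any left in the output are indentation newlines.
    // Without the u flag, a character outside the BMP is matched as two
    // surrogate halves, which become a valid pair of escapes.
    /[\u007f-\uffff]/g,
    ch => '\\u' + ch.charCodeAt(0).toString(16).padStart(4, '0')
  );

const text = stringifyAscii({ weeks: 'weeks\u00a0days', smile: '\u{1F600}' }, 4);
console.log(text);
// {
//     "weeks": "weeks\u00a0days",
//     "smile": "\ud83d\ude00"
// }
```

Because the escapes are plain JSON, JSON.parse(text) gives back the original no-break space and emoji unchanged.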
