
Apparently Safari normalizes Unicode when sending POST data, while all other major browsers just send what they're given.

Normalization appears to happen right before the data is sent over the wire, so calling normalize() on the data beforehand doesn't help (Safari enforces NFC regardless of the form it's given).

This becomes a problem when requesting a filename with an accented character, which has different code points in NFC and NFD. (The explanation essentially comes down to "combining characters" vs. "precomposed characters" in Unicode equivalence.)

With that said, given an API that doesn't do its own normalization on the backend, and is expecting an array of strings (filenames), is it possible to send over the correct filename on the frontend when using Safari?

An example of the Unicode normalization problem:

const str = 'Rosé'

const nfc = str.normalize()
const nfd = str.normalize('NFD')

console.log(nfc === nfd) // false

console.log(nfc.codePointAt(3)) // 233
console.log(nfd.codePointAt(3)) // 101

console.log(nfc.codePointAt(4)) // undefined
console.log(nfd.codePointAt(4)) // 769
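
The same difference shows up in the UTF-8 bytes that actually go over the wire. A minimal sketch, assuming an environment that provides the standard TextEncoder (modern browsers, Node.js); the helper name utf8Hex is hypothetical and only used for illustration:

```javascript
// Hypothetical helper: hex dump of a string's UTF-8 encoding.
const utf8Hex = (s) =>
  Array.from(new TextEncoder().encode(s), (b) => b.toString(16).padStart(2, '0')).join(' ');

const nfcBytes = utf8Hex('\u00e9');       // precomposed "é"
const nfdBytes = utf8Hex('\u0065\u0301'); // "e" + combining acute accent

console.log(nfcBytes); // c3 a9
console.log(nfdBytes); // 65 cc 81
```

A backend that compares filenames byte for byte will therefore treat the two spellings as different names.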

A minimal, reproducible example:

Note the console log differences between Chrome and Safari.

const isCorrectForm = (path, form) => path === path.normalize(`NF${form}`)

const fetchData = async() => {
  const sourcePathC = '\u00e9'; // "é" as one precomposed code point (NFC)
  const sourcePathD = '\u0065\u0301'; // "é" as "e" + combining acute accent (NFD)

  await fetch('https://httpbin.org/post', {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        pathsFormC: [sourcePathC],
        pathsFormD: [sourcePathD]
      }),
    })
    .then((response) => response.json())
    .then((data) => {
      const responseData = JSON.parse(data.data);
      const responsePathC = responseData.pathsFormC[0];
      const responsePathD = responseData.pathsFormD[0];

      console.log({
        isSourcePathCFormC: isCorrectForm(sourcePathC, 'C'),
        isSourcePathDFormD: isCorrectForm(sourcePathD, 'D'),
        isResponsePathCFormC: isCorrectForm(responsePathC, 'C'),
        isResponsePathDFormD: isCorrectForm(responsePathD, 'D'),
      });
    });
}

fetchData();

Update:

This is impossible to solve on the client side alone. Either the backend needs to handle normalization on the receiving end or, as answered, the client needs to send the data encoded via encodeURIComponent, with the backend decoding that data.
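
For illustration, both backend options might look roughly like this. It is only a sketch, assuming a JavaScript backend (e.g. Node.js); decodePaths and normalizePaths are hypothetical names, and which normalization form the backend should pick depends on how the filenames are actually stored:

```javascript
// Option 1 (hypothetical): decode percent-encoded filenames sent by the client.
// Percent-encoded strings are pure ASCII, so normalization in transit cannot alter them.
const decodePaths = (encodedPaths) => encodedPaths.map((p) => decodeURIComponent(p));

// Option 2 (hypothetical): normalize whatever arrives to the form the storage layer uses.
const normalizePaths = (paths, form = 'NFC') => paths.map((p) => p.normalize(form));

// What a client would send for the NFC and NFD spellings of "é" under option 1
const decoded = decodePaths(['%C3%A9', 'e%CC%81']);
console.log(decoded[0] === '\u00e9');       // true (still NFC)
console.log(decoded[1] === '\u0065\u0301'); // true (still NFD)

// Under option 2, an NFC string forced by the browser is mapped back to NFD
const normalized = normalizePaths(['\u00e9'], 'NFD');
console.log(normalized[0] === '\u0065\u0301'); // true
```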

jabacchetta
  • Héhé seems I got trapped by this myself while editing the question: I first tried from Firefox (my main browser) and then tried from Safari to see if it was indeed reproducible, and posted from that Safari... – Kaiido Sep 10 '20 at 11:30
  • Duplicate of https://stackoverflow.com/questions/11176603/how-to-avoid-browsers-unicode-normalization-when-submitting-a-form-with-unicode – Kaiido Sep 10 '20 at 11:46
  • Perhaps this simple change does the work, but can't test it on Safari though. https://jsfiddle.net/ez1u37qg/ –  Sep 16 '20 at 15:08

1 Answer


Perhaps an easy solution is this one. I say perhaps because this is not tested on Safari (unfortunately).

const isCorrectForm = (path, form) => path === path.normalize(`NF${form}`)

const fetchData = async() => {
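  const sourcePathC = '\u00e9'; // "é" (precomposed, NFC)
  const sourcePathD = '\u0065\u0301'; // "é" ("e" + combining acute accent, NFD)

  // Percent-encode so the request body is pure ASCII, which normalization leaves unchanged
  const encodedPathC = encodeURIComponent(sourcePathC); // "%C3%A9"
  const encodedPathD = encodeURIComponent(sourcePathD); // "e%CC%81"

  console.log(encodedPathC, encodedPathD);

  await fetch('https://httpbin.org/post', {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        pathsFormC: [encodedPathC],
        pathsFormD: [encodedPathD]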
      }),
    })
    .then((response) => response.json())
    .then((data) => {
      const responseData = JSON.parse(data.data);
      const responsePathC = decodeURIComponent(responseData.pathsFormC[0]);
      const responsePathD = decodeURIComponent(responseData.pathsFormD[0]);

      console.log(responsePathC, responsePathD);

      console.log({
        isSourcePathCFormC: isCorrectForm(sourcePathC, 'C'),
        isSourcePathDFormD: isCorrectForm(sourcePathD, 'D'),
        isResponsePathCFormC: isCorrectForm(responsePathC, 'C'),
        isResponsePathDFormD: isCorrectForm(responsePathD, 'D'),
      });
    });
}

fetchData();
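
Since the snippet above is untested on Safari, the idea can at least be checked locally by simulating the normalization step. This is only a sketch: simulateSafari is a hypothetical stand-in, and it assumes Safari's behavior amounts to NFC-normalizing the request body, as described in the question.

```javascript
// Hypothetical stand-in for the browser normalizing the request body to NFC.
const simulateSafari = (body) => body.normalize('NFC');

const nfdPath = '\u0065\u0301'; // "é" in NFD

// Raw: the NFD sequence is collapsed into the precomposed character.
const rawBody = simulateSafari(JSON.stringify({ paths: [nfdPath] }));
const rawSurvives = JSON.parse(rawBody).paths[0] === nfdPath;

// Encoded: "e%CC%81" is ASCII only, so NFC normalization leaves it untouched.
const encodedBody = simulateSafari(JSON.stringify({ paths: [encodeURIComponent(nfdPath)] }));
const encodedSurvives = decodeURIComponent(JSON.parse(encodedBody).paths[0]) === nfdPath;

console.log({ rawSurvives, encodedSurvives }); // { rawSurvives: false, encodedSurvives: true }
```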
  • Encoding will work, but it requires access to changing the backend. – jabacchetta Sep 17 '20 at 16:11
  • @jabacchetta yes indeed, this way back-end changes are needed. –  Sep 17 '20 at 17:57
  • @jabacchetta if this is a built-in feature of Safari, I am not sure you can bypass it from the client side alone. What troubles me is what you wrote: `Normalization appears to be happening right before the data is sent over the wire...`. If it happens after the JavaScript has executed, is it even possible to bypass this on the client side? –  Sep 17 '20 at 18:19
  • Yeah, I've come to the conclusion that it's impossible to do client-side only. And, in fact, encoding is the suggestion I made to the backend team. I don't believe we're going to get an answer here, so I'll go ahead and award you the bounty. – jabacchetta Sep 17 '20 at 20:09