
In Node.js: Why does this test fail on the second call of main?

test('base64Encode and back', () => {
  function main(input: string) {
    const base64string = base64Encode(input);
    const text = base64Decode(base64string);
    expect(input).toEqual(text);
  }

  main('demo');
  main('😉');
});

Here are my functions:

export function base64Encode(text: string): string {
  const buffer = Buffer.from(text, 'binary');
  return buffer.toString('base64');
}

export function base64Decode(base64EncodedString: string): string {
  const buffer = Buffer.from(base64EncodedString, 'base64');
  return buffer.toString('binary');
}

From these pages, I figured I had written these functions correctly so that one would reverse the other:

If I change the 'binary' options to 'utf8' instead, the test passes.

But my database currently has data where this function only seems to work if I use 'binary'.

Ryan
  • Strings default to utf8 in Node. When the utf8 string is converted to the [`latin1`/`binary` char set](https://en.wikipedia.org/wiki/ISO/IEC_8859-1), it can't represent the multibyte utf8 character any more. – Matt Feb 11 '23 at 00:26
  • But there might be more to the question, then. What's the purpose of the additional conversion before/after base64? Is it related to what's stored in the database? – Matt Feb 11 '23 at 00:26
  • @Matt You're asking about `Buffer.from(text, 'binary')` and `buffer.toString('binary')`? I need a `string` instead of `Buffer` type. – Ryan Feb 12 '23 at 19:32
  • Yeah, the `base64` encoding is enough to give you a portable string. I was just checking whether there was a specific reason for using the answer from that question that did the additional conversion to `latin1`/`binary`? – Matt Feb 13 '23 at 04:18
  • @Matt If you write your first comment as an answer, I'll accept it. So far, I haven't noticed any problems using `'binary'` (other than my test failing), so maybe my data doesn't have any multibyte characters yet. And I guess if I do need to support emojis etc, I'll need to go back and decode the DB and re-encode using `'utf8'`. Thanks. – Ryan Feb 13 '23 at 15:20

1 Answer


`'binary'` is an alias for `'latin1'`. From the Node.js Buffer documentation:

'latin1': Latin-1 stands for ISO-8859-1. This character encoding only supports the Unicode characters from U+0000 to U+00FF. Each character is encoded using a single byte. Characters that do not fit into that range are truncated and will be mapped to characters in that range.

This character set cannot represent any character above U+00FF, which rules out emoji and most other characters that take multiple bytes in utf8.
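To see the truncation in action, here is a minimal sketch that runs the same round trip with both encodings. The helper names `roundTripLatin1` and `roundTripUtf8` are made up for this illustration; only `Buffer.from` and `toString` come from Node.

```javascript
// Hypothetical helpers for illustration; the names are not from the question.
function roundTripLatin1(text) {
  const base64 = Buffer.from(text, 'latin1').toString('base64');
  return Buffer.from(base64, 'base64').toString('latin1');
}

function roundTripUtf8(text) {
  const base64 = Buffer.from(text, 'utf8').toString('base64');
  return Buffer.from(base64, 'base64').toString('utf8');
}

const wink = '\u{1F609}'; // winking-face emoji, U+1F609, far above U+00FF

console.log(roundTripLatin1('demo') === 'demo');     // true
console.log(roundTripLatin1('\u00e9') === '\u00e9'); // true: U+00E9 fits in Latin-1
console.log(roundTripLatin1(wink) === wink);         // false: the character was truncated
console.log(roundTripUtf8(wink) === wink);           // true

// latin1 keeps one byte per UTF-16 code unit, utf8 keeps the full code point
console.log(Buffer.from(wink, 'latin1').length, Buffer.from(wink, 'utf8').length); // 2 4
```

The loss happens in `Buffer.from(text, 'latin1')`, before base64 is involved at all, which is why decoding can never get the original character back.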

To get utf8 multibyte characters back, skip the `latin1` step and go directly to base64 and back again. `Buffer.from(str)` and `buffer.toString()` both default to `'utf8'`:

function base64Encode(str) {
  return Buffer.from(str).toString('base64')
}
function base64Decode(str) {
  return Buffer.from(str, 'base64').toString()
}
> base64Encode('😉')
'8J+YiQ=='
> base64Decode('8J+YiQ==')
'😉'
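If existing rows were written with `'binary'`, as mentioned in the comments, one possible way to migrate them is to decode each value with `latin1` and re-encode it as utf8. This is only a sketch: the helper name `reencodeLatin1Base64AsUtf8` is hypothetical, and it assumes the stored text only ever contained characters up to U+00FF. Anything above that range was already truncated when it was first encoded and cannot be recovered.

```javascript
// Hypothetical migration helper; the name is not from the original post.
// Assumes oldBase64 was produced by Buffer.from(text, 'binary').toString('base64').
function reencodeLatin1Base64AsUtf8(oldBase64) {
  // Recover the original text using the encoding it was written with
  const text = Buffer.from(oldBase64, 'base64').toString('latin1');
  // Write it back out as utf8 bytes, then base64
  return Buffer.from(text, 'utf8').toString('base64');
}

// Example: a value stored the old way, then migrated
const oldValue = Buffer.from('caf\u00e9', 'latin1').toString('base64');
const newValue = reencodeLatin1Base64AsUtf8(oldValue);

console.log(Buffer.from(newValue, 'base64').toString('utf8')); // café
```

Pure ASCII values come out byte-for-byte identical, so only rows containing characters in the U+0080 to U+00FF range actually change.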
Matt