21

I want to check whether a string passed to my function is plain text or Base64-encoded. I can encode plain text to Base64 and decode it back, but I cannot find a way to validate whether a string is already in Base64 format. Can anyone suggest the correct way to do this in Node.js? Is there an existing Node.js API for this?

c g

3 Answers

52

Valid base64 strings are a subset of all plain-text strings. Assuming we have a character string, the question is whether it belongs to that subset. One way is what Basit Anwer suggests. Those libraries require installing libicu though. A more portable way is to use the built-in Buffer:

Buffer.from(str, 'base64')

Unfortunately, this decoding function will not complain about non-Base64 characters. It will just ignore non-base64 characters. So, it alone will not help. But you can try encoding it back to base64 and compare the result with the original string:

Buffer.from(str, 'base64').toString('base64') === str

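This check tells whether str is canonical, padded base64. Note that a plain-text string made up only of base64 characters, with a length that is a multiple of 4 (for example, "test"), also passes, so the check cannot prove that the caller actually encoded the input.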
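
As a minimal sketch, the round-trip comparison above can be wrapped in a helper. The name `isBase64RoundTrip`, the empty-string rule, and the sample inputs are illustrative assumptions, not part of any Node.js API; only `Buffer.from` and `toString` are built in.

```javascript
// Hypothetical helper: true when str survives a decode/re-encode round trip.
function isBase64RoundTrip(str) {
  if (typeof str !== 'string' || str.length === 0) {
    return false; // assumption: treat empty or non-string input as "not base64"
  }
  return Buffer.from(str, 'base64').toString('base64') === str;
}

console.log(isBase64RoundTrip('SGVsbG8gV29ybGQ=')); // true
console.log(isBase64RoundTrip('Hello, World!'));    // false
console.log(isBase64RoundTrip('SGVsbG8gV29ybGQ'));  // false: padding is missing
console.log(isBase64RoundTrip('test'));             // true: plain text made only of base64 characters
```

The last call shows the limit of any content-based check: a short word can be valid base64 by coincidence.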

Raul Santelices
  • My understanding from the question is that the input is a string, and the goal is to tell if it is base64 encoded or not. If the question is really whether the caller has already base64-encoded it or not, then only the caller can unambiguously indicate that (e.g., by passing an extra flag, or replacing the string argument with an object that has a string and a format field, or having two variants of the function: one for base64 strings and one for plain strings). – Raul Santelices Jul 05 '19 at 18:18
4

Encoding happens at the byte level. If you are dealing with strings, all you can do is guess, or keep metadata alongside the string that identifies its encoding.

But you can check these libraries out:

  1. https://www.npmjs.com/package/detect-encoding
  2. https://github.com/mooz/node-icu-charset-detector
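
The metadata idea mentioned above could look roughly like this: the encoding travels with the string, so nothing has to be guessed. This is only a sketch; the `{ value, encoding }` shape and the `toPlainText` name are hypothetical and not taken from either library.

```javascript
// Hypothetical payload shape: { value: string, encoding: 'utf8' | 'base64' }
function toPlainText(payload) {
  if (payload.encoding === 'base64') {
    // The caller declared the value as base64, so decode it.
    return Buffer.from(payload.value, 'base64').toString('utf8');
  }
  if (payload.encoding === 'utf8') {
    // Already plain text; return it unchanged.
    return payload.value;
  }
  throw new Error(`Unsupported encoding: ${payload.encoding}`);
}

console.log(toPlainText({ value: 'SGVsbG8gV29ybGQ=', encoding: 'base64' })); // Hello World
console.log(toPlainText({ value: 'test', encoding: 'utf8' }));               // test
```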
Basit Anwer
  • I believe it is because the string could have just base64 characters only. Is that the reason you pointed out to use meta data, such as flag to store if it is already encoded or not? Can you please also point out how these 2 libraries can be helpful in my context. – c g Sep 12 '15 at 06:58
  • The idea was to detect whichever character set the string contains and give you the best guess. But since you need to know whether the string was already encoded or not, you should use a metadata flag to keep track of it. It will be a simpler solution. – Basit Anwer Sep 14 '15 at 04:56
1

A better approach is to use a RegExp.

const base64RegExp = /^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{4})$/;
const isBase64 = (str) => base64RegExp.test(str);
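
For illustration, here is a sketch of how this pattern behaves on a few inputs. The sample strings are my own assumptions, chosen to show edge cases, including one where the RegExp and the Buffer round trip disagree.

```javascript
const base64RegExp = /^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{4})$/;
const isBase64 = (str) => base64RegExp.test(str);

console.log(isBase64('SGVsbG8gV29ybGQ='));  // true
console.log(isBase64('SGVsbG8gV29ybGQ==')); // false: one '=' too many
console.log(isBase64(''));                  // false: the pattern requires at least one group
console.log(isBase64('QR=='));              // true: the pattern checks shape only

// The round trip rejects 'QR==' because its unused trailing bits are not zero.
console.log(Buffer.from('QR==', 'base64').toString('base64')); // QQ==
```

So the RegExp validates the alphabet, length, and padding, while the round trip additionally requires the canonical encoding.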

Look at the performance tests I've made:

const { performance } = require('perf_hooks');

// Base64 RegExp
const base64RegExp = /^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{4})$/;

// Test strings
const validBase64Str = 'SGVsbG8gV29ybGQ=';
const invalidBase64Str = 'SGVsbG8gV29ybGQ==';

// Test using RegExp
let startTime = performance.now();
for (let i = 0; i < 1000000; i++) {
  base64RegExp.test(validBase64Str);
  base64RegExp.test(invalidBase64Str);
}
let endTime = performance.now();
console.log(`RegExp: ${endTime - startTime}ms`);

// Test using Buffer method
startTime = performance.now();
for (let i = 0; i < 1000000; i++) {
  Buffer.from(validBase64Str, 'base64').toString('base64') === validBase64Str;
  Buffer.from(invalidBase64Str, 'base64').toString('base64') === invalidBase64Str;
}
endTime = performance.now();
console.log(`Buffer: ${endTime - startTime}ms`);

In these tests, the RegExp check ran about 3 times faster than the Buffer round trip.