74

How do I get the length of a string in bytes in Node.js? If I have a string like äáöü, then str.length returns 4. But how do I find out how many bytes the string takes up?

starball
Danny Fox
  • 3
    A string does not *have* a length in bytes. This depends on the encoding used. – usr Mar 25 '12 at 22:38

6 Answers

153

Here is an example:

const str = 'äáöü';

console.log(str + ": " + str.length + " characters, " +
  Buffer.byteLength(str, 'utf8') + " bytes");

// äáöü: 4 characters, 8 bytes

Buffer.byteLength(string, [encoding])
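
As the comment on the question points out, the byte count depends on the encoding. The following is a minimal sketch of that, not part of the original answer: `byteLengths` is a hypothetical helper name, and the three encodings were picked only for illustration.

```javascript
// Hypothetical helper: byte length of the same string under several encodings.
// Buffer is a Node.js global, so no require is needed.
function byteLengths(str) {
  return {
    utf8: Buffer.byteLength(str, 'utf8'),       // 1 to 4 bytes per code point
    utf16le: Buffer.byteLength(str, 'utf16le'), // 2 bytes per UTF-16 code unit
    latin1: Buffer.byteLength(str, 'latin1'),   // 1 byte per UTF-16 code unit
  };
}

console.log(byteLengths('äáöü'));
// { utf8: 8, utf16le: 8, latin1: 4 }
```

When the second argument is omitted, Buffer.byteLength uses 'utf8'.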

noraj
stewe
12
function getBytes(string){
  return Buffer.byteLength(string, 'utf8')
}
Anthony
2

Alternatively, you can use TextEncoder

new TextEncoder().encode(str).length

Related question

I assume it is slower, though.
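
A small sketch for checking that the two approaches agree. The function names are hypothetical, and no timing is measured here. One difference that is safe to state: `encode` builds a full Uint8Array just to read its length, whereas Buffer.byteLength only returns a number.

```javascript
// TextEncoder and Buffer are both globals in current Node.js versions.

// Hypothetical name: UTF-8 length via TextEncoder (also works in browsers).
function utf8LengthViaTextEncoder(str) {
  // encode() always produces UTF-8 and returns a Uint8Array.
  return new TextEncoder().encode(str).length;
}

// Hypothetical name: UTF-8 length via Buffer (Node.js only).
function utf8LengthViaBuffer(str) {
  return Buffer.byteLength(str, 'utf8');
}

const sample = 'äáöü';
console.log(utf8LengthViaTextEncoder(sample) === utf8LengthViaBuffer(sample));
// true
```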

sad comrade
1
console.log(Buffer.from('example..').length)
t33n
1

This depends on where the string is.

In JavaScript engines (at least in most of them, including V8, which is used by Node.js and Chromium/Chrome), strings are encoded as UTF-16 internally. In UTF-16, every character is either 2 or 4 bytes long. Every character that is common in any major human language (and many that are not) is encoded in 2 bytes (one code unit), while characters from rarer languages, emoji, and unusual symbols are often encoded in 4 bytes (two code units).

Moreover, the JavaScript string length property does not actually return the number of characters in the string; it returns the number of UTF-16 code units. For example, '😀'.length returns 2 even though the string contains only one character.
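
To make the code unit versus character distinction concrete, here is a short sketch. The helper names are hypothetical. It relies on the fact that string iteration (and therefore Array.from) walks code points rather than code units.

```javascript
// Hypothetical helper: number of UTF-16 code units, which is what .length reports.
function countCodeUnits(str) {
  return str.length;
}

// Hypothetical helper: number of Unicode code points.
// Array.from uses the string iterator, which yields whole code points.
function countCodePoints(str) {
  return Array.from(str).length;
}

const emoji = '\u{1F600}'; // a single code point outside the Basic Multilingual Plane
console.log(countCodeUnits(emoji));  // 2
console.log(countCodePoints(emoji)); // 1
```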

Finally, the strings are almost certainly (though I have not checked) null-terminated, so throw on an extra 2 bytes for that.

Putting it together, and assuming the guess about the terminator holds, a string residing in your Node.js script's memory takes (str.length * 2) + 2 bytes.
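
The str.length * 2 part of that estimate can be cross-checked against Buffer.byteLength with the 'utf16le' encoding. This sketch covers only the size of the UTF-16 encoded text. It does not model a terminator or any other engine overhead, and `utf16PayloadBytes` is a hypothetical name.

```javascript
// Hypothetical helper: size of the UTF-16 encoded text only.
// Any terminator or per-object engine overhead is NOT included (not modeled here).
function utf16PayloadBytes(str) {
  return str.length * 2; // two bytes per UTF-16 code unit
}

for (const s of ['abc', 'äáöü', '\u{1F600}']) {
  // Both numbers on each line should match.
  console.log(utf16PayloadBytes(s), Buffer.byteLength(s, 'utf16le'));
}
```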

On the other hand, when you send a string in an HTTP request, or write it to a file, it will typically be converted by default to UTF-8 before being transmitted to its destination. Characters in UTF-8 can be 1, 2, 3, or 4 bytes long (not counting the phenomenon of "over-long characters" and potential future expansion).
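
For illustration, here is one sample character for each of the four UTF-8 widths. The sample characters and the `utf8Width` name are my own choices, not from the original answer.

```javascript
// Hypothetical helper: number of bytes a string occupies when encoded as UTF-8.
function utf8Width(ch) {
  return Buffer.byteLength(ch, 'utf8');
}

console.log(utf8Width('a'));         // 1 (ASCII)
console.log(utf8Width('\u00e4'));    // 2 (ä)
console.log(utf8Width('\u20ac'));    // 3 (euro sign)
console.log(utf8Width('\u{1F600}')); // 4 (emoji outside the Basic Multilingual Plane)
```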

For this, I have nothing to add on top of the other answers to this question, which show how to calculate the length of a string in UTF-8.

Aurast
0

If you want the length in a specific encoding, here is an example using iconv-lite:

  const iconv = require('iconv-lite');
  const buf = iconv.encode('äáöü', 'utf8');
  console.log(buf.length);
  // output: 8