I have the string 'aa\b\u0007\u0007':

var a = 'aa\b\u0007\u0007';
console.log(a); 
//=> a //+ 2 beeps
console.log(a.length); 
//=> 5

Here a.length gives me 5, but the string that actually gets output is just a, whose length is 1.

How can I get that length?

laggingreflex

3 Answers

There are a couple of different issues here.

First, different environments will render that string differently. Some will render the bell character as an actual glyph; others, like traditional consoles, will make a sound instead. Some will render (some) zero-width characters as various glyphs as well. There is no one "this is how long this string is once you account for backspaces and zero-width characters" interpretation.

You'll need to determine the rules you want to apply in your situation. The Unicode site may help with some traditional interpretations. Or if you're just interested in interpreting old-fashioned ASCII, that will be a lot easier, but of course we don't live in an ASCII world anymore (which is a Good Thing(tm)).

Once you have your rules, depending on how complex they are, you may be able to apply them with one or more regular expressions. For instance, this simplistic regular expression treats a backspace as removing the previous character, and removes all other characters whose character code is less than 32 (traditionally, "control characters"). Again, this is not complete: there are plenty of Unicode zero-width characters outside that range (various zero-width spaces, for a start), and doing a thorough job of it across the Unicode range will be a project, not a trivial function.

But just for example:

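function getInterpretedLength(s) {
  // A character followed by a backspace is removed as a pair;
  // any remaining control character (code below 32) is removed on its own.
  return s.replace(/(?:.[\b])|[\u0000-\u001f]/g, "").length;
}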
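
The single-pass regular expression above cannot handle runs of backspaces: for 'ab\b\b' it returns 1, because the second backspace is simply dropped as a control character. As a rough sketch of the same rules applied one character at a time, the hypothetical helper below (interpretedLength is a name invented for this illustration) keeps a buffer of visible characters. It assumes a backspace deletes the previous visible character and that every other control character below 32, plus DEL, produces no output. Real terminals and renderers may behave differently.

```javascript
// Hypothetical helper: simulates a very simple line buffer.
// Assumptions: backspace (U+0008) deletes the previous visible character;
// all other C0 control characters and DEL (U+007F) are invisible.
function interpretedLength(s) {
  const shown = [];
  for (const ch of s) {              // for...of iterates by code point
    const cp = ch.codePointAt(0);
    if (cp === 0x08) {
      shown.pop();                   // no-op when the buffer is already empty
    } else if (cp < 0x20 || cp === 0x7f) {
      // invisible control character: contributes nothing
    } else {
      shown.push(ch);
    }
  }
  return shown.length;
}

console.log(interpretedLength('aa\b\u0007\u0007')); // 1
console.log(interpretedLength('ab\b\b'));           // 0
```

This still ignores zero-width characters outside the control range, so it is no more complete than the regular expression; it only makes the backspace rule explicit.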

The second issue is that for certain Unicode code points (loosely, "characters"), JavaScript counts two JavaScript characters, not one. That's because JavaScript strings are sequences of 16-bit code units, essentially UTF-16 except that they tolerate unpaired surrogates, and code points above U+FFFF are encoded with two 16-bit values (a surrogate pair), not just one. The length property counts those 16-bit values, not code points.
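
To illustrate the difference, here is a small sketch (codePointLength is a hypothetical name used only for this example) that counts code points by relying on the string iterator, which walks a string one code point at a time:

```javascript
// Hypothetical helper: counts Unicode code points instead of UTF-16 code units.
function codePointLength(s) {
  // Array.from uses the string iterator, which yields whole code points
  // (a surrogate pair comes out as a single element).
  return Array.from(s).length;
}

const face = '\u{1F600}';            // one code point outside the 16-bit range
console.log(face.length);            // 2
console.log(codePointLength(face));  // 1
```

Note that this counts code points, not what a user perceives as one character: combining marks and zero-width characters are still counted individually.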

So this will either be a large project, or if you can constrain it sufficiently based on what you're actually trying to solve, it may be a bit smaller.

T.J. Crowder

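Looking at this answer, you could try stripping the non-printable characters with replace before getting the length. Note that this only removes the backspace itself rather than applying it to the previous character, so it gives 2 for the example string, not 1: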

console.log(a.replace(/[^\x20-\x7E]+/g, '').length);
Community
Alasjo

You can actually count characters by measuring the rendered text with a canvas, but on the web there is no real backspace character that acts the way it does in a terminal. So you have to subtract the backspaces manually.

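var text = 'aa\b\u0007\u0007';
var context = document.createElement('canvas').getContext('2d');
// Use a monospaced font so every visible character has the same width.
context.font = '30px Courier New';
// match() returns null when there are no backspaces, so fall back to an empty array.
var backspaces = (text.match(/\x08/g) || []).length;
var length = context.measureText(text).width / context.measureText('x').width - backspaces;

alert(length);
//1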
YOU