
I have a string that can contain Unicode letters. I want to find out how "wide" it will be (in letters, not pixels) when it is written on screen in a monospace font.

It might not be as trivial as it seems. Consider this code:

var la = "لا"
console.log(la.length); // prints 2
console.log(la.split("")); // [ 'ل', 'ا' ]

(This is the output in a REPL.)

While لا has the width of 1 letter in almost all monospace fonts, it's actually 2 letters - ل and ا (ignore the writing direction issues, that's a separate thing :) )
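To make the mismatch concrete, here is a minimal sketch that lists what the string actually contains. The helper name `codePoints` is made up for this illustration, and the string is written with escapes on the assumption that it is exactly U+0644 followed by U+0627, as the `split` output above suggests.

```javascript
// Hypothetical helper: lists the Unicode code points in a string.
function codePoints(str) {
  // Array.from iterates by code point, so surrogate pairs stay together.
  return Array.from(str, function (ch) {
    return 'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0');
  });
}

var la = '\u0644\u0627';      // the string from the question, written with escapes
console.log(la.length);       // 2 (UTF-16 code units)
console.log(codePoints(la));  // [ 'U+0644', 'U+0627' ]
```

Both counts are 2, so the single on-screen cell comes from how the font renders the pair, not from anything stored in the string itself.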

Is it possible to find the "visual width" (or whatever the right term is) in JavaScript?

In my example, I want a function that takes لا and returns 1, not 2 as .length does.

(Sorry if it's too confusing, I just don't know how to express myself right.)

Everything I can find is about measuring the width in pixels. I want to know the width in "monospace letters": how many letter cells will this string take up in a monospace font?

If it isn't possible, then it's all right.

Edit: I found out that what I wanted to do is much easier with CSS and text-overflow: ellipsis. However, I will keep this question here; maybe it will be helpful for someone else.

Karel Bílek
  • What about having the text inside a span and detecting the span's width? – Bwaxxlo Jul 03 '15 at 14:24
  • Try this: http://stackoverflow.com/a/118251/4772988 – suvroc Jul 03 '15 at 14:24
  • I wrote it wrong, I don't want the width in pixels, but number of letters. – Karel Bílek Jul 03 '15 at 14:24
  • possible duplicate of [How can I tell if a string contains multibyte characters in Javascript?](http://stackoverflow.com/questions/4877326/how-can-i-tell-if-a-string-contains-multibyte-characters-in-javascript) – CodingIntrigue Jul 03 '15 at 14:27
  • @RGraham that's not correct either. `č`, `ř`, `š` are multibyte characters, but are handled correctly with `.length` and `split("")`, as far as I can tell. edit: OK, I read the question, and while it's similar, it doesn't actually answer my question. – Karel Bílek Jul 03 '15 at 14:28
  • @KarelBílek You're right, the concept is in that question but it doesn't actually answer this particular question - vote removed. – CodingIntrigue Jul 03 '15 at 14:36
  • But it's actually a very similar problem. :) – Karel Bílek Jul 03 '15 at 14:37
  • @KarelBílek you're right, surrogate pairs are another problem (which also makes `.length` unreliable in JavaScript), but your `لا` "letter" is actually two characters, according to unicode (I can even type between them here: `ل-ا`). This is like [combining marks](https://en.wikipedia.org/wiki/Combining_character#Unicode_ranges), but not actually. – pozs Jul 03 '15 at 14:39
  • In ES6, you can get the real length of a string with characters beyond 16 bits using `[...str].length`. For example, `'\uD83D\uDE80'.length` is 2, but `[...'\uD83D\uDE80'].length` is 1. However, you have two letters: `ل` and `ا`. I don't know Arabic so not sure if it makes sense to consider them a single character, but in unicode they are different characters. – Oriol Jul 03 '15 at 14:40
  • @Oriol if you look at how they are represented even in the example above, they have the width of 1 letter in the monospace font. – Karel Bílek Jul 03 '15 at 14:46

2 Answers


You can create a canvas with a monospace font. Then with measureText you can measure the width of your string and the width of a single letter, and divide them.

var countLetters = (function() {
  var ctx = document.createElement('canvas').getContext('2d');
  ctx.font = '48px monospace';
  var letterWidth = ctx.measureText('a').width;
  return function (str) {
    return Math.round(ctx.measureText(str).width / letterWidth);
  };
})();

I think Math.round should not be necessary, but it will guarantee the result to be an integer.

Oriol
  • You still probably should `Math.round` the answer as even in `monospace` it's not guaranteed to be an _int_ result. – Paul S. Jul 03 '15 at 15:00
  • @PaulS. I also thought so, but `TextMetrics.width` returns the number of pixels in the canvas that the text would occupy. And I think canvas don't have subpixels, so that value must be an integer. – Oriol Jul 03 '15 at 15:09
  • `countLetters(la); // 1.0914742451154529` – Paul S. Jul 03 '15 at 15:09
  • @PaulS. Weird, I get `1`. Well, adding `Math.round` won't hurt. – Oriol Jul 03 '15 at 15:12
  • I realized my question is actually pretty stupid, since there are unicode letters that are not 1 even in monospace (as I now found out). So this doesn't really make sense anyway. Unicode is hard! – Karel Bílek Jul 10 '15 at 08:52

One kinda gross solution is to put the string into a div to get the pixel width and then divide by the pixel width of a single character. It would suck for processing performance, but should be accurate since each character in a fixed-width font is supposed to be the same amount of horizontal space.
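A rough sketch of that idea, not a tested implementation: the function name `countLettersDom`, the `doc` parameter, and the 48px font size are assumptions made for this example. In a page you would pass `document`; taking it as a parameter just lets the function be tried outside a browser with a stand-in object.

```javascript
// Sketch: measure the string in a hidden monospace element and divide by
// the width of one reference letter. `doc` is assumed to be a DOM document.
function countLettersDom(str, doc) {
  var span = doc.createElement('span');
  span.style.fontFamily = 'monospace';
  span.style.fontSize = '48px';
  span.style.whiteSpace = 'pre';      // keep runs of spaces from collapsing
  span.style.position = 'absolute';   // take the element out of the page flow
  span.style.visibility = 'hidden';   // laid out and measurable, but not shown
  doc.body.appendChild(span);

  span.textContent = 'a';
  var letterWidth = span.getBoundingClientRect().width;
  span.textContent = str;
  var textWidth = span.getBoundingClientRect().width;

  doc.body.removeChild(span);
  // Rounded because the ratio is not guaranteed to be a whole number.
  return Math.round(textWidth / letterWidth);
}
```

In a browser this would be called as `countLettersDom(str, document)`. The result depends on which font the browser picks for `monospace`, so it may differ between machines.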

Ed Ballot