10

I discovered that in the Mac OS X Terminal, some Unicode characters take up more than one character cell. One example is U+27FC (LONG RIGHTWARDS ARROW FROM BAR). It prints two characters wide, but its second half is drawn on top of whatever character comes next, so you have to write ⟼<space> for it to display correctly. For example, ⟼a prints like this: (image: Arrow + a). I made the font size large so that you could see it, but it happens at all font sizes.

By the way, this is the Menlo font in the Mac OS X 10.6 Terminal application.

U+23B2 (SUMMATION TOP) actually prints two characters wide and two characters tall. It does this in the browser too, at least in Safari; notice how it overlaps the line above: ⎲

However, in the terminal in Ubuntu, none of these characters print wider or taller than one character.

Is there a way to programmatically tell if a character takes up more than one space?

I'm using Python, so something that works either in pure Python or on POSIX (i.e., I can call some bash command using the os module) would be preferred.

Also, I should note that if I increase the "Character Spacing" setting in the terminal's font settings to 1.5 (from the default 1.0), then it looks like this: (image: Arrow + a spaced).

Also, it'd be nice if an answer could give some insight into all of this (i.e., why does it happen?)

Jonathan Leffler
asmeurer

3 Answers

7

While it's not relevant for the specific examples you give (all of which display at the width of a single character for me on Ubuntu), CJK characters have a Unicode property which indicates that they are wider than normal, and they display at double width in some terminals.

For example, in Python:

# 'a' is a narrow character ('Na')
# '愛' can be interpreted as a double-width (wide, 'W') character
import unicodedata
assert unicodedata.east_asian_width('a') == 'Na'
assert unicodedata.east_asian_width('愛') == 'W'

Apart from this, I don't think there's a specification for how much space certain characters should take up, other than the size of the glyph in whatever font you are using (which your terminal is probably ignoring for the reason Ignacio gave).

For more info on the "east asian width" property, see http://www.unicode.org/reports/tr11/
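As a rough sketch of how that property could be turned into a width estimate: the helper below (display_width is a made-up name, not a standard function) counts wide and fullwidth characters as two cells, combining marks as zero, and everything else as one. It only reflects what Unicode says, so it cannot predict the Terminal.app overlap described in the question, where the characters report 'N' but are still drawn too wide.

```python
import unicodedata


def display_width(text):
    """Estimate the number of terminal cells `text` occupies.

    Hypothetical helper: it trusts the Unicode properties only and
    knows nothing about the font or terminal actually in use.
    """
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            # Combining marks are drawn on the previous character.
            continue
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            # Wide and fullwidth characters take two cells.
            width += 2
        else:
            width += 1
    return width


if __name__ == "__main__":
    for sample in ("abc", "愛", "\u27fca"):
        print(repr(sample), display_width(sample))
```

For U+27FC this reports a single cell, which matches Ubuntu's behaviour but not what Terminal.app draws.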

mesilliac
  • Interesting. Both of those characters I gave return 'N'. But I see for example in your code block that 愛 is two characters wide. In Terminal, 愛 prints as two characters wide but doesn't overlap with other characters. So I suspect that Mechanical snail is right on this one. – asmeurer Aug 17 '11 at 18:08
4

No, since there's no way to tell what font the terminal is using. Always use a monospace font, lesson learned.

It happens because the terminal is using a "cell" font layout engine (i.e. characters are printed at specific X and Y coordinates regardless of their actual size) whereas the browser is using a "flow" font layout engine (subsequent characters print where the previous character ended).
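The difference between the two layout models can be illustrated with a toy calculation. This is only a sketch: the glyph widths are invented numbers (in units of one cell), and the function names are hypothetical, not part of any terminal or browser API.

```python
def cell_positions(widths, cell=1.0):
    """Cell layout: glyph i starts at i * cell, whatever its own width."""
    return [i * cell for i in range(len(widths))]


def flow_positions(widths):
    """Flow layout: each glyph starts where the previous one ended."""
    positions = []
    x = 0.0
    for w in widths:
        positions.append(x)
        x += w
    return positions


def overlaps(positions, widths):
    """Indices of glyphs whose right edge passes the next glyph's start."""
    return [
        i
        for i in range(len(widths) - 1)
        if positions[i] + widths[i] > positions[i + 1]
    ]


if __name__ == "__main__":
    # Assumed widths: a 2-cell-wide arrow glyph followed by a 1-cell 'a'.
    widths = [2.0, 1.0]
    print("cell:", overlaps(cell_positions(widths), widths))
    print("flow:", overlaps(flow_positions(widths), widths))
```

Under the cell model the wide glyph spills into the next cell; under the flow model the next glyph simply starts later, so nothing overlaps.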

Ignacio Vazquez-Abrams
  • 2
    There are also Unicode characters that behave differently from typical characters, like characters that add accents to previous characters, or characters that flip the text direction from left-to-right to right-to-left. – Darren Yin Aug 17 '11 at 02:01
  • 1
    I think you misunderstand the problem. I *am* using a monospaced font. The terminal prints certain unicode characters larger than one cell, though. It does it exactly the same no matter what font I use (monospace or not). – asmeurer Aug 17 '11 at 02:21
  • @Darren Yin: Interesting. Are there characters that flip arbitrary characters, or only for certain ones? – asmeurer Aug 17 '11 at 02:22
  • 1
    @asmeurer: It's possible that your monospace font doesn't provide the glyph in question. Then the OS/text rendering engine needs to fall back to another font. *Usually* it should fall back to another monospace font with similar metrics, but if no such font supports the glyph (as may be the case with rarely used Unicode code points), it's entirely possible that it falls back to a non-monospace font for that one glyph. – Joachim Sauer Aug 17 '11 at 05:31
  • @Joachim Sauer: Yes, I know. This happens all the time. But usually, Terminal.app still squeezes the glyph into one cell. This happens for characters that it doesn't happen for in other places (such as environments in Safari). One example of a problematic character is ⅈ, which is not in Menlo; the replacement glyph is never the same width in environments, but it fits just fine in Terminal.app. – asmeurer Aug 17 '11 at 18:04
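The two kinds of special characters mentioned in the comments above can be inspected with the unicodedata module. The snippet below is a small sketch (describe is a hypothetical helper name) that looks at U+0301 COMBINING ACUTE ACCENT and U+202E RIGHT-TO-LEFT OVERRIDE; how a given terminal actually renders them is a separate matter that these properties do not settle.

```python
import unicodedata


def describe(ch):
    """Return a few Unicode properties that affect how `ch` is laid out."""
    return {
        "name": unicodedata.name(ch, "<unnamed>"),
        "category": unicodedata.category(ch),
        # Nonzero means the character combines with the preceding one.
        "combining": unicodedata.combining(ch),
        # Bidirectional class, e.g. 'L' for ordinary left-to-right letters.
        "bidirectional": unicodedata.bidirectional(ch),
    }


if __name__ == "__main__":
    for ch in ("a", "\u0301", "\u202e"):
        print("U+%04X" % ord(ch), describe(ch))
```

A combining class other than zero marks a character that attaches to its predecessor, and a bidirectional class of 'RLO' marks the override that asks a bidi-aware renderer to lay out the following text right-to-left.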
1

This is a bug in the OS X terminal.

I wouldn't recommend trying to work around it, because the workaround will break on other systems (e.g. Linux), and the bug might eventually get fixed on the Mac. It will also confuse anyone who pastes the text into another application.

Mechanical snail