Is there a standard governing Unicode font support expected of all browsers?

The latest version of Unicode contains a repertoire of more than 110,000 characters covering 100 scripts. I don't expect browsers to support all of them, but there should be minimum support for some characters, such as letters from the Latin script, common punctuation, and mathematical, currency, and other common symbols.

I am currently having problems displaying U+060B AFGHANI SIGN (؋) and U+202F NARROW NO-BREAK SPACE in the Android browser. I wonder if there is a list of universally recognized Unicode characters that developers can use confidently without having to worry about browser display issues.

tchrist
Question Overflow
  • That seems more like a font question than a browser issue. – melpomene Dec 15 '12 at 12:35
  • @melpomene, I understand that such browser display issues can be resolved by downloading and installing the necessary fonts. But is there a way to know whether a specific Unicode character will be supported by default? – Question Overflow Dec 15 '12 at 13:06
  • No OS vendor will ever commit to this. You'll have to test it. – Hans Passant Dec 15 '12 at 13:54
  • Isn't Unicode the standard governing how much Unicode support is expected? Browsers, like any other application dealing with text, are expected to handle Unicode, period. – jalf Dec 15 '12 at 14:59
  • @jalf, I understand, I just don't know how to phrase my question properly due to my lack of knowledge. Perhaps you can try to help me by editing the question? Or should I replace the word browser with OS? – Question Overflow Dec 15 '12 at 15:05
  • Well, my point is that every browser (every application, really) strives to handle Unicode as well as possible. There is no particular subset that can be safely relied on. But the set of glyphs supported really depends on the chosen font, not on the browser. – jalf Dec 15 '12 at 15:15
  • This question is misworded. I’ve tried to fix it up a bit, but it still needs improvement. The issue confuses font support with Unicode support, which are incredibly different. If a browser breaks the line at a narrow no-break space, ***then and only then*** does it not support that character, because it is doing something forbidden by the Unicode linebreak properties. On the other hand, if the user’s font does not happen to have a glyph for that code point, but the browser correctly understands it to forbid a linebreak, then the browser indeed supports it, and only the font is deficient. – tchrist Dec 15 '12 at 17:06
  • Similarly, if the browser treats a currency symbol codepoint as a currency symbol rather than as, say, a letter or whitespace, then the browser can be said to support that codepoint. Whether some particular font has a graphic representation for that codepoint is frankly immaterial to whether the browser supports it, that is, understands how this Unicode codepoint is supposed to behave under this or that version of the Unicode Standard. Properties can and sometimes do in fact change between releases, so this is a legitimate concern. – tchrist Dec 15 '12 at 17:09
  • @tchrist, thanks for the helpful edit. To a layman like me, a font is just something related to style. I posted this question because I had the expectation that character display should be independent of font. I accepted Jukka's answer because his article made me understand that things don't work that way. I fully agree with what you wrote in the two comments above. But then again, should we accept things as they are now, or should there be a push to adopt a "universal" font containing those basic scripts that I highlighted in my question? Thanks :) – Question Overflow Dec 16 '12 at 03:11

2 Answers

There is no standard on Unicode support in browsers. Besides, the ability to display a character mostly depends on fonts, though browsers differ in how well they scan through the available fonts for one that contains a given character. Normally, what you can do is specify a font-family list in which each font supports all the characters you need. For generalities on this, see my Guide to using special characters in HTML.

On Android, the problem is that there is a very limited set of fonts. If you need any characters beyond what is supported by them, you need to use a downloadable font, via @font-face.

The currency symbol “؋” U+060B AFGHANI SIGN is present in about a dozen fonts, but the only free font among them (if we don’t count the bitmap font GNU Unifont) appears to be Scheherazade.
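
As a minimal sketch of the @font-face approach, assuming you host a copy of Scheherazade yourself: the file path and the class name "currency" below are made up for illustration, and you should check the font's license before serving it.

```css
/* Declare the downloadable font. The URL is a hypothetical path on your own server. */
@font-face {
  font-family: "Scheherazade";
  src: url("fonts/Scheherazade-Regular.woff") format("woff");
}

/* Apply it only where the special character is needed.
   "currency" is a hypothetical class name; serif is the generic fallback. */
.currency {
  font-family: "Scheherazade", serif;
}
```

You would then mark up the character as, say, <span class="currency">؋</span>, so that the rest of the page keeps its normal fonts.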

For U+202F NARROW NO-BREAK SPACE, font support is wider. But in general, it is often better to use other methods than such characters. In many fonts this character is almost as wide as a normal space, and the Unicode standard's description of its width is vague: “a narrow form of a no-break space, typically the width of a thin space or a mid space”. “Thin space” is described as “a fifth of an em (or sometimes a sixth)” in the Unicode standard, and in reality its width varies. And “mid space” is really an undefined concept.

For example, if the text is in a language that uses spaces as thousands separators, you could in principle write a number like 100 000 as 100&#x202F;000, but it’s better to write, say,

<span class="gr">100&nbsp;000</span>

with CSS code like .gr { word-spacing: -0.15em }.

Jukka K. Korpela
  • Awesome knowledge. I would seriously consider your suggestion on using CSS for the narrow no break space. Wouldn't it be great if someone can come up with a font that covers all the general and miscellaneous stuff and put it up for sharing on Google Web Fonts? – Question Overflow Dec 15 '12 at 14:48
  • I had mistakenly written a piece of HTML code without code markdown, so part of the idea was missed; now corrected. – Jukka K. Korpela Dec 15 '12 at 14:57
  • Maybe I am wrong, but I feel that the entire question is misleadingly misworded. Font support for a particular code point is quite different from whether the browser understands that a particular code point is a letter or whitespace or whatnot. So whether a browser "supports" narrow no-break space, meaning that it cannot trigger a linebreak, is really a very different question from whether the font contains a representation for it. – tchrist Dec 15 '12 at 16:57

AFAIK, all browsers support @font-face for loading webfonts and can support any character within those fonts. As such, you should be able to display any character in any browser if you make sure you provide access to a webfont with support for those characters.

To avoid using giant fonts just to support a few special characters, you can create your own fonts with tools like the Icomoon App.
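
If you do build such a small font, a rough sketch of how it might be wired up is shown below. The family name and file path are hypothetical, and the unicode-range descriptor is not supported equally in all browsers, so treat this as something to test rather than rely on.

```css
/* A small custom font that contains only the special character(s) needed.
   "AfghaniSubset" and the URL are hypothetical names for this sketch. */
@font-face {
  font-family: "AfghaniSubset";
  src: url("fonts/afghani-subset.woff") format("woff");
  /* Hint that this font is meant only for U+060B AFGHANI SIGN. */
  unicode-range: U+060B;
}

/* List the subset first; other characters fall through to the next fonts. */
body {
  font-family: "AfghaniSubset", Georgia, serif;
}
```

Where unicode-range is honored, the small font is used only for the listed code point; elsewhere it simply acts as a normal entry in the font-family list.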

I used the Icomoon App to create the Emoji emoticon font as well as for creating custom icon fonts on a per project basis.

For more info on the use or creation of icon fonts (or other webfonts), see Create webfont with Unicode Supplementary Multilingual Plane symbols

John Slegers