Possible Duplicate:
How to remove high-ASCII characters from string like ®, ©, ™ in Java

How do I remove rectangle-like Unicode characters from a Java String?

user569125
  • 2
    The fault, dear user, lies not in the characters, but in ourselves if we are naive with them. (He said, *mangling* Shakespeare.) Also: http://www.joelonsoftware.com/articles/Unicode.html – T.J. Crowder May 23 '11 at 11:18

3 Answers

When your characters are rendered as rectangles, then that usually means that your system doesn't have the necessary fonts to display them.

Since the installed fonts can vary from machine to machine, it's hard to define what you mean by "rectangle like unicode characters".

If your code is running on the machine that does the display (i.e. you're not just rendering HTML, for example), then you might be able to use Font.canDisplay() or Font.canDisplayUpTo() to check if a given char/String can be displayed.
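
As a rough sketch of that idea (the class and method names here are made up for illustration, and which font you test against is up to you), you could drop every code point the font has no glyph for. `Font.canDisplay(int)` checks a single code point, and `Font.canDisplayUpTo(String)` returns -1 when the whole string is displayable:

```java
import java.awt.Font;
import java.util.function.IntPredicate;

// Hypothetical helper, not part of any library.
final class DisplayableFilter {

    // Keeps only the code points accepted by the given test.
    static String keepIf(String text, IntPredicate keep) {
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints()
            .filter(keep)
            .forEach(out::appendCodePoint);
        return out.toString();
    }

    // Drops every code point the given font has no glyph for.
    static String keepDisplayable(String text, Font font) {
        return keepIf(text, cp -> font.canDisplay(cp));
    }

    // True if the font can display the whole string.
    static boolean isFullyDisplayable(String text, Font font) {
        return font.canDisplayUpTo(text) == -1;
    }
}
```

You would call it with the font your component actually uses, for example `DisplayableFilter.keepDisplayable(text, label.getFont())`. The result depends on the fonts installed on the machine that runs the code, so it only helps when that is also the machine doing the display.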

Joachim Sauer

How to remove rectangle like unicode characters in java string.

They aren't bad characters! There just isn't a suitable font available to display them.

Still, if you want, you can accept only characters from a range of your choice. Better yet, provide a font that can display them.
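
A minimal sketch of the range approach, assuming purely for illustration that the range you want is printable ASCII (U+0020 to U+007E); the class name is made up, and you would swap in whatever range fits your data:

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
final class RangeFilter {

    // Assumed whitelist: printable ASCII, U+0020 to U+007E.
    // The pattern matches anything outside that range.
    private static final Pattern OUTSIDE_RANGE = Pattern.compile("[^\\x20-\\x7E]");

    // Removes every character that falls outside the whitelist.
    static String keepPrintableAscii(String text) {
        return OUTSIDE_RANGE.matcher(text).replaceAll("");
    }
}
```

Note that this throws away legitimate text too (accented letters, non-Latin scripts), which is why fixing the font is usually the better option.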

jmj
  • @Jigar Joshi: What font do I have to load to display these characters? Or is it possible to delete these characters? – user569125 May 23 '11 at 11:16
  • Better to go for the right font. The choice of font again depends on the character range, but [Arial_Unicode_MS](http://en.wikipedia.org/wiki/Arial_Unicode_MS) supports all characters. – jmj May 23 '11 at 11:17
  • @Jigar: it supports all characters... for some values of "all" ;-) – Joachim Sauer May 23 '11 at 11:27
  • @Joachim As far as I know, it renders all Unicode characters. Correct me if I am wrong. – jmj May 23 '11 at 11:29
  • 2
    @Jigar: Unicode has a **huge** range of characters and it's growing. Many of those code points are of some obscure scripts that are only historically relevant or written by only a few individuals. **No single font** supports all of them. Examples: [U+133FA EGYPTIAN HIEROGLYPH Z015](http://www.fileformat.info/info/unicode/char/133fa/index.htm), [U+1207 ETHIOPIC SYLLABLE HOA](http://www.fileformat.info/info/unicode/char/1207/index.htm), [U+10800 CYPRIOT SYLLABLE A](http://www.fileformat.info/info/unicode/char/10800/index.htm), ... – Joachim Sauer May 23 '11 at 11:36
  • 2
    @Jigar: to be more specific: "It covers all code points containing non-control characters in Unicode 2.1.". We're currently in Unicode 6.0. Between those versions **70497** new characters were introduced. – Joachim Sauer May 23 '11 at 11:39
  • @Joachim yes. just came to know about it. thanks – jmj May 23 '11 at 11:41

I would start by looking at the code of the Apache Commons Lang StringEscapeUtils.escapeHtml() function (JavaDoc here: http://commons.apache.org/lang/apidocs/index.html) to see how it does the escaping, and then, instead of escaping the character, just remove it.

Liv
  • That won't help, if the problem lies in a missing font (as I strongly suspect). – Joachim Sauer May 23 '11 at 11:41
  • I beg to differ -- regardless of the font missing (and I tend to agree here with you!), if you eliminate those characters (which is what he is trying to do) then you won't need to print them and as such there won't be any "empty squares" -- wouldn't you say? – Liv May 23 '11 at 12:56
  • That's what I get for not reading the answer fully. Of course, you're right (the downvote is not mine, btw). – Joachim Sauer May 23 '11 at 13:00
  • That's OK -- I don't care about the voting as much as I care about correct answers :) – Liv May 23 '11 at 14:04