4

In Java I build a string that uses Unicode combining overlines, because I am trying to display square roots of numbers. I need to know the length of the string for formatting. When the string contains Unicode combining characters, the usual methods for finding string length seem to fail, as the following example shows. Can anyone help me find the length of the second string when arbitrary numbers are under the square root, or give tips on how to display square roots better?

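    String s = "\u221A" + "12";                            // square root sign, then "12"
    String t = "\u221A" + "1" + "\u0305" + "2" + "\u0305"; // same, with a combining overline after each digit
    System.out.println(s);
    System.out.println(t);
    System.out.println(s.length()); // prints 3
    System.out.println(t.length()); // prints 5: each combining overline counts as its own char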

Thanks for any help. I couldn't find anything on this using Google.

cafman
  • Where is your output going to be displayed? On a GUI via Swing? On a terminal via System.out.println? Elsewhere? – ninjalj Oct 09 '11 at 15:11
  • This will be displayed in a panel using Swing. The panel size will change depending on the length of the string. – cafman Oct 09 '11 at 15:23
  • Swing should have some mechanism to get a predicted size for a to-be-rendered String. – ninjalj Oct 09 '11 at 15:24
  • See also: http://stackoverflow.com/questions/6232464/how-to-determine-the-length-of-a-graphic-string and http://stackoverflow.com/questions/258486/calculate-the-display-width-of-a-string-in-java – ninjalj Oct 09 '11 at 15:37
  • Thanks, I've already implemented fontmetrics and am using monospace so with the answer given below I think I can move to the next stage of my problem. Thanks for the help. – cafman Oct 09 '11 at 16:22

1 Answer

7

the usual methods for finding string length seem to fail

They don't fail; they report the string length as the number of Unicode characters [*]. If you need different behaviour, you need to define clearly what you mean by "string length".

When you are interested in string length for display purposes, you usually want to count pixels (or some other logical/physical unit), and that is the responsibility of the display layer (to begin with, different characters may have different widths if the font is not monospaced).
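As a rough illustration of letting the display layer do the measuring, here is a minimal sketch that asks AWT for the rendered width of a string. The class name `PixelWidthDemo`, the helper `pixelWidth`, and the 14-point monospaced font are assumptions made for this example; inside a real Swing component you would more likely call `getFontMetrics(font)` on the component itself.

```java
import java.awt.Font;
import java.awt.FontMetrics;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

class PixelWidthDemo {
    // Returns the advance width, in pixels, of the text in the given font.
    // (Hypothetical helper, not part of the original question or answer.)
    static int pixelWidth(String text, Font font) {
        // A 1x1 off-screen image supplies a Graphics2D without opening a window.
        BufferedImage image = new BufferedImage(1, 1, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = image.createGraphics();
        try {
            FontMetrics metrics = g.getFontMetrics(font);
            return metrics.stringWidth(text);
        } finally {
            g.dispose();
        }
    }

    public static void main(String[] args) {
        // Font choice is an assumption; use whatever font the panel actually renders with.
        Font font = new Font(Font.MONOSPACED, Font.PLAIN, 14);
        String s = "\u221A" + "12";
        String t = "\u221A" + "1" + "\u0305" + "2" + "\u0305";
        System.out.println(pixelWidth(s, font));
        System.out.println(pixelWidth(t, font));
    }
}
```

The numbers printed depend on the font and platform, and how much width a combining overline contributes is up to the font, so the result is best treated as something to measure rather than predict.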

But if you're just interested in counting the number of graphemes ("a minimally distinctive unit of writing in the context of a particular writing system"), here's a nice guide with code and examples. Copying-trimming-pasting the relevant code from there, we'd have something like this:

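  import java.text.BreakIterator;

  // Counts user-perceived characters (graphemes) rather than UTF-16 code units.
  public static int getGraphemeCount(String text) {
      int graphemeCount = 0;
      BreakIterator graphemeCounter = BreakIterator.getCharacterInstance();
      graphemeCounter.setText(text);
      // Each call to next() advances to the next grapheme boundary.
      while (graphemeCounter.next() != BreakIterator.DONE) {
          graphemeCount++;
      }
      return graphemeCount;
  }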

Bear in mind: the above uses the default locale. A more flexible and robust method would, e.g., receive an explicit locale as an argument and invoke BreakIterator.getCharacterInstance(locale) instead.
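The locale-aware variant mentioned above might look like the following sketch. The class name `GraphemeCounter` and the choice of `Locale.ROOT` in `main` are assumptions for illustration; pass whichever locale suits your text.

```java
import java.text.BreakIterator;
import java.util.Locale;

class GraphemeCounter {
    // Counts grapheme boundaries using the rules for the given locale.
    static int getGraphemeCount(String text, Locale locale) {
        BreakIterator graphemeCounter = BreakIterator.getCharacterInstance(locale);
        graphemeCounter.setText(text);
        int graphemeCount = 0;
        // next() returns DONE once the end of the text has been passed.
        while (graphemeCounter.next() != BreakIterator.DONE) {
            graphemeCount++;
        }
        return graphemeCount;
    }

    public static void main(String[] args) {
        String s = "\u221A" + "12";
        String t = "\u221A" + "1" + "\u0305" + "2" + "\u0305";
        System.out.println(getGraphemeCount(s, Locale.ROOT)); // prints 3
        System.out.println(getGraphemeCount(t, Locale.ROOT)); // prints 3: each overline joins its digit
    }
}
```

With the strings from the question, both calls report the same count, which is the behaviour the formatting code needs.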

[*] To be precise, as pointed out in the comments, String.length() counts Java characters, which are actually code units in a UTF-16 encoding. This is equivalent to counting Unicode characters only if we are inside the BMP (Basic Multilingual Plane).
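To make the footnote concrete, here is a small sketch comparing code units with code points. The class name `LengthDemo` and the sample character U+1D7D9 (a mathematical double-struck digit one, which lies outside the BMP) are choices made for this example only.

```java
class LengthDemo {
    // UTF-16 code units: this is what String.length() reports.
    static int codeUnits(String s) {
        return s.length();
    }

    // Unicode code points: a surrogate pair counts once.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String t = "\u221A" + "1" + "\u0305" + "2" + "\u0305";
        String outsideBmp = "\uD835\uDFD9"; // U+1D7D9 written as a surrogate pair

        // Everything in t is inside the BMP, so both counts agree.
        System.out.println(codeUnits(t) + " " + codePoints(t)); // prints "5 5"
        // Outside the BMP one code point takes two code units.
        System.out.println(codeUnits(outsideBmp) + " " + codePoints(outsideBmp)); // prints "2 1"
    }
}
```

Note that neither count treats a digit plus its combining overline as one unit; that still requires grapheme counting.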

leonbloy
  • Since he is using Swing, he probably wants pixels. – ninjalj Oct 09 '11 at 15:31
  • Also, `String.length()` returns the number of code-units, not Unicode characters, which should equal code-points, save for Unicode non-characters. – ninjalj Oct 09 '11 at 15:34
  • I've been reading online and looks like you pointed me in the right direction. graphemes appears to be what I want. The link helped. Thanks. – cafman Oct 09 '11 at 16:20
  • @ninjalj : You're right to object to my description of String.length(). I actually intended it as a first-approximation description (in the context of the question it seemed enough), but I added a clarification. – leonbloy Oct 09 '11 at 16:41
  • @user986446: You're welcome. As you seem new here, be sure to read this http://stackoverflow.com/faq#howtoask – leonbloy Oct 09 '11 at 17:25