I'm trying to determine the physical pixel width of a string.

For example:

FONT_SIZE = 10
str="123456789"
width = str.length * FONT_SIZE  # which will be 9 * 10 = 90px

PROBLEM: But for Chinese, Japanese, or Korean:

FONT_SIZE = 10
str="一二三四五六七八九"
width = str.length * FONT_SIZE  # this still results in 90 (9 * 10)

But it really should be 180, as each of these characters is two columns wide.

How do I write this function (it should return true/false)?

def is_wide_char(char)
  # how to?
end

class String
  def wlength
    l = 0
    # String#each no longer exists in Ruby 1.9+; iterate with each_char
    each_char { |c| l += is_wide_char(c) ? 2 : 1 }
    l
  end
end
Makoto
c2h2
  • Are you really willing to assume that your users are always using fixed-width fonts? – sarnold Feb 21 '11 at 09:49
  • Have a look at http://stackoverflow.com/questions/4681055/how-can-i-detect-cjk-characters-in-a-string-in-ruby – steenslag Feb 21 '11 at 10:06
  • Awesome, solved. About the fonts: I use RMagick to draw them with a fixed font width, so each char has a consistent width. – c2h2 Feb 21 '11 at 10:10
  • Also remember that length returns the number of characters. Use bytesize if you need the number of bytes – Luis Feb 21 '11 at 11:51

2 Answers

The question "How can I detect CJK characters in a string in Ruby?" gives the answer:

class String
  def contains_cjk?
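    # Each script needs its own alternative; without the | before \p{Hangul},
    # Hangul-only strings would not match.
    !!(self =~ /\p{Han}|\p{Katakana}|\p{Hiragana}|\p{Hangul}/)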
  end
end

strings = ['日本', '광고 프로그램', '艾弗森将退出篮坛', 'Watashi ha bakana gaijin desu.']
strings.each { |s| puts s.contains_cjk? }

#true
#true
#true
#false
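
To connect this back to the question, here is a minimal sketch that applies the same script properties one character at a time. The names `WIDE_SCRIPTS` and `wlength` (as a plain method rather than a `String` patch) are my own, and the rule "every Han, Katakana, Hiragana, or Hangul character takes two columns" is an assumption: it would misjudge half-width katakana and full-width Latin letters or punctuation.

```ruby
# Assumption: a character is "wide" if it belongs to one of these scripts.
WIDE_SCRIPTS = /\p{Han}|\p{Katakana}|\p{Hiragana}|\p{Hangul}/

def is_wide_char(char)
  WIDE_SCRIPTS.match?(char)
end

# Counts two columns for each wide character and one for everything else.
def wlength(str)
  str.each_char.sum { |c| is_wide_char(c) ? 2 : 1 }
end

FONT_SIZE = 10
puts wlength("123456789") * FONT_SIZE          # 90
puts wlength("一二三四五六七八九") * FONT_SIZE  # 180
```

`Regexp#match?` and `Enumerable#sum` need Ruby 2.4 or later.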
Community
c2h2

Experts at unicode.org have already compiled a table that classifies the width of each character. You should refer to UAX #11 and its data file.

If you look at the data file, you will see that it is easy to parse. However, if you prefer to use a gem, there is east_asian_width_simple. There are other gems too, but east_asian_width_simple is faster and more flexible.

Usage

require 'east_asian_width_simple'
eaw = EastAsianWidthSimple.new(File.open('EastAsianWidth.txt'))
eaw.string_width('台灣 No.1') # => 9
eaw.string_width('No code, no ') # => 14

Wide characters and full-width characters are defined differently in UAX #11, but based on your description, I think the following code is the closest implementation of what you want to achieve:

require 'east_asian_width_simple'
$eaw = EastAsianWidthSimple.new(File.open('EastAsianWidth.txt'))

def is_wide_char(char)
  case $eaw.lookup(char.ord)
  when :F, :W then true
  else false
  end
end
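
If you would rather not depend on a gem, parsing the data file yourself could look roughly like the sketch below. `SAMPLE` is a hand-written excerpt that I assume matches the file's `codepoint-or-range;property # comment` layout, and `parse_east_asian_width`, `TABLE`, and `wide_char?` are hypothetical names; real code would read the full EastAsianWidth.txt instead.

```ruby
# Hand-written excerpt in the assumed layout of EastAsianWidth.txt.
SAMPLE = <<~DATA
  # comment lines and blank lines are skipped
  0030..0039;Na    # digits
  3000;F           # ideographic space
  4E00..9FFF;W     # CJK unified ideographs
  AC00..D7A3;W     # Hangul syllables
DATA

# Returns an array of [codepoint_range, property_symbol] pairs.
def parse_east_asian_width(text)
  text.each_line.map do |line|
    data = line.sub(/#.*/, '').strip        # drop trailing comment
    next if data.empty?
    codepoints, property = data.split(';').map(&:strip)
    first, last = codepoints.split('..').map { |hex| hex.to_i(16) }
    [(first..(last || first)), property.to_sym]
  end.compact
end

TABLE = parse_east_asian_width(SAMPLE)

# True when the character's property is F (Fullwidth) or W (Wide).
# Code points missing from the table are treated as narrow here.
def wide_char?(char)
  entry = TABLE.find { |range, _property| range.cover?(char.ord) }
  !entry.nil? && %i[F W].include?(entry.last)
end

puts wide_char?('一')
puts wide_char?('1')
```

A linear `find` is fine for a sketch; with the full table, a binary search over sorted ranges would be the more sensible choice.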
Weihang Jian