1

PHP has a lot of trouble with multibyte strings (strings containing non-ASCII characters). The entire language was built on the assumption that each character is one byte. To work around this, the mbstring extension provides the mb_* functions, which you can use in place of the standard string functions and which handle multibyte text correctly.

strlen($str);    // counts bytes, not characters
mb_strlen($str); // correct: counts characters

However, this is a real pain: you either have to verify that any code you download or find online uses these functions, or enable mbstring.func_overload, which might then break code that actually relies on one character being one byte.

Does Ruby share this problem?

Xeoncross

3 Answers

5

Ruby shares the problem; it is covered elsewhere here on SO. You can use ActiveSupport::Multibyte, which adds String#mb_chars, for multibyte-aware string operations.

>> s =  "Iñtërnâtiônàlizætiøn"
=> "Iñtërnâtiônàlizætiøn"
>> puts s[0..3]
Iñt
=> nil
>> puts s.mb_chars[0..3]
Iñtë
=> nil
>> puts s.mb_chars.size
20
=> nil
>> puts s.size
27
=> nil
Chandra Patni
1
irb(main):002:0> 'ÿ'.length
=> 2
TML
1

I think Ruby 1.9 clears up this underlying assumption.
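
As a rough illustration of what changes in Ruby 1.9 and later, where each String carries an encoding and the character-based methods respect it, the sketch below reuses the sample string from the earlier answer. It assumes the source file and the string are UTF-8; the variable names are only illustrative.

```ruby
# encoding: utf-8
# Sketch for Ruby 1.9+, assuming UTF-8 source and string data.

s = "Iñtërnâtiônàlizætiøn"

encoding_name = s.encoding.name  # the encoding attached to this string
char_count    = s.length         # counts characters
byte_count    = s.bytesize       # counts raw bytes
first_four    = s[0..3]          # slices by character, not by byte

# When byte-level work is really needed, ask for bytes explicitly.
single_char_bytes = "ÿ".bytes.to_a.length
```

Here `length` and slicing operate on characters, while `bytesize` and `bytes` remain available for code that genuinely needs byte counts, so no separate multibyte wrapper is required for these basic operations.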

RyanWilcox