Why is ActiveRecord and/or MySQL having a problem with this character?

Question

When I insert certain strings coming in from API calls into my db, they get cut off at certain characters. This is with Ruby 1.8.7. I have everything set to utf8 app-wide and in MySQL, and I typically don't have any problem entering utf8 content into the db in other parts of the app.

For example, a name that is supposed to contain "El Soldado y La Muñeca" gets truncated at the ñ. If I insert it into the db, only this makes it in: "11 El Soldado y La Mu".

>> name
=> "11 El Soldado y La Mu?eca(1).mp3"
>> name[20..20]
=> "u"
>> name[21..21]
=> "\361"
>> name[22..22]
=> "e"

Is that a utf8 character?
I know that Ruby 1.8 isn't encoding-aware, but to be honest I always forget how this should affect me. I always just set everything at all the other layers to utf8 and everything is fine. Why doesn't this work now?

update

Correction: I was wrong, it's not coming from the API, it's coming from the file system.

the wrongly-encoded character is coming from inside the house!

New question: How can I get utf8 characters from File#path?

I am not a Ruby man so this may be a dumb suggestion, but is the *connection* encoding also set to UTF-8? I think it defaults to ISO-8859-1 on every platform — Pekka, Sep 01 '11 at 05:59
see my new question... http://stackoverflow.com/questions/7266815/how-can-i-get-utf8-characters-from-filepath — John Bachir, Sep 01 '11 at 06:38

score 2 · Accepted Answer · answered Sep 01 '11 at 06:30

You are somehow getting a Latin-1 (AKA ISO-8859-1) ñ rather than a UTF-8 ñ. In Latin-1 the ñ is 361 in octal (hence your single byte "\361"). In UTF-8 that lower case tilde-n should be \303\261 (i.e. bytes 0303 and 0261 in octal or 0xc3 and 0xb1 in hex).
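The byte values above can be checked directly in irb. This is a small sketch that relies only on octal string escapes and `String#unpack`; the variable names are just for illustration.

```ruby
# Latin-1 (ISO-8859-1) ñ is a single byte: octal 361 == hex F1 == decimal 241.
latin1_n = "\361"

# UTF-8 ñ is two bytes: octal 303 261 == hex C3 B1 == decimal 195 177.
utf8_n = "\303\261"

# unpack('C*') returns each byte as an unsigned integer.
latin1_bytes = latin1_n.unpack('C*')
utf8_bytes   = utf8_n.unpack('C*')

puts latin1_bytes.inspect  # [241]
puts utf8_bytes.inspect    # [195, 177]
```

A lone byte of 0xF1 is not valid UTF-8 on its own, which is consistent with the string being cut off right before it.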

You might have to start playing with Iconv on the Ruby side to make sure everything you get is in UTF-8 before it reaches the database.
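As a rough sketch of that conversion, assuming the incoming bytes really are ISO-8859-1 (something the code cannot verify): the helper name `latin1_to_utf8` is hypothetical. On Ruby 1.8 it goes through `Iconv.conv`; on Ruby 1.9 and later, where strings carry an encoding, it uses `String#encode` instead and leaves strings that are already valid UTF-8 alone.

```ruby
# Hypothetical helper: convert a byte string assumed to be ISO-8859-1 to UTF-8.
def latin1_to_utf8(bytes)
  if bytes.respond_to?(:encode)
    # Ruby 1.9+: if the bytes are already valid UTF-8, return them unchanged
    # so that correctly encoded strings are not double-encoded.
    utf8 = bytes.dup.force_encoding('UTF-8')
    return utf8 if utf8.valid_encoding?
    # Otherwise relabel the bytes as Latin-1 and transcode to UTF-8.
    bytes.dup.force_encoding('ISO-8859-1').encode('UTF-8')
  else
    # Ruby 1.8: strings are plain bytes, so transcode with Iconv.
    require 'iconv'
    Iconv.conv('UTF-8', 'ISO-8859-1', bytes)
  end
end

name = "11 El Soldado y La Mu\361eca(1).mp3"
converted = latin1_to_utf8(name)

# Bytes 21 and 22 are now the two-byte UTF-8 ñ (0xC3 0xB1).
puts converted.unpack('C*')[21, 2].inspect
```

On 1.8 the Iconv branch has no validity check, so it should only be applied to strings known to come from a Latin-1 source; running it on text that is already UTF-8 would double-encode it.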

mu is too short

alright, i've been asking the wrong question (again). see update above and this new question: http://stackoverflow.com/questions/7266815/how-can-i-get-utf8-characters-from-filepath – John Bachir Sep 01 '11 at 06:37
@Pekka: And you were on the right track with the ISO-8859-1. Any byte that starts with "3" makes me think of octal, I had to look up the Latin-1 table though. – mu is too short Sep 01 '11 at 06:48
