I have a Rails 2.2.2 & Ruby 1.8.6 app which just encountered a weird bug. There's a page which submits a form, and one of the form input values came through in params as "L\001A\001K\0012\0013\0010\0017".
It turns out that the value was copied into the text field from a PDF - in the PDF it looks like "LAK2307", but when it gets copied into the input, "\001" is inserted between each character. "\001" is the ASCII control character with code point 1 (SOH, "start of heading") - not a null byte (that's "\000"), and not a multi-byte UTF-8 sequence.
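To confirm that diagnosis, unpacking the pasted value shows single control bytes interleaved with the printable characters (the string literal below is my reconstruction of the params value):

```ruby
# "\001" is one byte (ASCII 1), not a multi-byte UTF-8 sequence,
# so the pasted value is 13 bytes long: 7 printable chars + 6 control bytes.
dirty = "L\001A\001K\0012\0013\0010\0017"
p dirty.unpack("C*")  # => [76, 1, 65, 1, 75, 1, 50, 1, 51, 1, 48, 1, 55]
p dirty.length        # => 13
```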
I can't prevent people copying this into inputs and submitting them, but I'd like to clean it up before saving to our database. We already convert some fields to ASCII before saving, by running the following code on them:
newval = Iconv.iconv('ascii//ignore//translit', 'utf-8', oldval).first
How can I do something similar to strip out these control characters, assuming that's the best way to handle this? In this case I guess I'd just want to convert "\001" into "", and thus convert "L\001A\001K\0012\0013\0010\0017" to "LAK2307".
thanks, Max
EDIT - changed the name of the question to better describe the problem
EDIT2 - on closer inspection, the problem string is a mix of printable characters and control characters (rather than multi-byte UTF-8 sequences), so I think I need to walk the string and drop the control bytes, something like this:
newstring = ""
oldstring.split("").each do |char|
  # test if char is a control character like "\001"
  # (in Ruby 1.8, char[0] returns the byte value; control chars are 0-31 and 127)
  if char[0] < 32 || char[0] == 127
    # drop it
  else
    newstring << char
  end
end
Does that look like the right test for a control character, or is there a cleaner way to do this?
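Since "\001" is a single control byte, the character-by-character loop could probably also collapse into one gsub - a sketch (note that [[:cntrl:]] matches ASCII 0-31 and 127, so this would also strip tabs and newlines if those ever matter for a field):

```ruby
# Remove ASCII control characters (0x00-0x1F and 0x7F) in a single pass.
def strip_control_chars(str)
  str.gsub(/[[:cntrl:]]/, '')
end

strip_control_chars("L\001A\001K\0012\0013\0010\0017")  # => "LAK2307"
```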