I'm using (and stuck with) the following version of Ruby:
ruby 1.8.7 (2012-06-29 patchlevel 370) [x86_64-linux]
I've done a lot of Googling, but I can't find a working answer to my problem. I'm importing a CSV file that will usually come from a user's Microsoft Excel spreadsheet. The CSV parsing itself works fine, but I can't figure out how to handle MS "smart" quotes. My test input file is in DOS format and contains this line:
Jeanne O�Neill
There's an MS curly apostrophe between the O and N in O'Neill, which shows in my text editor as the "question mark diamond". When I try the following code, the curly apostrophe gets dropped:
# replace Microsoft Office 'smart' quotes
# gem to detect character encoding
require 'rchardet'
# Iconv ships with Ruby 1.8.7 but still has to be required
require 'iconv'
if name != nil
  cd = CharDet.detect(name)
  encoding = cd['encoding']
  name = Iconv.conv('UTF-8//TRANSLIT', encoding, name)
end
This yields the undesirable output:
Jeanne ONeill
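To see which bytes are actually in the string before any conversion, the high (non-ASCII) bytes can be dumped. This is only a sketch: the sample string below is a hypothetical stand-in for the value read from the CSV, and it assumes the file is Windows-1252, where the curly apostrophe is the single byte 0x92.

```ruby
# Hypothetical sample; in practice `name` comes from the CSV row.
# Assumption: the file is Windows-1252, so the apostrophe is byte 0x92.
name = "Jeanne O\x92Neill"

# Collect every non-ASCII byte (above 0x7F) as a hex string.
high_bytes = name.unpack('C*').select { |b| b > 0x7F }.map { |b| '0x%02X' % b }

puts high_bytes.inspect   # prints ["0x92"]
```

If the dump shows something other than 0x91 to 0x94, the file is probably not plain Windows-1252 and the detected encoding deserves a second look.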
Is there a way to write a regular expression in Ruby 1.8.7 that will detect the curly MS characters and replace them with straight ones? I've tried using hex codes in my regexes, but I can't make them work. I'm aware that Ruby 1.8.7 is much more limited in handling character encodings than 1.9, but I'm stuck with it. Upgrading Ruby isn't possible right now in this project.
Any help would be appreciated. Thank you.
After reading the post suggested by TinMan, I tried using gsub to replace the resulting '�' sub-string:
if name != nil
  name = Iconv.conv("UTF-8", "cp1252//TRANSLIT", name)
  name.gsub(/\u00ef\u00bf\u00bd/u, "'")
end
Alas, no love. It still yields the same result :(
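Two things may explain why this attempt changes nothing: `gsub` returns a new string and its result is not assigned, and Ruby 1.8 regexes do not understand `\u` escapes. One possible workaround is to swap the quote bytes before any conversion, working on raw bytes instead of a regex. The following is a minimal sketch, assuming the input is still raw Windows-1252 and has not yet been converted to UTF-8. The constant `CP1252_QUOTES` and the helper `straighten_cp1252_quotes` are hypothetical names introduced for illustration.

```ruby
# Windows-1252 byte values for the "smart" quotes, mapped to ASCII bytes.
# Assumption: the string is raw cp1252, not UTF-8 (in UTF-8 these byte
# values can appear inside multi-byte characters and must not be touched).
CP1252_QUOTES = {
  0x91 => 0x27, # left single quote               -> '
  0x92 => 0x27, # right single quote / apostrophe -> '
  0x93 => 0x22, # left double quote               -> "
  0x94 => 0x22  # right double quote              -> "
}

# Hypothetical helper: returns a copy of str with the curly quote bytes
# replaced by straight ones; every other byte is passed through unchanged.
def straighten_cp1252_quotes(str)
  return str if str.nil?
  bytes = str.unpack('C*')                               # string -> array of byte values
  bytes.map { |b| CP1252_QUOTES.fetch(b, b) }.pack('C*') # swap, then rebuild the string
end

puts straighten_cp1252_quotes("Jeanne O\x92Neill")   # prints Jeanne O'Neill
```

The result could then be passed to `Iconv.conv` as before, so that any remaining non-ASCII characters (accented letters, for example) are still converted to UTF-8.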