I want to strip special characters from English or Arabic strings. For example, the "–" in the example below is a special character that is displayed as "?" when the string is converted to UTF-8.

The file name 1-Mechanical Drawings – Part 1 should become 1-Mechanical Drawings Part 1.

السلطات العراقية تعلنé should become السلطات العراقية تعلن, where é is a special character that should be removed from the string.

Nick Kugaevsky
kashif
  • Could you be more specific about how you're defining a special character? What makes `é` and `—` special? – georgebrock Sep 10 '12 at 07:38
  • Sure. I need to clean the file names before uploading. I'm using Paperclip's transliterate_file_name, and it works fine. The problem appears when I run the following gsub chain to clean the file name: `"1-Mechanical Drawings – Part 1 should be like 1-Mechanical Drawings Part 1".squeeze(" ").gsub(' ', '_').gsub(/\W/,'').downcase`. It cleans the name well otherwise, but it does not remove the special characters. My app supports file names in different languages. – kashif Sep 10 '12 at 09:52
  • Please check http://stackoverflow.com/questions/1268289/how-to-get-rid-of-non-ascii-characters-in-ruby – Arunmohan PK Sep 10 '12 at 11:28
  • I tried the approach mentioned above. Along with the special characters, it also strips the Arabic characters. – kashif Sep 11 '12 at 05:22
  • Also, we are currently using Ruby 1.8.7. – kashif Sep 11 '12 at 05:24

1 Answer

This is reinventing the wheel somewhat, but you can do something like this to get the output you say you want in the question:

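def clean_file(name)
  # Strip any directory and extension, keeping only the base name.
  result = File.basename(name, ".*")
  # Remove each listed character plus one trailing whitespace character, if any.
  # The u flag makes the pattern be read as UTF-8 rather than as raw bytes.
  result.gsub(/[é–]\s?/u, '')
end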

Replace the characters inside the [] with whichever characters you consider inappropriate in file names, such as é. But beware of two things:

  • A blacklist approach like the one above may leave in characters you don't want. It is more common to use a whitelist approach, such as \W to catch all non-word characters. That works in Ruby 1.9 at least, but it may be what is causing your problems on 1.8.
  • Leaving spaces in the names could cause issues, so you should probably at least remove the spaces and downcase the name.
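
As a rough illustration of the whitelist idea, here is a minimal sketch that assumes Ruby 1.9 or later (it relies on the \p{Arabic} script property, which as far as I know is not available in the 1.8 regex engine) and UTF-8 strings. The method name whitelist_file_name is hypothetical and not part of Paperclip, and the choice of what to keep (ASCII letters and digits, Arabic-script characters, underscores and hyphens) is an assumption you would adjust for your own app. Some combining marks, such as Arabic vowel diacritics, may fall outside \p{Arabic} and would be dropped.

```ruby
# Hypothetical helper: a whitelist-based cleaner, sketched for Ruby 1.9+.
# Assumption: only ASCII letters/digits, Arabic-script characters,
# underscores and hyphens are wanted in the final name.
def whitelist_file_name(name)
  ext  = File.extname(name).downcase
  base = File.basename(name, ".*")
  # Drop every character that is not explicitly whitelisted
  # (whitespace is kept for now so words stay separated).
  kept = base.gsub(/[^a-zA-Z0-9\p{Arabic}\s_-]/u, '')
  # Turn each run of whitespace into one underscore, then lowercase.
  kept.strip.gsub(/\s+/, '_').downcase + ext
end
```

Called with the file name from the question, this should yield 1-mechanical_drawings_part_1, and the Arabic example should keep its Arabic letters while losing the é.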
Kenny Grant