
It looks like this question has been asked by a Python dev (Allowing input of Unicode escapes as command line arguments), which I think partially relates, but it doesn't fully solve my immediate problem in Ruby. I'm curious whether there is a way to take Unicode escape sequences as command-line arguments, assign them to a variable, and then have the escapes processed so that the script produces the actual Unicode characters. Basically, I want to be able to choose a Unicode code point, have Ruby put that character in a filename, and see the actual character displayed.

Here are a few things I've noticed that cause problems:

unicode = ARGV[0]      # command line argument is \u263a
puts unicode           # prints: u263a
puts unicode.inspect   # prints: "u263a"

The backslash needed to have the string treated as a Unicode escape sequence gets stripped (the shell consumes it before Ruby ever sees the argument). Then, if we try adding another "\" to escape it,

unicode = ARGV[0]      # command line argument is \\u263a
puts unicode           # prints: \u263a
puts unicode.inspect   # prints: "\\u263a"

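but it still won't be processed properly: the variable now holds a literal backslash followed by "u263a" (six characters), and Ruby only interprets \u escapes inside string literals in source code, never in strings received at runtime.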
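A minimal sketch of that difference: a "\u263a" literal in source code becomes a single character when Ruby parses the file, while a string that merely contains the same six characters (which is what ARGV[0] holds after the shell passes \\u263a through) is left alone. The variable names here are illustrative only.

```ruby
# Escape in a double-quoted literal: the parser turns it into one character.
literal = "\u263a"

# Single quotes keep the backslash, mimicking what arrives in ARGV[0]
# when the command line argument is \\u263a.
from_argv = '\u263a'

puts literal.length          # 1
puts from_argv.length        # 6
puts literal == from_argv    # false
```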

Here's some more relevant code where I'm actually trying to make this happen:

unicode   = ARGV[0]
filetype  = ARGV[1]
path = unicode + "." + filetype

File.new(path, "w")

It seems like this should be pretty simple, but I've searched and searched and cannot find a solution. I should add, I do know that supplying the hard-coded escaped unicode in a string works just fine, like File.new("\u263a.#{filetype}", "w"), but getting it from an argument/variable is what I'm having an issue with. I'm using Ruby 1.9.2.

Community
cwade
  • Is this just [this question](http://stackoverflow.com/q/5560914/479863) with the added complication of the shell eating your backslashes? `ActiveSupport::JSON.decode` might be of use in any case. – mu is too short Feb 16 '13 at 04:39
  • Similar, but yeah, the main problem is that the shell is eating my backslashes. – cwade Feb 18 '13 at 17:46
  • Everyone wants to use the backslash as an escape character so sometimes you have to double, triple, quadruple, ... them. Is there any reason you can't pass a UTF-8 string through the arguments? – mu is too short Feb 18 '13 at 18:51
  • Honestly, I'm not sure -- can you give an example? – cwade Feb 18 '13 at 20:03
  • Why not just say `your_script µ`? I guess I don't understand why you're messing around with all this backslash stuff, if you have a filename that contains non-ASCII characters then why would anything care about that? – mu is too short Feb 18 '13 at 20:20
  • I'm just writing this as a basic exercise, and it's been bringing me down the dark path of encoding in Ruby, which I suppose isn't necessarily a bad thing; I appreciate your comments thus far. There really isn't a reason I can't just supply the UTF-8 string as the argument, as you suggested, but at this point I'm just curious to see if the original method is even possible (supplying "\uXXXX" as an argument, or possibly just "uXXXX" and somehow prepending the backslash later). – cwade Feb 18 '13 at 20:49
  • It is possible but if you're encoding the string using JSON-ish notation then you have to (a) get the backslashes past the shell (probably by doubling them) and (b) decode it to get the string you really want (probably using `ActiveSupport::JSON.decode` or similar). – mu is too short Feb 18 '13 at 21:11
  • Well, @muistooshort, it appears you actually answered this question before, and your answer for http://stackoverflow.com/questions/7015778/is-this-the-best-way-to-unescape-unicode-escape-sequences-in-ruby is exactly what I was looking for. I used your gsub regex to take the argument and unescape the unicode escape: `gsub(/\\u([\da-fA-F]{4})/) {|m|[$1].pack("H*").unpack("n*").pack("U*")}` Should I create an answer, or would you like to do the honors? – cwade Feb 18 '13 at 23:21
  • I thought it looked familiar but I found [this one](http://stackoverflow.com/q/5560914/479863) when I went looking. I have so many answers that it can be difficult to find the one I'm looking for :) – mu is too short Feb 18 '13 at 23:23
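The comments suggest decoding the escapes with `ActiveSupport::JSON.decode`. A similar idea works with Ruby's bundled `json` library instead (a swapped-in alternative, not what was suggested above): wrap the argument in a JSON array literal and let `JSON.parse` interpret the `\uXXXX` escapes. This is a sketch under the assumption that the argument contains no double quotes or stray backslashes; other JSON escapes such as `\n` would be decoded too, and an invalid escape raises `JSON::ParserError`. The helper name `decode_json_escapes` is hypothetical.

```ruby
require "json"

# Hypothetical helper: decode JSON-style \uXXXX escapes in a runtime string.
# Wrapping in an array keeps older json versions happy (they reject a
# top-level bare string). Assumes no unescaped double quotes in +str+.
def decode_json_escapes(str)
  JSON.parse("[\"#{str}\"]").first
end

arg     = '\u263a'   # what ARGV[0] holds when the shell is given \\u263a
decoded = decode_json_escapes(arg)
puts decoded         # ☺
puts decoded.length  # 1
```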

1 Answer


To unescape the unicode escaped command line argument and create a new file with the user supplied unicode string in the filename, I used @mu is too short's method of using pack and unpack, like so:

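filetype  = ARGV[1]
# Replace each \uXXXX escape with the character it names:
# hex digits -> raw bytes -> 16-bit code unit -> UTF-8 character
unicode   = ARGV[0].gsub(/\\u([\da-fA-F]{4})/) { [$1].pack("H*").unpack("n*").pack("U*") }
path      = unicode + "." + filetype
File.open(path, "w") { }   # create the file and close it right away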
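Broken into steps, the pack/unpack chain above does the following for the hex digits captured by the regex. The sketch also shows a shorter equivalent using `String#hex` and `Integer#chr` (a swapped-in alternative, not the method from the linked answer), wrapped in a hypothetical helper `unescape_unicode`. As with the original regex, only escapes with exactly four hex digits are matched, so `\u{1F600}`-style escapes are not handled.

```ruby
hex = "263a"

bytes      = [hex].pack("H*")      # "\x26\x3A": hex digits -> raw bytes
code_units = bytes.unpack("n*")    # [9786]: bytes read as big-endian 16-bit ints
char       = code_units.pack("U*") # "☺": code point -> UTF-8 character

# Shorter equivalent: parse the hex digits, then build the character directly.
alt = hex.hex.chr(Encoding::UTF_8)

# Hypothetical helper applying the same substitution as the answer.
def unescape_unicode(str)
  str.gsub(/\\u([\da-fA-F]{4})/) { $1.hex.chr(Encoding::UTF_8) }
end

path = unescape_unicode('\u263a') + "." + "txt"
puts path   # ☺.txt
```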
Community
cwade