
I want my script to be able to take input from stdin, where the data might be provided in UTF-8 or UTF-16 encoding.

Something like:

```sh
datasource | my-script -e utf8
```

How do I set the `external_encoding` of stdin?
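A minimal sketch of `my-script` that sets the encoding at runtime with `IO#set_encoding`, so it does not depend on interpreter flags. It assumes the `-e NAME` option from the example command line and that the rest of the script wants UTF-8 strings, so it asks Ruby to transcode from the given external encoding to UTF-8 while reading. The helpers `find_encoding` and `read_input` are hypothetical names, not part of Ruby.

```ruby
#!/usr/bin/env ruby
# my-script (sketch): reads stdin in the encoding named by -e/--encoding
# and transcodes it to UTF-8. Option names follow the question's example.
require "optparse"

# Accept spellings such as "utf8" or "utf16le" by inserting the hyphen
# used by Ruby's canonical names ("UTF-8", "UTF-16LE").
def find_encoding(name)
  Encoding.find(name.sub(/\Autf(?=\d)/i, "UTF-"))
end

# Declare that +io+ delivers bytes in +external+ and have Ruby transcode
# them to UTF-8 while reading; returns the whole input as one string.
def read_input(io, external)
  io.set_encoding(external, Encoding::UTF_8)
  io.read
end

if $PROGRAM_NAME == __FILE__
  external = Encoding::UTF_8 # assumed default when -e is not given
  OptionParser.new do |opts|
    opts.banner = "Usage: datasource | my-script [-e ENCODING]"
    opts.on("-e", "--encoding NAME", "Encoding of stdin (e.g. utf8, utf16le)") do |name|
      external = find_encoding(name)
    end
  end.parse!

  text = read_input($stdin, external)
  puts "read #{text.length} characters (#{text.encoding})"
end
```

With that in place, `datasource | my-script -e utf16le` would decode UTF-16LE input, and omitting `-e` falls back to UTF-8. Passing an internal encoding as the second argument to `set_encoding` is what makes the returned string UTF-8 regardless of the input encoding.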

Martin
  • You can do `ruby -Eutf8 script.rb`. Is this what you asked? :) – Arup Rakshit Feb 26 '15 at 17:21
  • This is covered in [the `IO.new` documentation](http://ruby-doc.org/core-2.2.0/IO.html#method-c-new). The information detailed in that method is applicable to other "read"-type methods, such as `read`, `gets`, `foreach`. – the Tin Man Feb 26 '15 at 17:46
  • @arup: yes, that could work. But if the script needs to make some decisions before setting the encoding, it will need to follow the suggestion from the Tin Man – Martin Feb 27 '15 at 12:28
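One way to make a decision before committing to an encoding, as the last comment suggests, is to read stdin as raw bytes and inspect them first. The sketch below assumes that UTF-16 input starts with a byte-order mark (BOM) and treats BOM-less input as UTF-8; `BOMS` and `decode_with_bom` are illustrative names, not library API.

```ruby
# Byte-order marks for the encodings mentioned in the question.
BOMS = {
  "\xEF\xBB\xBF".b => Encoding::UTF_8,
  "\xFF\xFE".b => Encoding::UTF_16LE,
  "\xFE\xFF".b => Encoding::UTF_16BE
}.freeze

# Pick the encoding of raw input from its leading BOM (falling back to
# +default+ when there is none), drop the BOM, and return UTF-8 text.
def decode_with_bom(bytes, default: Encoding::UTF_8)
  bytes = bytes.b # work on an ASCII-8BIT copy
  bom, encoding = BOMS.find { |mark, _| bytes.start_with?(mark) }
  body = bom ? bytes.byteslice(bom.bytesize, bytes.bytesize) : bytes
  body.force_encoding(encoding || default).encode(Encoding::UTF_8)
end
```

In a script it could be called as `text = decode_with_bom($stdin.binmode.read)`; `binmode` makes `read` return the bytes untouched (as ASCII-8BIT) so nothing is decoded before the check. Ruby 2.7 and later also provide `IO#set_encoding_by_bom`, which may be worth checking as a built-in alternative.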

1 Answer


On the first line of the script, where you declare ruby as the interpreter, you can add the `--encoding utf-8` option to specify the stdin encoding. It sets the default external encoding, which stdin uses unless the script sets another one.

Example:

```ruby
#!/usr/bin/env ruby --encoding utf-8
# --encoding (-E) sets Encoding.default_external, which ARGF uses for stdin and files

text = ARGF.read
```

From `man ruby`:

```
 -E external[:internal]
 --encoding external[:internal]
                Specifies the default value(s) for external encodings and
                internal encoding. Values should be separated with colon
                (:).

                You can omit the one for internal encodings, then the
                value (Encoding.default_internal) will be nil.
```
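To check what `-E`/`--encoding` actually changed, a script can print the process-wide defaults next to what `$stdin` reports. This is a small diagnostic sketch; the file name `show_encodings.rb` and the method `encoding_summary` are only examples. Running it once plainly and once as, say, `ruby -E ISO-8859-1:UTF-8 show_encodings.rb` should make the effect of each half of `external[:internal]` visible.

```ruby
# show_encodings.rb (example name): prints the process-wide defaults that
# -E/--encoding sets, next to the encodings $stdin reports.
def encoding_summary(io = $stdin)
  {
    default_external: Encoding.default_external,
    default_internal: Encoding.default_internal, # nil unless an internal encoding was requested
    stdin_external: io.external_encoding,
    stdin_internal: io.internal_encoding
  }
end

encoding_summary.each { |name, enc| puts "#{name}: #{enc.inspect}" }
```

When nothing has called `set_encoding` on stdin, its external encoding follows `Encoding.default_external`, which is why the shebang flag is enough for the answer's example.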
fiedl
  • This will work great on macOS but fail on Linux, because Linux passes everything after the interpreter path in the shebang to `env` as a single argument. See [this SO answer](http://stackoverflow.com/a/4304187/195369). – Ritchie Dec 20 '16 at 07:49
  • This still works! It solved a problem for me when calling a Ruby script from Karabiner-Elements.app or Automator.app on macOS Mojave 10.14, which was failing with an `invalid byte sequence in US-ASCII (ArgumentError)` error. Thanks! – Steph Aug 13 '21 at 23:17
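The `invalid byte sequence in US-ASCII` error mentioned in the last comment typically appears when a launcher provides no UTF-8 locale, so Ruby labels incoming bytes as US-ASCII and a later regexp operation rejects the non-ASCII bytes. The sketch below reproduces that with hard-coded bytes instead of real stdin; the `/caf/` pattern and the helper name `invalid_sequence_error?` are arbitrary choices for illustration.

```ruby
# Returns true when a regexp match on +str+ raises the
# "invalid byte sequence in ..." ArgumentError.
def invalid_sequence_error?(str)
  str =~ /caf/
  false
rescue ArgumentError => e
  e.message.start_with?("invalid byte sequence")
end

# The same UTF-8 bytes, labelled two different ways.
bytes = "café\n".b
as_ascii = bytes.dup.force_encoding(Encoding::US_ASCII) # what a US-ASCII default external yields
as_utf8 = bytes.dup.force_encoding(Encoding::UTF_8)     # what --encoding utf-8 or set_encoding yields

puts "US-ASCII label raises: #{invalid_sequence_error?(as_ascii)}"
puts "UTF-8 label raises:    #{invalid_sequence_error?(as_utf8)}"
```

The bytes never change; only the encoding Ruby attaches to them does, which is why fixing the external encoding (by flag or at runtime) makes the error go away.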