Elixir 1.3.0

Windows 10

Postgrex 0.11.2

Ecto 2.0.1

Postgres 9.4.4

I'm attempting to add records to a PostgreSQL database via Ecto. When it reaches a string containing \x0087, it throws the following error:

** (Postgrex.Error) ERROR (character_not_in_repertoire): invalid byte sequence for encoding "UTF8": 0x87

I'm pretty sure the problem is the file itself, which, as far as I can tell, is encoded as Latin1. This is the code I use to open and read the file:

:ok = :io.setopts(:standard_io, encoding: :latin1)
File.open!(file)
|> IO.binstream(:line)

The file opens fine, and several lines are processed without trouble until it reaches a line that contains \x0087.
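A line read through IO.binstream arrives as raw, undecoded bytes, while PostgreSQL expects UTF-8. The sketch below assumes the offending character is the Windows-1252 double dagger "‡" (U+2021), which that code page stores as byte 0x87; the sample text is hypothetical. It shows that the single byte is not valid UTF-8, while the three-byte UTF-8 encoding of the same character is:

```elixir
# Hypothetical line as IO.binstream would yield it: raw bytes, no decoding.
raw_line = <<"price ", 0x87, "\n">>

# A lone 0x87 is a UTF-8 continuation byte, so this binary is not valid UTF-8.
IO.inspect(String.valid?(raw_line))
# prints false

# The same text with "‡" (U+2021) encoded as UTF-8, i.e. the bytes E2 80 A1.
utf8_line = "price " <> <<0xE2, 0x80, 0xA1>> <> "\n"
IO.inspect(String.valid?(utf8_line))
# prints true
IO.inspect(utf8_line == "price ‡\n")
# prints true
```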

What I can't figure out is how to convert a line that was read in as Latin1 into UTF-8. I found String.normalize, which seems like it might help with the conversion, but I know I'm grasping at straws.

I changed the encoding: parameter on the :io.setopts line to :utf8, but it doesn't seem to make a difference.

Is there some simple way to convert an ANSI/Latin1 encoded string to a UTF-8 encoded string?
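For the literal Latin-1 case, Erlang's `:unicode.characters_to_binary/3` converts a binary from one encoding to another and can be applied to each line after it is read. A minimal sketch; the `Latin1` module name is hypothetical:

```elixir
defmodule Latin1 do
  # Convert a binary of Latin-1 (ISO-8859-1) bytes into a UTF-8 string.
  # Every byte 0x00..0xFF is a Latin-1 character, so this always returns
  # a binary: each byte becomes the code point with the same value.
  def to_utf8(bytes) when is_binary(bytes) do
    :unicode.characters_to_binary(bytes, :latin1, :utf8)
  end
end

IO.inspect(Latin1.to_utf8(<<"caf", 0xE9>>))
# prints "café"

# Under Latin-1, 0x87 becomes U+0087 (bytes C2 87), a C1 control
# character, not "‡" (U+2021).
IO.inspect(Latin1.to_utf8(<<0x87>>) == <<0xC2, 0x87>>)
# prints true
```

The result is valid UTF-8, so the encoding error goes away, but 0x87 turns into an invisible control character rather than the intended symbol. Also note that `:io.setopts(:standard_io, ...)` configures only the standard I/O device, not the device returned by `File.open!`, which is presumably why switching it to `:utf8` made no difference.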

Onorio Catenacci

1 Answer


I'm really hesitant to answer my own question, but I think the techniques found in this Q & A are the right answer here as well. Basically, the data needs to be converted from CP-1252 to UTF-8, and then everything works as expected.
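Erlang's `:unicode` module supports Latin-1 but not CP-1252, so one option is a small lookup table for bytes 0x80 to 0x9F, the only range where CP-1252 and Latin-1 differ. A minimal sketch with hypothetical module and function names; it is not necessarily the technique used in the linked Q & A:

```elixir
defmodule Cp1252 do
  # CP-1252 assignments for 0x80..0x9F; all other bytes match Latin-1.
  # 0x81, 0x8D, 0x8F, 0x90 and 0x9D are unassigned in CP-1252, and this
  # sketch passes them through as the C1 code point of the same value.
  @high %{
    0x80 => 0x20AC, 0x82 => 0x201A, 0x83 => 0x0192, 0x84 => 0x201E,
    0x85 => 0x2026, 0x86 => 0x2020, 0x87 => 0x2021, 0x88 => 0x02C6,
    0x89 => 0x2030, 0x8A => 0x0160, 0x8B => 0x2039, 0x8C => 0x0152,
    0x8E => 0x017D, 0x91 => 0x2018, 0x92 => 0x2019, 0x93 => 0x201C,
    0x94 => 0x201D, 0x95 => 0x2022, 0x96 => 0x2013, 0x97 => 0x2014,
    0x98 => 0x02DC, 0x99 => 0x2122, 0x9A => 0x0161, 0x9B => 0x203A,
    0x9C => 0x0153, 0x9E => 0x017E, 0x9F => 0x0178
  }

  # Convert a binary of CP-1252 bytes into a UTF-8 string, byte by byte.
  def to_utf8(bytes) when is_binary(bytes) do
    for <<byte <- bytes>>, into: "" do
      <<Map.get(@high, byte, byte)::utf8>>
    end
  end

  # Stream a CP-1252 file line by line, yielding UTF-8 strings.
  def stream_utf8_lines(path) do
    path
    |> File.stream!()
    |> Stream.map(&to_utf8/1)
  end
end

IO.inspect(Cp1252.to_utf8(<<"price ", 0x87>>))
# prints "price ‡"
```

Because the newline byte 0x0A is the same in both encodings, splitting the raw bytes into lines before converting is safe. `File.stream!` is used here instead of `File.open!` plus `IO.binstream` so that the file is closed once the stream has been consumed; the converted stream could then feed the Ecto inserts in place of the original one.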

Onorio Catenacci