3

I have an SSIS package that performs the following.

  1. Run a SQL script
  2. Export the results to a flat file (UTF-8 encoded, ; delimited, and \n for new lines)
  3. FTP results to a Solaris machine (binary format)

The problem is that when the file shows up on my Solaris box, it has the following at the start of the file.

\377\376

I have tried dos2unix, and it still has not corrected the issue. In fact, it changes the \377\376 to \227\226, which is not very helpful.

Is there a way to remove these characters from my file? When they are there, they mess with grep and other Unix tools, like head.

Peter Mortensen
Freddy
  • 3
    That's a UTF-16LE BOM (http://en.wikipedia.org/wiki/Byte_order_mark#UTF-16), so poke about in the export options to see if it's possible to omit the BOM, or use a BOM-aware viewer on Solaris. – Alex K. Sep 17 '12 at 15:35
  • That's a Unicode byte-order mark. Looks like your output is not UTF-8 after all. – tripleee Sep 17 '12 at 15:38
  • 1
    Thanks, I know where to start looking now. I am going to see if changing the output encoding to US-ASCII helps (I know there are no special characters in my input). – Freddy Sep 17 '12 at 15:48
  • This is a candidate for a canonical question (for problems with BOMs, or perhaps only for problems with UTF-16LE BOMs). There is also *[Compilation error: stray ‘\302’ in program, etc.](https://stackoverflow.com/questions/19198332/)*, but perhaps it should be reserved for Unicode or [Windows-1251](https://en.wikipedia.org/wiki/Windows-1251) characters in the main part of the file. – Peter Mortensen May 20 '23 at 10:25
  • Re *"I have tried dos2unix, and it still has not corrected the issue."*: What parameters did you use for dos2unix? – Peter Mortensen May 20 '23 at 10:55

3 Answers

3

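The \377\376 prefix (bytes FF FE) is a little-endian UTF-16 byte-order mark, so the file is not UTF-8: SSIS, like many Windows tools, typically writes "Unicode" output as UCS-2/UTF-16 little-endian. The easiest fix is to convert the file on your Unix server with the following commands.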

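  1. Switch over to UTF-8 (or whatever encoding you need) with iconv. Encoding names vary between iconv implementations, so check which ones your system supports (for example with iconv -l). With UTF-16 as the source encoding, iconv usually reads the byte order from the BOM:

    iconv -f UTF-16 -t UTF-8 input > output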
    
  2. Remove the carriage returns that Windows adds to the end of lines.

    dos2unix -ascii utf-8-file outputfile
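
The two steps above can be combined into one small script. This is only a sketch: the function name `utf16_to_unix` is hypothetical, it assumes an `iconv` that accepts the encoding name `UTF-16` and consumes the BOM (GNU iconv does; other implementations may use different names), and it uses `tr` rather than dos2unix to drop the carriage returns.

```shell
#!/bin/sh
# Hypothetical helper: convert a UTF-16 file that starts with a BOM
# into UTF-8 with Unix line endings.
utf16_to_unix() {
    src=$1
    dst=$2
    # Read the first two bytes as hex, e.g. "fffe" for a UTF-16LE BOM.
    bom=$(od -An -tx1 -N2 "$src" | tr -d ' \n')
    case $bom in
        fffe|feff)
            # With "UTF-16" as the source encoding, iconv is assumed to take
            # the byte order from the BOM and not copy it to the output.
            # tr then deletes every carriage return.
            iconv -f UTF-16 -t UTF-8 "$src" | tr -d '\r' > "$dst"
            ;;
        *)
            echo "no UTF-16 BOM found in $src" >&2
            return 1
            ;;
    esac
}
```

It would be called as `utf16_to_unix input output`. Checking for the BOM first avoids running a UTF-16 conversion over a file that is already in a single-byte encoding.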
    
Peter Mortensen
Freddy
0

Dos2unix version 6.0 and higher can convert Windows Unicode UTF-16 files to Unix UTF-8. It will also remove the byte order mark (BOM). Get the latest dos2unix here.

There is a Windows version available.

Peter Mortensen
  • That was already mentioned in the question, revision 1: "***I have tried dos2unix***, *and it still has not corrected the issue"*. Though the exact parameters were not revealed. – Peter Mortensen May 20 '23 at 10:51
0

As the previous answers stated, dos2unix did the job. In my case I used:

    dos2unix.exe -r -v -f -D utf8 <FileName>

in which:

-r, --remove-bom remove Byte Order Mark (default)

-v, --verbose verbose operation

-f, --force force conversion of binary files

-D, --display-enc set the encoding of displayed text messages; the encoding can be ansi, unicode, or utf8, and defaults to ansi

And the BOM character was removed.
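
To confirm that the BOM is really gone, the first bytes of the file can be inspected directly. The following is a minimal sketch; the function name `bom_type` is hypothetical, and it only recognizes the UTF-8 and UTF-16 marks (it does not tell a UTF-32LE BOM apart from a UTF-16LE one).

```shell
#!/bin/sh
# Hypothetical helper: report which byte-order mark, if any, a file starts with.
bom_type() {
    # First three bytes as a hex string, e.g. "fffe61" or "efbbbf".
    head3=$(od -An -tx1 -N3 "$1" | tr -d ' \n')
    case $head3 in
        efbbbf) echo utf-8 ;;      # EF BB BF
        fffe*)  echo utf-16le ;;   # FF FE, shown as \377\376 in octal
        feff*)  echo utf-16be ;;   # FE FF
        *)      echo none ;;
    esac
}
```

Running `bom_type <FileName>` before and after the conversion should change the answer from `utf-16le` to `none` if the mark was stripped.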

Peter Mortensen
Marco Vargas