7

I have followed this process:

  1. Open notepad and enter some text: "Hello World"
  2. Save the ansi file as: c:\HelloWorld.txt

I then run the following query:

select * from openrowset(bulk 'C:\HelloWorld.txt',single_clob) as test

The text appears in a column called: BulkColumn.

I then do this:

  1. Open notepad and enter some text: "Hello World"
  2. Save the unicode file as: c:\HelloWorld.txt

I then run the following query:

select * from openrowset(bulk N'C:\HelloWorld.txt',single_nclob) as test

The error I get is:

SINGLE_NCLOB requires a UNICODE (widechar) input file. The file specified is not Unicode.

Why is this?

w0051977
  • 15,099
  • 32
  • 152
  • 329
  • I think you need to save it as UTF-8 – cha Jan 27 '16 at 22:26
  • *"The file specified is not Unicode."* – TT. Jan 27 '16 at 22:54
  • @TT, is that a question? – w0051977 Jan 27 '16 at 23:00
  • Your question: *"Why is this?"*, the answer is in the error message: *"The file specified is not Unicode"* – TT. Jan 27 '16 at 23:01
  • @TT, why is it not Unicode? I select Unicode as the file type in notepad. – w0051977 Jan 27 '16 at 23:44
  • If I save a file in Notepad as Unicode (UTF-16 LE encoding) my SQL Server (2012) happily accepts the file with `BULK ... SINGLE_NCLOB`. Perhaps this has something to do with collation of the database/server in question? What does `SELECT SERVERPROPERTY('Collation')` return on your system? – TT. Jan 28 '16 at 07:57
  • @TT. I would be surprised if the server or even DB collation had anything to do with it. I am guessing it is most likely user-error in terms of saving the file. – Solomon Rutzky Jan 28 '16 at 08:05

2 Answers2

10

You need to double-check how you saved the "Unicode" file. In Windows / .NET / SQL Server, the term "Unicode" refers specifically to "UTF-16 Little Endian (LE)". When dealing with UTF-16 Big Endian (BE), it will be referred to as "Unicode Big Endian" or "Big Endian Unicode". UTF-8 is always UTF-8.

I created a file in Notepad and went to "Save As" and selected "Unicode" from the "Encoding" drop-down and it worked just fine with the statement you are using:

SELECT *
FROM OPENROWSET(BULK N'C:\temp\OPENROWSET_BULK_NCLOB-test.txt', SINGLE_NCLOB) AS [Test];

If I re-saved it with any other encoding, I got the error message you are seeing.

I also used Notepad++ and in the "Encoding" menu selected "Encode in UCS-2 Little Endian". UCS-2 and UTF-16 are identical for Code Points U+0000 through U+FFFF and there is no UTF-16 option in Notepad++ so this was the closest thing. And yep, it also worked.

So somehow you did not actually save the file as "Unicode". If you selected "Unicode big endian" in Notepad, that is not "Unicode" in terms of how Windows is using that term, even if it is a valid Unicode encoding.

Solomon Rutzky
  • 46,688
  • 9
  • 128
  • 171
  • That is correct. I reinstalled Notepad and UTF-8 appeared. This works. Why does it have to be UTF-8 instead of UTF-16? +1. – w0051977 Jan 28 '16 at 08:33
  • @w0051977 reintalled Notepad? What you are describing is the opposite of what I am saying. I am saying that UTF-8 is _not_ considered "Unicode" (in name only) by Windows / .NET / SQL Server. SQL Server does not support UTF-8 internally on any level, and even removed the only support for it (code page 65001 in BCP, I believe) in SQL Server 2012 (I believe). What version of SQL Server are you on? – Solomon Rutzky Jan 28 '16 at 15:53
0

UTF-8 is not considered "Unicode" by Windows / .NET / SQL Server. On Notepad++, you should save with encoding "Unicode Big Endian" or "Big Endian Unicode" (UCS - 2 BE)/(UCS - 2 LE)