16

In an SSIS package that I'm writing, I have a CSV file as a source. On the Connection Manager General page, it has 65001 as the Code page (I was testing something). Unicode is not checked.

The columns map to a SQL Server destination table with varchar (among others) columns.

There's an error at the destination: The column "columnname" cannot be processed because more than one code page (65001 and 1252) are specified for it.

My SQL columns have to be varchar, not nvarchar due to other applications that use it.

On the Connection Manager General page I then change the Code page to 1252 (ANSI - Latin I) and OK out, but when I open it again it's back to 65001. It doesn't make a difference if (just for test) I check Unicode or not.

As a note, all this started happening after the CSV file and the SQL table had columns added and removed (users, you know.) Before that, I had no issues whatsoever. Yes, I refreshed the OLE DB destination in the Advanced Editor.

This is SQL Server 2012 and whichever version of BIDS and SSIS come with it.

Hadi
  • 36,233
  • 13
  • 65
  • 124
Amarundo
  • 2,357
  • 15
  • 50
  • 68
  • 2
    Try removing and readding the Flat File Connection Manager – Hadi Jan 26 '18 at 05:35
  • I wanted to avoid that and going and redefining the types of the columns for hundreds of columns and then remapping them to the SQL table. But I'll give it a try. – Amarundo Jan 26 '18 at 13:56
  • I have not done this for this property, but I have done this in other places. Open the dtsx using notepad and edit the property in there. – KeithL Jan 26 '18 at 17:19
  • There's no property with 65001 value in dtsx or commgr files. I started from scratch, a new solution, and still, it forces code page to 65001. Must be something in the data. – Amarundo Jan 26 '18 at 18:46
  • It could be that the file is marked unicode – KeithL Jan 26 '18 at 19:41
  • 1
    The flat file code page is 65001 = Unicode (UTF-8), you cannot change it because the Code Page property in the flat file connection manager is designed to `Specify the code page for non-Unicode text.` check my answer for more details. – Hadi Jan 29 '18 at 09:09

7 Answers7

14

If it is a CSV file column text stream [DT_TEXT] to SQL varchar(max) data type that you want to convert to, change the flat file Connection Manager Editor property Code page to 1252 (ANSI - Latin I).

enter image description here

Hadi
  • 36,233
  • 13
  • 65
  • 124
Ven
  • 157
  • 2
  • 9
9

65001 Code page = Unicode (UTF-8)

Based on this Microsoft article (Flat File Connection Manager):

Code page
Specify the code page for non-Unicode text.

Also

You can configure the Flat File connection manager in the following ways:

Specify the file, locale, and code page to use. The locale is used to interpret locale-sensitive data such as dates, and the code page is used to convert string data to Unicode.

So when the flat file has a Unicode encoding:

Then this property cannot be changed, it will always return to it original encoding.

For more infor about the Code Page identifiers, you can refer to this article:

Community
  • 1
  • 1
Hadi
  • 36,233
  • 13
  • 65
  • 124
  • I'm not sure I follow. Does it mean that always a CSV file will be 65001? Or does it mean that it detects something in the data of this particulate file that forces it to be 65001? There's no "header" or any config data in a CSV file other than the actual data. – Amarundo Jan 30 '18 at 18:25
  • 1
    In any case, what I did is to import the CSV file into a SQL table with nvarchars that I'm using just as a staging table. Then I copy the data to my " real" table. – Amarundo Jan 30 '18 at 18:26
  • @Amarundo i think that staging table is a good idea, also you can use a [script task](https://stackoverflow.com/questions/4351985/converting-unicode-to-windows-1252-for-vcards) to change the file encoding. Also you can use `Data Conversion Transformation` to achieve this within the SSIS. But i think that your choice *(staging table)* is the easiest way. – Hadi Jan 30 '18 at 19:02
  • @Amarundo The text file is saved using an encoding, this encoding can be detected by reading some bytes from the text metadata which is not visible in the user interface. Also there are many topic and article about [Characters Set detection](https://en.wikipedia.org/wiki/Charset_detection) or File Encoding Detection you can search to find more detailed information. Anyway if the file is saved with encoding 65001 or other unicode character set, it cannot be changed from the flat file connection manager. – Hadi Jan 30 '18 at 19:07
  • @Amarundo the simplest way to change a file encoding just open it using notepad, choose to Save As, in the save file dialog, you will see a drop down list to choose the encoding of the file, you can choose other encodings – Hadi Jan 30 '18 at 19:08
  • 1
    Thanks. I never noticed that! In any case, this CSV is being generated by another system and this will be set as SQL job that invokes the SSIS package, so I can't open it and save as. The staging table is my safest bet. Thanks for the info, though. – Amarundo Jan 30 '18 at 19:16
  • @Amarundo its something that i learned while searching for this issue. So thank you – Hadi Jan 30 '18 at 19:18
  • @Amarundo Do you need any other help? – Hadi Jan 30 '18 at 19:18
  • 1
    no. I'm sticking to the stating table solution. Thanks! – Amarundo Feb 06 '18 at 19:17
3

I solved this in SSIS through Derived Column Transformation

Derived Column

Jeremy Caney
  • 7,102
  • 69
  • 48
  • 77
  • I tried the Data Conversion Transformation, and while it would allow me to edit the code page on each column, it never *saved* the changes. I looked at the Derived Column Transformation, but this seems to be a column-by-column operation, not bulk. I've got 56 columns coming in an utf-8 encoded csv out of my sql. It's insane that SSIS is a major regression from Import Wizard which would just do the conversions automatically. – user1664043 Oct 10 '21 at 16:59
1

If it's a csv file, you can still use code page 1252 to process it. When you open the flat file connection manager it shows you the code page for the file, but you don't need to save that setting. If you have other changes to make in the connection manager, change the code page back to 1252 before you save the changes. It will process fine if there are no unicode characters in the file.

shonthom
  • 41
  • 4
1

In my case the file was generated in Excel and (mistakenly) saved as CSV UTF-8 (Comma delimited) (*.csv) instead of simply CSV (Comma delimited) (*.csv). Once I saved the file as the correct form of CSV, the code page no longer changed from 1252 (ANSI - Latin I).

1

If you encounter this issue ensure that the DataType for your columns in the Flat File Connection Manager are set to Unicode string [DT_WSTR] and not string [DT_STR]

You can then use the Data Conversion task to convert to DT_STR with code page 1252 and it will retain the settings.

mheptinstall
  • 2,109
  • 3
  • 24
  • 44
0

I was running into a similar challenge, which is how I ended up on this page looking for a solution. I resolved it using a different approach. I opened the csv in Notepad++. One of the menu options is called Encoding. If you select that, it will give you the option to "Convert to ANSI." I knew that my file did not contain any Unicode specific characters. When I went back to the SSIS package, I edited the flat file connection and it automatically changed it to 1252.

Dharman
  • 30,962
  • 25
  • 85
  • 135
Zhovan8
  • 7
  • 2