3

when I import csv file using Oledb , weird characters:˥«¿ are added to the first column

code:

    string strConnectionString = string.Format("Provider=Microsoft.jet.OLEDB.4.0;Data      
    OleDbConnection SQLConn = new OleDbConnection(strConnectionString);        
    Source={0};Extended Properties='text;HDR=Yes;FMT=Delimited(,)';", strFile);
    OleDbCommand selectCMD = new OleDbCommand("select * FROM ["+strFileName+"]")
    selectCMD.Connection = SQLConn;
    SQLAdapter.SelectCommand = selectCMD;
    SQLAdapter.Fill(dtXLS);

my file start with the word shape without any char before it. I get in the datatable first column name ˥«¿Shape why? Thanks.

Liat
  • 31
  • 1
  • 2
  • The text importer dates from the previous century. You can specify a character set in the schema.ini file but it only supports OEM and ANSI. Utf8 is not supported, much like the jet provider has been unsupported for the past 10 years. You can filter the BOM from the column name but you'll still have a problem with getting garbage for the field data. You can't use this. Consider the .NET TextFieldParser class. – Hans Passant Feb 22 '12 at 12:51

1 Answers1

3

This is an encoding issue - what you are seeing is called the byte order mark (good luck Googling for such a character string!)

Find a way of changing the encoding to UTF-8 and I suspect it'll go away.

Bridge
  • 29,818
  • 9
  • 60
  • 82
  • As you're using a Jet driver (Excel) you might be coming across the bug mentioned here: http://stackoverflow.com/questions/155097/microsoft-excel-mangles-diacritics-in-csv-files – Bridge Feb 22 '12 at 12:46
  • 3
    Thanks a lot,I used schema.ini to solve this problem. [fileName.csv] Format=CSVDelimited ColNameHeader=True CharacterSet=65001 – Liat Feb 26 '12 at 16:05
  • 1
    @Liat How the CharacterSet=65001 solved the problem? What that means? is it UTF8? – RJN Mar 20 '18 at 08:49