2

I'm trying to parse a CSV to DataTable in a VB.NET project, the issue is that when i'm importing the CSV file, the columns where the values are float like ;2,08;2;0,82 are imported as 30/12/1899 02:08:00 and 30/12/1899 02:00:00

How could i prevent the oledb on formatting the data?

Here is how i'm loading the csv to datatable:

        Dim con As New OleDbConnection("Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" & path & ";Extended Properties=""text;HDR=YES;FMT=Delimited(;)""")

        con.Open()
        Dim da As New OleDbDataAdapter(sqlquery, con)
        da.Fill(dt)

And in the same folder where the CSV is i have the schema.ini with the following content:

[anar_adm.csv]
Format=Delimited(;)
NiceToMytyuk
  • 3,644
  • 3
  • 39
  • 100
  • What culture is it that uses commas as decimal separator? I think I have a solution for you, but I want to test it first. I'm guessing Italy, since that's where you're from :) – MarkPflug Jan 05 '21 at 17:15
  • 1
    @MarkPflug Hi Mark, yeah you're right it's italian :) – NiceToMytyuk Jan 05 '21 at 17:15

2 Answers2

2

I maintain a library that can handle this scenario: Sylvan.Data.Csv

This is the code that I wrote to verify that it works as expected. Sorry, this is in C# as I don't know VB very well, should be straightforward to translate.

using var reader = new StringReader("Name;Value1;Value2\nTest;2,08;0,82\n");

var schema = Schema.Parse("Name,Value1:float,Value2:float");

var options = new CsvDataReaderOptions {
    Schema = new CsvSchema(schema),
    // you would likely replace this with CurrentCulture
    // the default is InvariantCulture
    Culture = CultureInfo.GetCultureInfoByIetfLanguageTag("it-IT")
};
var csv = CsvDataReader.Create(reader, options);
var dt = new DataTable();
dt.Load(csv);

The CsvDataReader comes from Sylvan.Data.Csv and the Schema.Parse line comes from Sylvan.Data, which is pre-release. You don't have to use the Schema.Parse, instead you can provide your own ICsvSchemaProvider. There is an example of a simple schema provider in this Answer.

The advantage that this has over OleDB is that it is entirely managed, no external driver needed, and doesn't require the schema.ini. It is also much, much faster, if performance is a concern (in fact, it's the fastest for .NET). The disadvantage is that you need to provide the schema yourself, while OleDB will try to automatically detect it from the data (and sometimes gets it wrong).

MarkPflug
  • 28,292
  • 8
  • 46
  • 54
0

The Schema.ini file has different options, you can read about it here.
Schema.ini File (Text File Driver)

You can specify, among other things, the Column Data Type and also the decimal separator. For example, a schema that defines 4 Columns where:

  • ColNameHeader=True: The data file contains the Columns definition
  • Format=Delimited(;): The Columns are separated by a semicolon
  • DecimalSymbol=,: The decimal separator is a comma
  • Col(N)=ColumnName TYPE [WIDTH] [#]: the Columns' Name and Type and, optionally, the width of the Column. The width value ([#]) is mandatory.

If the Columns may contain a currency symbol that is not the System default's, specify the Currency format. For example, using the Euro Symbol:

  • CurrencySymbol=[Symbol]: the Currency symbol used in the data file.

  • CurrencyPosFormat=[N]: the Symbol position for positive values. This can be one of these 4 values:
    0 = €1,1 1 = 1,1€ 2 = € 1,1 3 = 1,1 €

  • CurrencyNegFormat=[N]: same as above, but many more positions can be specified. See the previously linked documentation for a list of possible values

  • CurrencyDigits=[N]: the number of decimal digits used for Currency

  • CurrencyThousandSymbol=[Symbol]: the Thousands separator Symbol

  • CurrencyDecimalSymbol=[Symbol]: the Decimal separator Symbol

The CharacterSet specifies what Encoding the CSV file is using. Standard values are:

  • ANSI (local CodePage), OEM, Unicode. Other Encodings must be specified using their CodePage. For example, UTF-8 needs to be specified as a CodePage 65001.
    All available CodePages in a System are listed in this Registry Key:
    HKEY_CLASSES_ROOT\MIME\Database\Charset

With the this CSV content:

ID;Description;Quantity;FloatValue;Price
1;String 1;100;10,56;€10,79
2;String 2;101;12,72;€200,34
3;String 3;33;100657,72;€0,98

A Schema.ini file can be as follows:

[filename.csv]
ColNameHeader=True                First row is the Columns Header
MaxScanRows=0                     No auto-detect Column Type
CharacterSet=65001                UTF-8 file Format
Format=Delimited(;)               Columns delimiter
DecimalSymbol=,                   Decimals separator
NumberLeadingZeros=1              Below 1 values leading zeros (0,12)
CurrencySymbol=€                  Currency symbol (if present)
CurrencyPosFormat=0               Currency positive format: €1,1
CurrencyNegFormat=1               Currency negative format: -€1,1
CurrencyDigits=2                  Number of Decimal digits
CurrencyThousandSymbol=.          Thousands separator
CurrencyDecimalSymbol=,           Decimal separator
Col1=ID Long                      |-- Columns definition
Col2=Description Text WIDTH 50    |
Col3=Quantity Integer             |
Col4=FloatValue Double            |
Col5=Price Currency               |

[otherFilename.csv]
ColNameHeader=True
MaxScanRows=25                   | Auto-detect the data type, using 25 rows as sample
CharacterSet=ANSI
Format=Delimited()               | Standard delimiter: (,)

In any case, I suggest to get a library as CsvHelper

Jimi
  • 29,621
  • 8
  • 43
  • 61
  • That doesn't solve my issue the values that should be double are still Date, i actually solved by using the answer from the [following](https://stackoverflow.com/questions/30201592/convert-csv-data-to-datatable-in-vb-net-capturing-column-names-from-row-0) question but i'd still wait for a OleDB solution.. i also would avoid by setting the columns datatypes in .ini as it will change for other csv – NiceToMytyuk Jan 05 '21 at 17:28
  • This is the standard `Schema.ini` for CSV formats. The `DecimalSymbol` is a conversion specifier, it always works when the Column's data type is consistent and applied correctly (and, btw, I've just tested it, with the same Provider). There's no OleDb solution and, as you can see, the `Schema.ini` is relative to a specific CSV file. If you have issues with this, get a specialized library as CsvHelper (added a note). – Jimi Jan 05 '21 at 17:36