1

I'm filling a table in MySQL from a CSV file using an application developed with C# and ASP.NET MVC4. I have an issue loading CSV files.

I'm reading the CSV file, filling a DataTable, doing validations about regular expressions and inserting finally in MySql database. Great.

The problem is with the data formats in the CSV file. Looks like when I read the Csv file, c # does not Distinguish between strings and double.

When I load just numbers (doubles) with separator, "." or "," the method does not recognize the contents of cvs file.

I've reviewed formatting in regional settings and language and don't get a solution.

Let me explain please,

VIEW

From the view Load I've some like.

 //Here I can choose a Csv File and print the content
 @model Pharmtech.Models.myModule.FilesModel
 <div>
 @using (Html.BeginForm("UploadFile", "myModule", FormMethod.Post,
 new { enctype = "multipart/form-data" }))
 {
  <h3>Choose a Csv File</h3>
  <input type="file" name="file" />
  <input type="submit" value="Cargar">
 }
 <p>@TempData["Messages"]</p>

CONTROLLER

When the submit active the ActionResult UploadFile

    [HttpPost]
    public ActionResult UploadFile(HttpPostedFileBase file)
    {
        FilesModel model = new FilesModel();
        DataTable dt;

       //Check if file is not null
        if (file != null)
        {
            //Built the path
            string path = DateTime.Now.ToShortDateString().Replace("/", "-") 
                          + (file.FileName).ToLower();

            //Check File Extension
            if (model.FileCsv(path))
            {                    
                file.SaveAs(Server.MapPath("~/App_Data/" + path));
                dt = model.getDataTableFromCsvFile(Server.MapPath("~/App_Data/"), path);

                //Check if dt it's not empty
                if (dt.Rows.Count > 0)
                {
                    //Save in TempData["Messages"] the dt content
                    if (model.printDataTableFromCsv(dt))
                    {
                        TempData["Messages"] = model.AllData;
                    }
                }
                else TempData["Messages"] = "Csv File is Empty";
            }else TempData["Messages"] = "File it's not Csv File";
        }else TempData["Messages"] = "Please upload a Csv File";

       return RedirectToAction("Load");
    }    

MODEL

I've written some methods in FilesModel Model

Validating Csv File Extension

    public bool FileCsv(string fileName)
    {
        string ext = Path.GetExtension(fileName);
        switch (ext.ToLower())
        {
            case ".csv": return true;
            default: return false;
        }
    }

Get a DataTable from CSV File

//Get DataTable from Csv File    
    public  DataTable getDataTableFromCsvFile(String path, String file)
    {
        //Quit double spaces
        System.IO.File.WriteAllText(path + file, Regex.Replace
               (System.IO.File.ReadAllText(path + file), "  ", " "));

        //Defining connection
        string connectionStringCsv = @"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" +
                path + ";Extended Properties='text;HDR=Yes;FMT=Delimited'";
        DataTable dt = new DataTable();

        try
        {
            OleDbConnection objConn = new OleDbConnection(connectionStringCsv);
            objConn.Open();

            OleDbCommand objCmdSelect 
              = new OleDbCommand("select * from [" + file + "]", objConn);
             OleDbDataAdapter objAdapter = new OleDbDataAdapter();

            objAdapter.SelectCommand = objCmdSelect;
            objAdapter.Fill(dt);
            objConn.Close();
        }
        catch (Exception ex)
        {
            //Do something
        }

        return dt;
    }

Finally print the DataTable Content

 public string AllData;
 public bool printDataTableFromCsv(DataTable masterCsv)
 {
     AllData = "";
     int row = 0;
     //Check if DataTable it's not emtpy
     if (masterCsv.Rows.Count > 0)
     {
         //Going through DataTable and print rows
         foreach (DataRow rows in masterCsv.Rows)
         {
           AllData += "Row: " + row 
             + "=" + rows["DATA"].ToString() + ". ";
           row++;
                }
         }            
          if (row != 0) { return true; }
          else { return false; }
}

INPUT and OUTPUTS

Some examples:

Output 1

Input 1, when the files content

 //DATA
 //30.33
 //20.45

 //Output is 1         
 //I get: "Row: 0=3033. Row: 1=2045" 
  //**Is to say, remove the "." 

Output 2

Input 2, when the files content

 //DATA
 //20.33
 //Hello World

 //Output is
 //Row: 0=2033.Row: 1=.
 //**Is to say, remove the letters

Output 3

Input 3, when the files content

 //DATA
 //20.33
 //Hello World
 //This is, a problem

 //Output is        
 //Row: 0=20.33.Row: 1=Hello World.Row: 2=This is, a problem.

Hypothesis

I guess the error is in the reading of csv, C # tries to assign to the column of the DataTable a type of data

Sometimes, I choose a csv file with just double the print is "File Empty" I tried with .csv and .csv MS-DOS

Somebody have any idea that what I'm doing wrong? Thanks for your help

diegobarriosdev
  • 409
  • 1
  • 6
  • 20
  • Can you tell us which row is the place of the exception? – Lajos Arpad May 19 '17 at 20:35
  • 1
    this question has too much going on. Can you isolate a single row that is causing the problem and if that problem is in the reading of the csv, processing it into a datatable, or saving it to MySql? – chris-crush-code May 19 '17 at 20:36
  • I get the error just printing the DataTable. I quit all validations and operations about database. – diegobarriosdev May 19 '17 at 20:38
  • so you get the error on AllData[row] = "Row: " + row + " -> Field DATA " + "= " + rows["DATA"].ToString(); ? What's the error message? – chris-crush-code May 19 '17 at 20:42
  • {System.InvalidCastException: Can not convert an object of type 'System.String' to type 'System.String []'. The program breaks when is showing the content of TempData["Messages"] – diegobarriosdev May 19 '17 at 20:45
  • Is rows["DATA"] a String array? – Lajos Arpad May 19 '17 at 21:00
  • AllData is a string array rows is the varible that go through the for each, "DATA" is the column name on CSV File – diegobarriosdev May 19 '17 at 21:07
  • What is the type of rows["DATA"]? Is AllData[row] = "Row: " + row + " -> Field DATA " + "= " + rows["DATA"].ToString(); the row you have the problem at? – Lajos Arpad May 19 '17 at 21:28
  • Yes, it is. The problem in the original project is that remove "." from a column . I do this test and find that when the column has "." or "," the program break. AllData is a string array, rows is an int. rows["DATA"] is sometimes int or somethings string, or mixed – diegobarriosdev May 19 '17 at 21:40

2 Answers2

1

I guess the error is in the reading of csv, C # tries to assign to the column of the DataTable a type of data

If your columns have mixed data types, or contain data which may be read as different data types(ie "4-4-97" which can be both a date or a string) you should use the imex switch in your connection string.

"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + someVariablePath + ";Extended Properties=\"text;HDR=Yes;FMT=Delimited;IMEX=1;\"";

Or use a schema file and gain more control over how your data is read. Note that if you switch to the newer Data Access Components (Provider=Microsoft.ACE.OLEDB.12.0), The IMEX switch has no effect and you must use a schema file if you wish to control data type.

Schema file Documentation

[myfile.txt]
Format=CSVDelimited
ColNameHeader=True
MaxScanRows=5

It's not that C# tries to define the type, its the data driver that makes that decision and tells your code what is coming. The driver reads the first n lines and defines a type. Then passed n lines it passes a value that no longer matches the type.

col1
1
2
3
4
5
a
7
true
9

The column above could create a problem. Based on the first 5 records, we're dealing with int. So when the column definitions are made the column is deemed to be an int. From then on, your code is expecting to see numbers in that column. But then on record 6, there's a string. And record 8, could be a string or a bool. The data driver doesn't care but, your code does when you feed it something like "a" when it's expecting a number.

blaze_125
  • 2,262
  • 1
  • 9
  • 19
  • Thanks friend, it's a clear explication of what's going on. I tested IMEX=1 in connection string and don't work. I'm not sure on create the schema, because this method must work to twenty seven (27) diferents Csv Files (diferents names files, diferents headers, I need built the file name by today date), but I'll try it. Thanks so very much. – diegobarriosdev May 23 '17 at 13:00
0

A partial solution is quote the CSV values inside double quotation marks (This way, JET will treat all the read values like string) and then parse them from C #.

After several attempts of solution and the evaluation of the time of consumption in the load, I decided not to use Jet.OLEDB

I think there are more appropriate ways of doing this, using LINQ for example.

I also found these two projects:

A fast CSV Reader by Sebastian Lorien

A portable and Efficient parse for float files

Delimited Text Extension File

Let me share this answer, helped me find a solution:

How to read a CSV file into a .NET Datatable

Thanks very much

diegobarriosdev
  • 409
  • 1
  • 6
  • 20