Confusion in row and column of DataTable

Question

I am following a tutorial of an inventory stock management system in C# language.

The original csv file is a stock list, which contains four categories:

Item Code, Item Description, Item Count, OnOrder

The original csv file:

In the tutorial, the code is generating a DataTable object, which will be used in the GridView demo in the application.

Here is the code:

DataTable dataTable = new DataTable();
            
dataTable.Columns.Add("Item Code");
dataTable.Columns.Add("Item Description");
dataTable.Columns.Add("Current Count");
dataTable.Columns.Add("On Order");

string CSV_FilePath = "C:/Users/xxxxx/Desktop/stocklist.csv";

StreamReader streamReader = new StreamReader(CSV_FilePath);

string[] rawData = new string[File.ReadAllLines(CSV_FilePath).Length];
rawData = streamReader.ReadLine().Split(',');
            
while(!streamReader.EndOfStream)
{
    rawData = streamReader.ReadLine().Split(',');
    dataTable.Rows.Add(rawData[0], rawData[1], rawData[2], rawData[3]);
}

dataGridView1.DataSource = dataTable;

I am assuming that rawData = streamReader.ReadLine().Split(','); splits the file into an array object like this:

["A0001", "Horse on Wheels","5","No"]
["A0002","Elephant on Wheels","2","No"]

In the while loop, it literates through each line (each array) and assign each of the rawData[x] into corresponding column.

Is this right to understand this code snippet? Thanks in advance.

Another question is, why do I need to run

rawData = streamReader.ReadLine().Split(',');

in a while loop?

Thanks in advance.

ProgrammingLlama · Accepted Answer · 2020-12-13T13:32:40.897

Your code should actually look like this:

DataTable dataTable = new DataTable();

dataTable.Columns.Add("Item Code");
dataTable.Columns.Add("Item Description");
dataTable.Columns.Add("Current Count");
dataTable.Columns.Add("On Order");

string CSV_FilePath = "C:/Users/xxxxx/Desktop/stocklist.csv";

using(StreamReader streamReader = new StreamReader(CSV_FilePath))
{
    // Skip the header row
    streamReader.ReadLine();
    
    while(!streamReader.EndOfStream)
    {
        string[] rawData = streamReader.ReadLine().Split(','); // read a row and split it into cells
        dataTable.Rows.Add(rawData[0], rawData[1], rawData[2], rawData[3]); // add the elements from each cell as a row in the datatable
    }
}

dataGridView1.DataSource = dataTable;

Changes I've made:

We've added a using block around StreamReader to ensure that the file handle is only open for as long as we need to read the file.
We now only read the file once, not twice.
Since we only need the rawData in the scope of the while loop, I've moved it into the loop.

Explaining what's wrong:

The following line reads the entire file, and then counts how many rows are in it. With this information, we initialize an array with as many positions as there are rows in the file. This means for a 500 row file, you can access positions rawData[0], rawData[1], ... rawData[499].

string[] rawData = new string[File.ReadAllLines(CSV_FilePath).Length];

With the next row you discard that array, and instead take the cells from the top of the file (the headers):

rawData = streamReader.ReadLine().Split(',');

This line states "read a single line from the file, and split it by comma". You then assign that result to rawData, replacing its old value. So the reason you need this again in the loop is because you're interested in more than the first row of the file.

Finally, you're looping through each row in the file and replacing rawData with the cells from that row. Finally, you add each row to the DataTable:

rawData = streamReader.ReadLine().Split(',');
dataTable.Rows.Add(rawData[0], rawData[1], rawData[2], rawData[3]);

Note that File.ReadAllLines(...) reads the entire file into memory as an array of strings. You're also using StreamReader to read through the file line-by-line, meaning that you are reading the entire file twice. This is not very efficient and you should avoid this where possible. In this case, we didn't need to do that at all.

Also note that your approach to reading a CSV file is fairly naïve. Depending on the software used to create them, some CSV files have cells that span more than one line in the file, some include quoted sections for text, and sometimes those quoted sections include commas which would throw off your split code. Your code also doesn't deal with the possibility of a file being badly formatted such that a row may have less cells than expected, or that there may be a trailing empty row at the end of the file. Generally it's better to use a dedicated CSV parser such as CsvHelper rather than trying to roll your own.

The amended code makes a lot more sense than the original version — Nan, Dec 13 '20 at 14:39

Confusion in row and column of DataTable

1 Answers1