I have a CSV file delimited with pipes (|). I am reading it with the following line of code:

IEnumerable<string[]> lineFields = File.ReadAllLines(FilePath).Select(line => line.Split('|'));

Now, I need to bind this to a GridView. So I am creating a dynamic DataTable as follows:

DataTable dt = new DataTable();
int i = 0;
foreach (string[] order in lineFields)
{
    if (i == 0)
    {                        
        foreach (string column in order)
        {
            DataColumn _Column = new DataColumn();
            _Column.ColumnName = column;
            dt.Columns.Add(_Column);
            i++;
            //Response.Write(column);
            //Response.Write("\t");
        }
    }
    else
    {
        int j = 0;
        DataRow row = dt.NewRow();
        foreach (string value in order)
        {
            row[j] = value;                            
            j++;

            //Response.Write(column);
            //Response.Write("\t");
        }
        dt.Rows.Add(row);
    }
    //Response.Write("\n");
}

This works fine. But I want to know if there is a better way to convert IEnumerable<string[]> to a DataTable. I need to read many CSVs like this, so I think the above code might have performance issues.

JoriO
Ankith

2 Answers

From .NET 4 onwards, use File.ReadLines, which reads the file lazily instead of loading every line into memory first:

DataTable FileToDataTable(string FilePath)
{

    var dt = new DataTable();

    IEnumerable<string[]> lineFields = File.ReadLines(FilePath).Select(line => line.Split('|'));
    dt.Columns.AddRange(lineFields.First().Select(i => new DataColumn(i)).ToArray());

    foreach (var order in lineFields.Skip(1))
        dt.Rows.Add(order);

    return dt;
}

(Edit: instead of this code, use the code from @Jodrell's answer. It avoids creating the enumerator twice, once for First() and once for Skip(1).)
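
To illustrate the double enumeration mentioned in the edit note, here is a minimal sketch. It assumes a hypothetical `Lines()` iterator standing in for `File.ReadLines` (so no file is needed) and a `Starts` counter that is not part of the original code. Because the LINQ query is deferred, `First()` and `Skip(1)` each restart the source from the beginning, which for a real file would mean opening it twice.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class DeferredDemo
{
    // Hypothetical counter: incremented every time the source starts over.
    public static int Starts;

    // Hypothetical stand-in for File.ReadLines(path).
    static IEnumerable<string> Lines()
    {
        Starts++;
        yield return "id|name";
        yield return "1|apple";
        yield return "2|pear";
    }

    public static void Main()
    {
        // Deferred query: nothing is read until it is enumerated.
        IEnumerable<string[]> lineFields = Lines().Select(line => line.Split('|'));

        string[] header = lineFields.First();       // first enumeration
        int rowCount = lineFields.Skip(1).Count();  // second enumeration

        Console.WriteLine($"{header.Length} columns, {rowCount} rows, source started {Starts} times");
    }
}
```

Driving a single enumerator by hand, as in the other answer, keeps this to one pass over the source.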

Before .NET 4, stream the file with a StreamReader:

DataTable FileToDataTable1(string FilePath)
{
    var dt = new DataTable();

    using (var st = new StreamReader(FilePath))
    {
        // process the first line (column headers)
        if (st.Peek() >= 0)
        {
            var order = st.ReadLine().Split('|');
            dt.Columns.AddRange(order.Select(i => new DataColumn(i)).ToArray());
        }

        while (st.Peek() >= 0)
            dt.Rows.Add(st.ReadLine().Split('|'));
    }
    return dt;
}
dovid
  • If you assume that each row has the same number of columns, you should delete the condition in the loop and instead add code before the loop that reads the first line and checks the number of columns of the first row only. – dovid Nov 13 '14 at 10:13
  • If I remove the condition, it works fine. But the header column is also created as a row at the top. – Ankith Nov 13 '14 at 10:18
  • The first row of the file has the column headers. – Ankith Nov 13 '14 at 10:20
  • This doesn't add the column headers. – Ankith Nov 13 '14 at 10:28
  • If you use `ReadLines`, the @Jodrell code is slightly more efficient (it creates only one instance of the enumerator). – dovid Nov 13 '14 at 10:30

Since, in your linked example, the file has a header row, you can read the header and the data rows from a single enumerator:

const char Delimiter = '|';

var dt = new DataTable();
using (var m = File.ReadLines(filePath).GetEnumerator())
{
    m.MoveNext();
    foreach (var name in m.Current.Split(Delimiter))
    {
        dt.Columns.Add(name);
    }

    while (m.MoveNext())
    {
        dt.Rows.Add(m.Current.Split(Delimiter));
    }
}

This reads the file in one pass.
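
Since the question mentions reading many such files, the same single-pass idea could be wrapped in a reusable method. The following is only a sketch under a few assumptions: the `PipeFile` class, the `ToDataTable` name and the `delimiter` parameter are hypothetical, and the empty-file guard is an addition that the snippet above does not have.

```csharp
using System.Data;
using System.IO;

static class PipeFile
{
    // Hypothetical helper; the name and the delimiter parameter are illustrative.
    public static DataTable ToDataTable(string filePath, char delimiter = '|')
    {
        var dt = new DataTable();

        // One enumerator, so the file is opened and read exactly once.
        using (var lines = File.ReadLines(filePath).GetEnumerator())
        {
            // An empty file has no header row: return an empty table.
            if (!lines.MoveNext())
                return dt;

            // The first line holds the column names.
            foreach (var name in lines.Current.Split(delimiter))
                dt.Columns.Add(name);

            // Every remaining line becomes one row.
            while (lines.MoveNext())
                dt.Rows.Add(lines.Current.Split(delimiter));
        }

        return dt;
    }
}
```

The result can then be bound as usual, for example by assigning it to the GridView's DataSource and calling DataBind(). Note that DataTable throws if the header contains duplicate column names or if a row has more fields than there are columns, so malformed files would need extra handling.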

Jodrell
  • @jodrell my code is `loading it all into memory`? To my knowledge it does not, not even once. See http://referencesource.microsoft.com/#System.Core/System/Linq/Enumerable.cs,933 – dovid Nov 13 '14 at 10:34
  • @lomed, yours loads it only into the `DataTable`, thanks to `ReadLines`/`ReadLine`; the OP's, as you noticed, uses `ReadAllLines`. If benchmarked, `ReadAllLines` might be marginally faster for smaller files. However, for large files, efficient use of memory is likely to be a greater concern. – Jodrell Nov 13 '14 at 10:38
  • @jordell you're right. But if you use ReadAllLines, you can use a normal `for` loop. Still, you need to edit your wording, which says that a proposal was made that loads the file twice. – dovid Nov 13 '14 at 10:44