0

I have sample data that looks like this:

 1  This is a random line in the file
 2  
 3  SOURCE_ID|NAME|START_DATE|END_DATE|VALUE_1|VALUE_2
 4
 5  Another random line in the file
 6  
 7  
 8  
 9  
10  GILBER|FRED|2019-JAN-01|2019-JAN-31|ABC|DEF
11  ALEF|ABC|2019-FEB-01|2019-AUG-31|FBC|DGF
12  GILBER|FRED|2019-JAN-01|2019-JAN-31|ABC|TEF
13  FLBER|RED|2019-JUN-01|2019-JUL-31|AJC|DEH
14  GI|JOE|2020-APR-01|2020-DEC-31|GBC|DER

I am unable to save changes to the file. Ie, I can't manipulate/clean the original files before consumption. Any manipulation will need to be done on the fly in memory. But what if the files are large (eg, I am currently testing with some files that are 5m+ records).

I am using CsvHelper

I have already referred to the following threads for guidance:

CSVHelper to skip record before header

Better way to skip extraneous lines at the start?

How to read a header from a specific line with CsvHelper?

What I would like to do is:

  • Set row where header is = 3 (I will know where the header is)
  • Set row where data starts = 10 (I will know where the data starts from)
  • Load data into data table, to be displayed into datagridview

If I need perform a combination of stream manipulation before I pass this into the CsvHelper, then do also let me know if that's the missing piece? (and any assistance on how I can actually achieve that under one block of code with be greatly appreciated)

So far I have come up with the below:

string filepath = Path.Combine(txtTst04_File_Location.Text, txtTst04_File_Name.Text);

using (var reader = new StreamReader(filepath))
using (var csv = new CsvReader(reader, CultureInfo.InvariantCulture))
{

    // skip rows to get the header
    for (int i = 0; i < 4; i++) 
    {
        csv.Read();
    }

    csv.Configuration.Delimiter = "|"; // Set delimiter
    csv.Configuration.IgnoreBlankLines = false;
    csv.Configuration.HasHeaderRecord = true;
    
    // how do I set the row where the actual data starts? 

    using (var dr = new CsvDataReader(csv))
    {
        var dt = new DataTable();
        dt.Load(dr);
        dgvTst04_View.DataSource = dt; // Set datagridview source to datatable
    }

}

I get the below result:

DataGridViewResult

Do let me know if you would like me to expand on any point.

thanks!

EDIT:

New linked post created here trying to resolve the same objective, but in a different way but getting a new error: Filestream and datagridview memory issue with CsvHelper

v-c0de
  • 142
  • 1
  • 3
  • 15

2 Answers2

1

I can get it to work with ShouldSkipRecord. The only problem is it will fail if any of the random lines has a "|" delimiter in it.

using (var reader = new StreamReader(filepath))
using (var csv = new CsvReader(reader, CultureInfo.InvariantCulture))
{
    csv.Configuration.Delimiter = "|"; // Set delimiter
    csv.Configuration.ShouldSkipRecord = row => row.Length == 1;
    
    using (var dr = new CsvDataReader(csv))
    {
        var dt = new DataTable();
        dt.Load(dr);
        dgvTst04_View.DataSource = dt; // Set datagridview source to datatable
    }

}

If you know how many columns there are, you could set it to skip any rows that have less than that many columns.

csv.Configuration.ShouldSkipRecord = row => row.Length < 6;
David Specht
  • 7,784
  • 1
  • 22
  • 30
  • Thanks @David. I can also replicate your results and it does work! I won't know the number of columns, it will be dynamic and can change. (It's the user who also enters the header row and data row, the program should do the rest). However, my concern is around if any delimiter characters are used in rows apart from header and data. Is there no other way around it, especially since we will know exactly the rows to use for header and data? many thanks for your help. – v-c0de Oct 13 '20 at 06:19
  • I've marked David's response as the answer (since it does technically work), but I have added a new post as per Kiril's comment (i have added a link in my original post) – v-c0de Oct 13 '20 at 11:31
0

I came up with another approach that allows you to skip the lines to the header and then to the records.

using (var reader = new StreamReader(filepath))
using (var csv = new CsvReader(reader, CultureInfo.InvariantCulture))
{
    csv.Configuration.Delimiter = "|"; // Set delimiter
    csv.Configuration.IgnoreBlankLines = false;

    // skip to header
    for (int i = 0; i < 3; i++)
    {
        csv.Read();
    }

    csv.ReadHeader();

    var headers = csv.Context.HeaderRecord;

    // skip to records
    for (int i = 0; i < 6; i++)
    {
        csv.Read();
    }

    var dt = new DataTable();

    foreach (var header in headers)
    {
        dt.Columns.Add(header);
    }

    while (csv.Read())
    {
        var row = dt.NewRow();
        for (int i = 0; i < headers.Length; i++)
        {
            row[i] = csv.GetField(i);
        }

        dt.Rows.Add(row);
    }
}
David Specht
  • 7,784
  • 1
  • 22
  • 30