I'm loading and splitting a couple of CSV files into two lists in C#. Now I also need to extract the header from the first line, which uses ; as the delimiter. I'm trying to use .Skip(1), but that (obviously) only skips the line. I need to extract the header and, once my work on the rest of the data is done, add it back as the first line.

Here is what I have tried so far:

string[] fileNames = Directory.GetFiles(@"read\", "*.csv");
for (int i = 0; i < fileNames.Length; i++)
{
    string file = @"read\" + Path.GetFileName(fileNames[i]);
    var lines = File.ReadLines(file).Skip(1);
    (List<string> dataA, List<string> dataB) = SplitAllTodataAAnddataB(lines);
    var rowLog = 0;
    foreach (var line in dataA)
    {
       // Variables for lines
       string[] entries = line.Split(';');
       rowLog++;
       Helper.checkdataAString(entries[0].ToLower(), "abc", rowLog);
       Helper.checkdataAString(entries[1].ToLower(), "firstname", rowLog);
       Helper.checkdataAString(entries[2].ToLower(), "lastname", rowLog);
       Helper.checkdataAString(entries[4].ToLower(), "gender", rowLog);
       Helper.checkdataAString(entries[5].ToLower(), "id", rowLog);
       Helper.checkdataAString(entries[3], "date", rowLog);
       Helper.drawTextProgressBar("loaded rows", rowLog, dataA.Count());
    }
    Console.WriteLine("\nencrypting data");
    var output = new List<string>();
    var rowdataA = 0; // counts processed rows for the progress bar
    foreach (var line in dataA)
    {
       try
       {
          string[] entries = line.Split(';');
          string abc = entries[0].ToLower();
          string firstName = koeln.GetPhonetics(entries[1]).ToLower();
          string lastName = koeln.GetPhonetics(entries[2]).ToLower();
          string date = entries[3];
          // The three previous variables are concatenated here.
          string NVG = firstName + "_" + lastName + "_" + date;
          string gender = entries[4].ToLower();
          string age = Helper.Left(Convert.ToString(20171027 - Convert.ToInt32(entries[3])), 2);
          string zid = Guid.NewGuid().ToString();
          string fid = entries[5].ToLower();
          rowdataA++;
          output.Add($"{abc}; {NVG}; {gender}; {age}; {zid}; {fid}");
          Helper.drawTextProgressBar("encrypted rows.", rowdataA, dataA.Count());
       }
       catch { rowdataA++; }
    }
    File.WriteAllLines(fileTest, output);
}

I'm fairly new to developing, so I'm just trying things out; any help would be appreciated.

Matan Shahar
Hakunama Tatarov
  • There are plenty of libraries out there; you can try investigating their code. It is a good way to learn: https://stackoverflow.com/questions/2081418/parsing-csv-files-in-c-with-header – Santhos May 31 '18 at 10:30
  • Get rid of the skip: Skip(1) – jdweng May 31 '18 at 10:34

3 Answers

You can read the file this way:

string file = @"read\" + Path.GetFileName(fileNames[i]);
var content = File.ReadLines(file);

var header = content.ElementAt(0);
var lines = content.Skip(1);
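
Since the header uses the same ; delimiter as the data rows, it can be split the same way. The following is only a sketch, assuming the header holds unique column names; HeaderTools and IndexColumns are hypothetical names used for illustration and are not part of the original code:

```csharp
using System;
using System.Collections.Generic;

public static class HeaderTools
{
    // Maps each column name in a ';'-separated header line to its position.
    // Assumption: column names are unique. Names are trimmed and looked up
    // case-insensitively.
    public static Dictionary<string, int> IndexColumns(string headerLine)
    {
        var columns = headerLine.Split(';');
        var index = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        for (int i = 0; i < columns.Length; i++)
        {
            index[columns[i].Trim()] = i;
        }
        return index;
    }
}
```

With such a map, a data row could be read by column name instead of by a fixed position, for example entries[index["firstname"]], provided the header really contains a column with that name.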
vasily.sib

The answer

List<string> lines = File.ReadLines(file).ToList();

This contains all the lines from the file. We know that the first line is the header, and the rest is the content.

List<string> contentLines = lines.Skip(1).ToList();

This is what you had in your code. It contains all lines except the first.

So how do we get only the header line?

string headerLine = lines.First();

There we go. Notice that this returns a single string, not a list of strings.
If you want to receive a list of strings (e.g. if you have a header that spans two or more lines), then you can do:

List<string> headerLines  = lines.Take(amount_of_header_lines).ToList();
List<string> contentLines = lines.Skip(amount_of_header_lines).ToList();

Simply put, Take(X) takes the first X items, and Skip(X) takes everything except the first X items.
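
To make the Take/Skip pairing concrete, here is a small sketch that works on any sequence of lines, so it can be tried without a file. HeaderSplit and its Split method are hypothetical names chosen for this example:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class HeaderSplit
{
    // Splits lines into the first headerCount lines and everything after them.
    public static (List<string> Header, List<string> Content) Split(
        IEnumerable<string> lines, int headerCount)
    {
        // Materialize once so a lazy source (such as File.ReadLines)
        // is enumerated only a single time.
        var all = lines.ToList();
        return (all.Take(headerCount).ToList(), all.Skip(headerCount).ToList());
    }
}
```

For the three lines "id;name", "1;Ann" and "2;Bob" with headerCount set to 1, Header holds only "id;name" and Content holds the other two lines.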


Footnotes

  • Notice that I put lines = File.ReadLines(file) in a separate variable first. If I had called File.ReadLines(file) for both the header lines and the content lines (instead of using the lines variable), I would have read the file twice. That may not matter to you now, but it can lead to performance issues and it's pointless work.
  • The logic for splitting the header line into parts is the same as the logic you have for splitting the content lines into parts.
  • I used First. You might want to use FirstOrDefault (or you might not). But that ties into a different discussion that is not the focus here.
  • Your code accounts for simple CSV structures, but this can get really complicated really fast.
    • If you want to use a semicolon as part of your cell value, then you wrap the cell value in quotes. For example, notice that this data only represents three columns: ColumnA;"ColumnB;StillColumnB";ColumnC. Your code (line.Split(';')) will not account for that.
    • A single row of a table (in Excel) may be split over two lines (when you look at the csv file in a text editor). This happens if there is a newline character inside a cell's value. File.ReadLines() does not account for that.
    • When trying to create a parser for a seemingly simple data format, always check if there is an existing library for this. Don't reinvent the wheel (unless it's for training purposes). There are a lot of edge cases that you are currently not thinking of, but that will eventually become the death of your initially simple code.
  • Without intending any offense, your code isn't the cleanest. If you're interested in improving the quality, I suggest posting this code to the CodeReview StackExchange (mention that you're a beginner so you don't get overwhelmed with complex explanations). CodeReview only allows working code, so you need to finish it before you post.
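
The quoted-semicolon case from the footnotes can be illustrated with a small hand-written splitter. This is a sketch for learning purposes only, not a replacement for a CSV library: it assumes double quotes as the quoting character, treats a doubled quote inside a quoted field as a literal quote, and does not handle cell values that contain line breaks. CsvLine and SplitQuoted are hypothetical names:

```csharp
using System.Collections.Generic;
using System.Text;

public static class CsvLine
{
    // Splits one line on the delimiter, ignoring delimiters inside double quotes.
    // A doubled quote ("") inside a quoted field is read as one literal quote.
    // Limitation: fields containing line breaks are not supported.
    public static List<string> SplitQuoted(string line, char delimiter = ';')
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (c == '"')
            {
                if (inQuotes && i + 1 < line.Length && line[i + 1] == '"')
                {
                    current.Append('"'); // escaped quote inside a quoted field
                    i++;                 // skip the second quote of the pair
                }
                else
                {
                    inQuotes = !inQuotes; // opening or closing quote
                }
            }
            else if (c == delimiter && !inQuotes)
            {
                fields.Add(current.ToString()); // field ends at an unquoted delimiter
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        fields.Add(current.ToString()); // last field has no trailing delimiter
        return fields;
    }
}
```

For the line ColumnA;"ColumnB;StillColumnB";ColumnC this returns three fields (ColumnA, ColumnB;StillColumnB, ColumnC), whereas line.Split(';') returns four pieces.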
Flater

If I understood correctly, you need to read the whole file, process all the lines except the header, then write back a different file with the header and the processed lines, right?

If so, the following approach should work:

var allLines = File.ReadAllLines(originalFile);
var headerLine = allLines.First();
var dataLines = allLines.Skip(1);
var processedLines = ProcessLines(dataLines);
File.WriteAllLines(newFile, (new[] {headerLine}.Concat(processedLines)).ToArray());

The ProcessLines method would accept the original lines as a parameter and return a list with the processed lines:

IEnumerable<string> ProcessLines(IEnumerable<string> originalLines)
{
    var processedLines = new List<string>();
    foreach(var line in originalLines)
    {
        var processedLine = line; // placeholder: generate your processed line here
        processedLines.Add(processedLine);
    }
    return processedLines;
}
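
As a variation on the same flow, the sketch below takes the per-line processing as a delegate and also covers an empty input file, where First() would throw. CsvRewrite and Rewrite are hypothetical names, and the processing function is whatever transformation you need:

```csharp
using System;
using System.IO;
using System.Linq;

public static class CsvRewrite
{
    // Copies the header line unchanged and applies processLine to every data line.
    public static void Rewrite(string originalFile, string newFile, Func<string, string> processLine)
    {
        var allLines = File.ReadAllLines(originalFile);
        if (allLines.Length == 0)
        {
            // Empty input: there is no header to keep, so write an empty file.
            File.WriteAllLines(newFile, allLines);
            return;
        }

        var headerLine = allLines[0];
        var processedLines = allLines.Skip(1).Select(processLine);

        // Header first, then the processed data lines.
        File.WriteAllLines(newFile, new[] { headerLine }.Concat(processedLines));
    }
}
```

A call such as CsvRewrite.Rewrite(originalFile, newFile, line => line.ToLower()) would lower-case every data line while leaving the header untouched.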
Konamiman
  • That actually helped me a lot, thanks. I posted the wrong code at first; I have a test mode and a real mode, and in the real mode I work with different data types, but this helped me work it out, so thanks again. – Hakunama Tatarov May 31 '18 at 12:34