0

My code is as follow
The output is null, I should use while loop as you see, but somewhere in my code is not working properly. and I'm not able to use any other library I need an hours array and rate array for employeeId==1 and the same for other employees(the CSV file is larger than what I pasted here)

public static List<PayRecord> ImportPayRecords(string file)
{
    List<PayRecord> payRecords = new List<PayRecord>();
    StreamReader reader = new StreamReader(file);
    int EmployeeId;
    int i = 0;
    int id;
    string Visa = "";
    string YearToDate = "";
    string record;
    string[] strArr;
    double[] Hours = new double[10];
    double[] Rate = new double[10];

    record = reader.ReadLine();
    strArr = record.Split(',');

    while (record != null)
    {
        EmployeeId = int.Parse(strArr[0]);

        if (strArr[3] != "")
        {
            Visa = strArr[3];
            YearToDate = strArr[4];
        }

        id = EmployeeId;

        while (record != null && id == int.Parse(strArr[0]))
        {
            Hours[i] = double.Parse(strArr[1]);
            Rate[i] = double.Parse(strArr[2]);
            i++;
            record = reader.ReadLine();

            if (record != null)
            {
                strArr = record.Split(',');
            }                   
        }

        PayRecord rec = CreatePayRecord(EmployeeId, Hours, Rate, Visa, YearToDate);                
        payRecords.Add(rec);
        double gross = rec.Gross;
        i = 0;
        Array.Clear(Hours, i, 10);
        Array.Clear(Rate, i, 10);
    }

    return payRecords;
}

the CSV file is like this:

EmployeeId,Hours,Rate,Visa,YearToDate

1,2,25,,
1,3,25,,
1,3,25,,
1,4,25,,
1,5,32,,
1,6,32,,
2,2,25,417,47520
2,2,25,417,47520
2,2,25,417,47520
2,2,25,417,47520
2,2,25,417,47520
2,2,28,417,47520
2,2,28,417,47520
2,2,28,417,47520
Alex Jey
  • 1
  • 2
  • have you thought about using an already available library for parsing CSV instead of writing your own? there's plenty around, and all the struggles you have right now, they had themselves a long time ago. – Franz Gleichmann Sep 10 '21 at 09:23
  • I'd definitely recommend a library. Some even allow you to map an object automatically. – Olaf Sep 10 '21 at 09:24
  • 1
    Execute an extra `ReadLine` *before* the loop. – Panagiotis Kanavos Sep 10 '21 at 09:24
  • 1
    Instead of calling `ReadLine` in a loop you could use `File.ReadLines().Skip(1).Select(line=>{ var fields=line.Split(separator);...., return Create(...)}).ToList()` to read the lines as an `IEnumerable, skip the first then use LINQ's `Select` to parse each line and produce a list of records. Even better, extract the code that parses a single line to a separate method. This will make the code easier to write and test, whether you use LINQ or not to process the lines – Panagiotis Kanavos Sep 10 '21 at 09:28
  • I'd recommend using [TextFieldParser](https://learn.microsoft.com/en-us/dotnet/api/microsoft.visualbasic.fileio.textfieldparser?redirectedfrom=MSDN&view=net-5.0) which will surely help you with the many more problems you are likely to encounter further down the road. – Filburt Sep 10 '21 at 09:40
  • 1
    Does this answer your question? [Reading a CSV file in .NET?](https://stackoverflow.com/questions/1405038/reading-a-csv-file-in-net) – O. Jones Sep 10 '21 at 10:08

2 Answers2

1

Although a CSV library will help with complex files, this file isn't so complex. To solve the immediate problem, add a ReadLine before reading the first record:

reader.ReadLine();
record = reader.ReadLine();

The code would become a lot easier to write, debug and test with a bit of cleanup. Instead of using a StreamReader like this, you can use File.ReadLines which returns an IEnumerable<string>. The result is the same as this code, only a lot easier to write. Once you have an IEnumerable<T> you can skip lines, parse lines etc :

int headerRows=1;
var lineRecords = File.ReadLines(file)
                     .Skip(headerRows)
                     .Select(line=>ParseLine(line,","))
                     .ToList()

Don't try to both parse records and analyze them in the same method.

Parsing the line and creating the pay record should be performed by a separate method. I'm using a C# record just so I don't have to write a class with explicit properties :

record LineRecord(int Id,double Hours,double Rate,string Visa,string YearToDate);

LineRecord ParseLine(string line,string separator)
{
    var fields=line.Split(separator);

    var id = int.Parse(fields[0]);
    var hours = double.Parse(fields[1]);
    var rate = double.Parse(fields[2])
    var visa = fields[3];
    var yearToDate = fields[4]

    return new LineRecord(id, hours, rate, visa, yearToDate); 
}

Even if you use a library like CsvHelper you'll end up at this point, with a list of parsed records.

After that, you can analyze and group the data any way you want using eg LINQ:

var byEmployee=lineRecords.GroupBy(rec=>rec.Id)
                          .Select(g=>new {
                              Id=g.Key,
                              Hours=g.Select(r=>r.Hours),ToArray(),
                              Rates=g.Select(r=>r.Rate).ToArray(),        
                              Visa =g.First(r=>r.Visa),
                              YearToDate = g.First(r=>r.YearToDate)
                          });

or

var payRecords=lineRecords.GroupBy(rec=>rec.Id)
                          .Select(g=> CreatePayRecord(
                              g.Key,
                              g.Select(r=>r.Hours),ToArray(),
                              g.Select(r=>r.Rate).ToArray(),        
                              g.First(r=>r.Visa),
                              g.First(r=>r.YearToDate)
                          ))
                          .ToList();

You could even combine both operations in a single LINQ query:

var payRecords = File.ReadLines(file)
                     .Skip(headerRows)
                     .Select(line=>ParseLine(line,","))
                     .GroupBy(rec=>rec.Id)
                     .Select(g=> CreatePayRecord(
                             g.Key,
                             g.Select(r=>r.Hours),ToArray(),
                             g.Select(r=>r.Rate).ToArray(),        
                             g.First(r=>r.Visa),
                             g.First(r=>r.YearToDate)
                      ))
                      .ToList();
Panagiotis Kanavos
  • 120,703
  • 13
  • 188
  • 236
  • You wrote *this file isn't so complex.* True, and your solution is good. But! if you tell users they can give you CSV files, they'll often give you some big mess of a CSV file. That's the reason to use [CsvHelper](https://www.nuget.org/packages/CsvHelper/) or the built in and bizarrely named [Microsoft.VisualBasic.FileIO.TextFieldParser](http://msdn.microsoft.com/en-us/library/microsoft.visualbasic.fileio.textfieldparser.aspx) – O. Jones Sep 10 '21 at 10:07
  • `TextFieldParser` does little more that provide some string fields and comes with a lot of dead weight in that `VisualBasic` reference. You still have a *lot* of work to do to parse the fields. And if you use CsvHelper you know it's not that easy to configure. Or stable (breaking changes every month). Or well tested (try loading a file where all fields are quoted) – Panagiotis Kanavos Sep 10 '21 at 10:13
1

Your code is not making it clear what you really want, with some guessing, you want a result like (displayed as JSON for easy representation):

[
  {
    "EmployeeId": 1,
    "Hours": [2.0,3.0,3.0,4.0,5.0,6.0],
    "Rate": [25.0,25.0,25.0,25.0,32.0,32.0],
    "Visa": "",
    "YearToDate": ""
  },
  {
    "EmployeeId": 2,
    "Hours": [2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0],
    "Rate": [25.0,25.0,25.0,25.0,25.0,28.0,28.0,28.0],
    "Visa": "417",
    "YearToDate": "47520"
  }
]

Right? Then:

public static List<PayRecord> ImportPayRecords(string file)
{
    return File.ReadAllLines(file)
    .Skip(1)
    .Select(f => f.Split(','))
    .Select(x => new {id=x[0], hour=x[1], rate=x[2],visa=x[3],ytd=x[4]})
    .GroupBy(x => x.id)
    .Select(x => new PayRecord {
        EmployeeId = int.Parse(x.Key),
        Hours = x.Select(y => double.Parse(y.hour)).ToArray(),
        Rate = x.Select(y => double.Parse(y.rate)).ToArray(),
        Visa = x.First().visa,
        YearToDate = x.First().ytd
    })
    .ToList();
}

BTW, there are ready made libraries like LinqToCSV to do this type of things easier.

Cetin Basoz
  • 22,495
  • 3
  • 31
  • 39