29

I am having a csv file like this

A, 22, 23, 12
B, 32, 4, 33
C, 34, 3 ,33

I want to print the sum and average of each row and skip the first column. How to do in LINQ using Lambda

Joel
  • 327
  • 1
  • 3
  • 5

9 Answers9

43
var stuff = from l in File.ReadAllLines(filename)
            let x = l.Split(new [] {',', ' '}, StringSplitOptions.RemoveEmptyEntries)
                     .Skip(1)
                     .Select(s => int.Parse(s))
            select new
            {
                Sum = x.Sum(),
                Average = x.Average()
            };

If you're reading big files and memory use is a concern, then the following will work better using .NET 4:

var stuff = from l in File.ReadLines(filename)
            let x = l.Split(new [] {',', ' '}, StringSplitOptions.RemoveEmptyEntries)
                     .Skip(1)
                     .Select(s => int.Parse(s))
            select new
            {
                Sum = x.Sum(),
                Average = x.Average()
            };

In both cases, the stuff variable contains an enumerable which won't actually be executed until you start reading from it (e.g. inside a foreach loop).

Bennor McCarthy
  • 11,415
  • 1
  • 49
  • 51
  • Don't forget to convert strings to integers. `X = (int)x[1]` won't compile. Try using `Int32.Parse()` or `Convert.ToInt32()`. – Steven Feb 25 '11 at 11:35
  • Thanks for pointing that out. I've fixed it, and simplified the query a bit by using the built in aggregates. – Bennor McCarthy Feb 25 '11 at 11:39
  • 2
    Not exactly as it is, but if you're using .NET 4, just change `ReadAllLines` to `ReadLines`. – Bennor McCarthy Feb 25 '11 at 12:23
  • @Mou: the difference is `ReadLines` in the second example. `ReadAllLines` loads the entire file into a string array in memory, but `ReadLines` reads one line at a time as an `IEnumerable`, which will perform better for very large files. – Bennor McCarthy Jun 15 '15 at 09:43
  • 1
    Splitting on comma is a bad idea, data can contain commas. – Andrew Mar 31 '20 at 14:47
  • Depends on your input. The OP's example contains single letter labels and numbers only. If that's all you ever have to process there's no reason not to just split on commas. – Bennor McCarthy Mar 31 '20 at 19:03
  • If you have text data which may contain commas, then you obviously shouldn't split on commas. But you probably shouldn't be trying to process it with LINQ either. There are better tools for the job. – Bennor McCarthy Mar 31 '20 at 19:05
8
        string csvFile = @"myfile.csv";
        string[] lines = File.ReadAllLines(csvFile);

        var values = lines.Select(l => new { FirstColumn = l.Split(',').First(), Values = l.Split(',').Skip(1).Select(v => int.Parse(v)) });
        foreach (var value in values)
        {
            Console.WriteLine(string.Format("Column '{0}', Sum: {1}, Average {2}", value.FirstColumn, value.Values.Sum(), value.Values.Average()));
        }
Duncan Watts
  • 1,331
  • 9
  • 26
7

Try to use this old but still good library: FileHelpers Library

It's very easy to use:

char delimiter = ',';
var dt = FileHelpers.CsvEngine.CsvToDataTable(fileName,delimiter);

then just do:

var rowStats = dt.AsEnumerable()
                 .Select(x => x.ItemArray.Select(y => Convert.ToInt32(y)))
                 .Select(x => new { avg = x.Average(), sum = x.Sum() });

foreach (var rowStat in rowStats)
{
    Console.WriteLine("Sum: {0}, Avg: {1}", rowStat.sum, rowStat.avg);
}
digEmAll
  • 56,430
  • 9
  • 115
  • 140
  • 1
    +1 for using FileHelpers, although personally I'd create a custom type instead of resorting to DataTable in order to increase readability tremendously. – Martin R-L Feb 25 '11 at 16:00
  • @Martin: Yes you're right, but of course you can use a Type only if the CSV file has always the same structure (e.g the same number of columns) – digEmAll Feb 25 '11 at 16:53
5
string[] csvlines = File.ReadAllLines(@txtCSVFile.Text);

var query = from csvline in csvlines
  let data = csvline.Split(',')
  select new
  {
   ID = data[0],
   FirstNumber = data[1],
   SecondNumber = data[2],
   ThirdNumber = data[3]
  };
MUG4N
  • 19,377
  • 11
  • 56
  • 83
4

I just have discovered LinqToCsv library, it do all the parsing stuff and then you can query objects like collections and it supports deferred reading:

http://www.codeproject.com/Articles/25133/LINQ-to-CSV-library

0lukasz0
  • 3,155
  • 1
  • 24
  • 40
3

Actually for most cases you should avoid splitting based on ',' only because you could have coma in string.

I give you a better generic solution using Regex and easy to use:

var stuff = File.ReadAllLines(csvFilePath)
    .Skip(1) // For header
    .Select(s => Regex.Match(s, @"^(.*?),(.*?),(.*?),(.*?),(.*?)$"))
    .Select(data => new 
    {
        Foo = data.Groups[1].Value,
        Bar = data.Groups[2].Value,
        One = data.Groups[3].Value,
        Two = data.Groups[4].Value,
    });

And you can find more details here https://stackoverflow.com/a/18147076/196526

Bastien Vandamme
  • 17,659
  • 30
  • 118
  • 200
  • 2
    Thank you for the -1. I like to know when I'm doing something wrong. Could you please tell me why -1 because this is the only valid solution when working with csv file that contains strings. Is it because it is not good practice to use Regex or because this answer is too generic for the question or maybe you just don't like Regex? Why this bad reaction. – Bastien Vandamme Jun 20 '19 at 03:41
0

Hi you are looking for something like this

  var rows = new List<string> {"A, 22, 23, 12", "B, 32, 4, 33", "C, 34, 3 ,33"};
     foreach (var row in rows) {
            var sum = row.Split(',').Skip(1).Sum(x => Convert.ToInt32(x));
            var avg = row.Split(',').Skip(1).Average(x => Convert.ToInt32(x));
     }
nhu
  • 470
  • 1
  • 5
  • 10
0

Something like this maybe:

var csv = @"A, 22, 23, 12
B, 32, 4, 33
C, 34, 3 ,33";

var lines =
    csv.Split('\n').Select(x => x.Split(',').Skip(1).Select(n => int.Parse(n))).Select(x => new {Sum = x.Sum(), Average = x.Average()});
foreach (var line in lines)
{
    Console.WriteLine("Sum: " + line.Sum);
    Console.WriteLine("Average: " + line.Average);
}

In general, I don't suggest to do something like this. You should use a full blown CSV reader to parse the CSV file and you should include error handling.

Daniel Hilgarth
  • 171,043
  • 40
  • 335
  • 443
0
using System.IO

// turn file into IEnumerable (streaming works better for larger files)
IEnumerable<Tuple<int, int, int>> GetTypedEnumerator(string FilePath){
  var File = File.OpenText(FilePath);
  while(!File.EndOfStream) 
      yield return new Tuple<int, int, int>(
          Int.Parse(File[1]), 
          Int.Parse(File[2], 
          Int.Parse(File[3])
      );
   File.Close();
}

// this lines would return the sum and avg for each line
var tot = GetTypeEnumerator(@"C:\file.csv").Select(l=>l.Item1 + l.Item2 + l.Item3);
var avg = GetTypeEnumerator(@"C:\file.csv").Select(l=> (l.Item1 + l.Item2 + l.Item3) / 3);

The streaming aporoach will let you handle laregr files because you wouldn;t need toload them into memeory first. Don't have VS here, haven't checked the syntax, might not compile as is.

Regards GJ

Damn, lot of answers already, need to type faster!

gjvdkamp
  • 9,929
  • 3
  • 38
  • 46