
What is a more optimal way to read in variables by column and query those columns? Here is what I am currently doing:

using System.IO;
using System.Linq;

// read every line and split it on commas (lazy: Select is not run yet)
var lines = File.ReadAllLines("readme.csv").Select(a => a.Split(','));

// load columns into variables
var col1 = from line in lines select line[1];
var col2 = from line in lines select line[2];
var col3 = from line in lines select line[3];
var col4 = from line in lines select line[4];

// query column 1
foreach (string value in col1)
{
    // query if ...
}
// query column 2
foreach (string value in col2)
{
    // query if ...
}
// query column 3
foreach (string value in col3)
{
    // query if ...
}
joce
Charles Morrison
  • Is the body of each loop the same? – Ilya Ivanov Mar 25 '13 at 21:37
  • What are you asking? That amount of code is probably necessary for what you are doing, but without more code to look at, I can't tell what you really want to know. – feralin Mar 25 '13 at 21:38
  • It sounds like a multidimensional array would be a good fit here, but @IlyaIvanov is right, is the processing the same? – neontapir Mar 25 '13 at 21:38
  • Not sure about the query, but you might consider changing `File.ReadAllLines` to `File.ReadLines`. They have the same ultimate effect, but `ReadLines` won't load the whole file into memory at once. Rather, it loads a small bit of the file at a time. – Jim Mischel Mar 25 '13 at 21:40
  • @JimMischel he has 4 LINQ queries; would it make sense to iterate over the file 4 times if the file were loaded lazily? – Ilya Ivanov Mar 25 '13 at 21:42
  • @IlyaIvanov: Then do a `ToList()`. Which should be done anyway, to prevent doing the `Split` for every line four different times. – Jim Mischel Mar 25 '13 at 22:11
  • @JimMischel `ReadAllLines` returns an array, why do you want to do a `ToList()`? Maybe I'm missing something – Ilya Ivanov Mar 25 '13 at 22:43
  • @IlyaIvanov: The way it's written, each line will be split 4 times. If you write a `ToList` (or `ToArray`) at the end of the query, the lines will be split only once. It's the same principle as your objection to my suggestion of `File.ReadLines`. – Jim Mischel Mar 25 '13 at 23:08
  • If you look at this answer, you can see that the person created an anonymous type for easier access to column members: http://stackoverflow.com/a/1375435/361899 – aybe Mar 25 '13 at 21:41
  • @JimMischel yes, just because `Select` returns `IEnumerable`; good that we reached the same conclusion – Ilya Ivanov Mar 26 '13 at 06:24
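To make the deferred-execution point from the comments concrete, here is a small sketch (not code from the thread). It uses in-memory lines instead of `File.ReadAllLines` so it is self-contained, and counts how often `Split` runs when the lazy query is enumerated once per column versus after `ToList()`. The class `DeferredSplitDemo` and method `CountSplits` are hypothetical names for illustration only.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class: observes how many times Split runs when the same
// lazy Select is enumerated once per column, with and without ToList().
public static class DeferredSplitDemo
{
    // Returns the number of Split calls made while reading `columnCount` columns.
    public static int CountSplits(string[] lines, int columnCount, bool materialize)
    {
        int splitCalls = 0;

        IEnumerable<string[]> rows = lines.Select(a =>
        {
            splitCalls++;            // side effect so each Split is visible
            return a.Split(',');
        });

        if (materialize)
            rows = rows.ToList();    // runs the Select exactly once, here

        for (int c = 0; c < columnCount; c++)
        {
            int col = c;             // copy the loop variable for the lambda
            // Each foreach enumerates `rows`; while it is still lazy,
            // the Select (and therefore Split) runs again for every line.
            foreach (string value in rows.Select(r => r[col]))
            {
                // query value ...
            }
        }

        return splitCalls;
    }
}
```

With 3 lines and 4 column queries, the lazy version splits 12 times and the materialized version splits 3 times. If the source were `File.ReadLines` instead of an array, the lazy version would also re-read the file on every enumeration, which is why `ReadLines` and `ToList()` go together.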

2 Answers


What is a more optimal way to read in variables by columns and query columns.

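You are trying to solve a problem that has been solved before. CSV files have a number of nuances related to escaping and quoting: for example, a quoted field such as "Smith, John" contains a comma, so `Split(',')` would break it into two columns.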
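To show what those nuances look like, here is a minimal hand-rolled sketch of quote-aware splitting for a single line, assuming RFC 4180-style rules (commas inside double quotes do not end a field, and `""` inside a quoted field is an escaped quote). `CsvLineSplitter` and `SplitLine` are hypothetical names. It deliberately ignores quoted fields that span several lines, which a real CSV reader also has to handle, so it illustrates the problem rather than replacing a library.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: splits ONE CSV line, honoring double-quoted fields.
// Does not handle line breaks inside quoted fields.
public static class CsvLineSplitter
{
    public static List<string> SplitLine(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char ch = line[i];
            if (inQuotes)
            {
                if (ch == '"')
                {
                    if (i + 1 < line.Length && line[i + 1] == '"')
                    {
                        current.Append('"');   // "" inside quotes -> literal quote
                        i++;                   // skip the second quote
                    }
                    else
                    {
                        inQuotes = false;      // closing quote
                    }
                }
                else
                {
                    current.Append(ch);        // commas are literal inside quotes
                }
            }
            else if (ch == '"')
            {
                inQuotes = true;               // opening quote
            }
            else if (ch == ',')
            {
                fields.Add(current.ToString()); // unquoted comma ends a field
                current.Clear();
            }
            else
            {
                current.Append(ch);
            }
        }

        fields.Add(current.ToString());        // the last field has no trailing comma
        return fields;
    }
}
```

For the line `1,"Smith, John","He said ""hi""",42`, a plain `Split(',')` yields 5 pieces, while `SplitLine` yields the 4 intended fields.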

I would suggest using CSV Reader from Code Project.

http://www.codeproject.com/Articles/9258/A-Fast-CSV-Reader

I have used this extensively with very large data files (multi-GB).

Use a BufferedStream for maximum performance:

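using (FileStream fs = File.Open(csvPath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
using (BufferedStream bs = new BufferedStream(fs))
using (CsvReader csv = new CsvReader(new StreamReader(bs), true)) // true = first row is a header
{
    while (csv.ReadNextRecord()) { /* read fields by index, e.g. csv[1], and query them */ }
}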
Eric J.
  • Another CSV reader from CodeProject: http://www.codeproject.com/Articles/242602/Strongly-typed-Csv-reader-CsvToObj-and-Code-First. The post seems long, but you can skip straight to the CSV parser. – Giorgio Minardi Mar 25 '13 at 21:51

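Adding `ToList()` at the end of the query prevents the duplicate looping, as answered in the comments: each line is then split only once, instead of once per column query.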
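Putting the comment suggestions together (`File.ReadLines`, a single `ToList()`, and neontapir's idea of an array), a minimal sketch might load the file once and transpose it into one array per column. `ColumnLoader` and `LoadColumns` are hypothetical names. The sketch keeps the question's plain `Split(',')`, so it assumes no quoted commas, and it assumes every row has as many fields as the first row (a shorter row would throw `IndexOutOfRangeException`).

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper: reads the file once, splits each line once,
// then builds columns[c][r] = field c of row r.
public static class ColumnLoader
{
    public static string[][] LoadColumns(string path)
    {
        List<string[]> rows = File.ReadLines(path)            // streams lines lazily
                                  .Select(line => line.Split(','))
                                  .ToList();                  // materialize once

        if (rows.Count == 0)
            return new string[0][];

        int columnCount = rows[0].Length;  // assumes a rectangular file
        var columns = new string[columnCount][];
        for (int c = 0; c < columnCount; c++)
        {
            int col = c;                   // copy the loop variable for the lambda
            columns[c] = rows.Select(r => r[col]).ToArray();
        }
        return columns;
    }
}
```

Each column can then be queried independently without touching the file again, for example `foreach (string value in columns[1]) { ... }` or `columns[2].Where(v => v.Length > 0)`.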

Charles Morrison