2

I have a CSV file that I am reading byte by byte through a buffered stream. I want to skip a line if its last column is "True". How do I achieve that?

So far I have got:

BufferedStream stream = new BufferedStream(csvFile, 1000);
int byteIn = stream.ReadByte();

while (byteIn != -1 && (char)byteIn != '\n' && (char)byteIn != '\r')
    byteIn = stream.ReadByte();

I want to ignore reading the line if the last column of the line is "True"

user7116
InfoLearner

4 Answers

1

First, I wouldn't approach any file I/O byte by byte without an absolute need for it. Second, reading lines from a text file in .NET is a really cheap operation.

Here is some naive starter code, which ignores the possibility of string CSV values:

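// Requires: using System.Collections.Generic; using System.IO; using System.Linq;
List<string> matchingLines = new List<string>();
using (var reader = new StreamReader("data.csv"))
{
    string rawline;
    while (null != (rawline = reader.ReadLine()))
    {
        // Skip any line whose last comma-separated field is exactly "True"
        if (rawline.TrimEnd().Split(',').Last() == "True") continue;

        matchingLines.Add(rawline);
    }
}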

In practice, it is advisable to parse each CSV line into a strongly typed object and then filter that collection using LINQ. However, that can be a separate answer for a separate question.
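
As a rough illustration of the "strongly typed object plus LINQ" idea, here is a minimal sketch. The `CsvRecord` and `CsvRecords` names are hypothetical (the question does not give the real column layout), and the parsing is still a naive split on commas that ignores quoted values. Note that `bool.TryParse` is case-insensitive, so "true" and "True" are both treated as true here.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical record type; the real column layout is not given in the question.
public sealed class CsvRecord
{
    public string[] Fields { get; set; }
    public bool Flag { get; set; }   // parsed from the last column
}

public static class CsvRecords
{
    // Lazily parses each non-empty line into a CsvRecord.
    // Naive: splits on ',' and does not understand quoted fields.
    public static IEnumerable<CsvRecord> Parse(TextReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Length == 0) continue;
            string[] fields = line.Split(',');
            bool flag;
            // A last column that is not a valid boolean leaves flag as false.
            bool.TryParse(fields[fields.Length - 1], out flag);
            yield return new CsvRecord { Fields = fields, Flag = flag };
        }
    }
}
```

Filtering is then a one-liner such as `CsvRecords.Parse(reader).Where(r => !r.Flag).ToList()`. Because `Parse` is lazy, call `ToList()` (or otherwise enumerate) before the reader is disposed.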

user7116
  • What if a column can contain a comma-separated value inside quote marks? In that case, your code will break. Plus, what if I wanted to check whether the nth column contains an @ sign, which would mean we need to treat the line differently? I would love to use text readers, but the requirements force me to use the above – InfoLearner Jun 06 '11 at 20:37
  • @Knowledge: per my answer, "...ignores the possibility of string CSV values". It was intended as a starting point. However, I guess I am wondering why you're constrained to a `BufferedStream`? – user7116 Jun 06 '11 at 20:44
0

I would read the entire CSV file into a DataTable object and then do a Select on the DataTable to keep only the rows where the last column is not equal to "True".
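
A minimal sketch of that idea, under some loud assumptions: the `CsvTableFilter` class and the generated column names (`C0`, `C1`, ...) are made up for illustration, the file is non-empty, every row has the same number of fields, and no field contains a quoted comma. Also note that string comparisons in a `DataTable` filter are case-insensitive unless `CaseSensitive` is set to true.

```csharp
using System;
using System.Data;
using System.IO;

public static class CsvTableFilter
{
    // Loads a simple CSV (no quoted fields) into a DataTable of string columns.
    public static DataTable Load(TextReader reader)
    {
        var table = new DataTable();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            string[] fields = line.Split(',');
            // Add columns named C0, C1, ... until the table is wide enough for this line.
            while (table.Columns.Count < fields.Length)
                table.Columns.Add("C" + table.Columns.Count, typeof(string));

            DataRow row = table.NewRow();
            for (int i = 0; i < fields.Length; i++) row[i] = fields[i];
            table.Rows.Add(row);
        }
        return table;
    }

    // Returns the rows whose last column is not "True".
    public static DataRow[] RowsWhereLastColumnIsNotTrue(DataTable table)
    {
        string last = table.Columns[table.Columns.Count - 1].ColumnName;
        return table.Select("[" + last + "] <> 'True'");
    }
}
```

This reads the whole file into memory first, so it is a better fit for small files than for very large ones.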

jkirkwood
0

In addition to jkirkwood's answer, you could also read each line and conditionally add a class or struct to a list of objects.

Some quick, semi-pseudocode:

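// Requires: using System; using System.Collections.Generic; using System.IO;
struct MyObject
{
    // Fields must be public to be assigned from outside the struct.
    public int Property1;
    public string Property2;
    public bool Property3;
}

// Inside a method ("data.csv" is a placeholder file name):
List<MyObject> ObjectList = new List<MyObject>();
using (var reader = new StreamReader("data.csv"))
{
    string buffer;
    while ((buffer = reader.ReadLine()) != null)
    {
        string[] LineData = buffer.Split(',');
        string LastField = LineData[LineData.Length - 1].Trim();
        // Skip the line when its last field is "true" (any casing).
        if (string.Equals(LastField, "true", StringComparison.OrdinalIgnoreCase)) continue;
        MyObject CurrentObject = new MyObject();
        CurrentObject.Property1 = Convert.ToInt32(LineData[1]);
        CurrentObject.Property2 = LineData[2];
        CurrentObject.Property3 = Convert.ToBoolean(LastField);
        ObjectList.Add(CurrentObject);
    }
}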

It really kind of depends on what you want to do with the data once you've read it.

Hopefully this example is a bit helpful.

EDIT

As noted in comments, please be aware this is just a quick example. Your CSV file may have qualifiers and other things which make the string split completely useless. The take-away concept is to read line data into some sort of temporary variable, evaluate it for the desired condition, then output it or add it to your collection as needed.
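
To make the point about qualifiers concrete, here is a small sketch of a quote-aware field splitter that could stand in for `Split(',')`. The `CsvLine.SplitFields` name is hypothetical, and the sketch assumes the common convention of double-quoted fields with `""` as an escaped quote, and that a record never spans more than one line. For real data, an existing CSV library is probably the safer choice.

```csharp
using System.Collections.Generic;
using System.Text;

public static class CsvLine
{
    // Splits one CSV line into fields, honoring double-quoted fields and "" escapes.
    // Assumption: a record does not contain embedded line breaks.
    public static List<string> SplitFields(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"')
                {
                    if (i + 1 < line.Length && line[i + 1] == '"')
                    {
                        current.Append('"'); // doubled quote is a literal quote
                        i++;
                    }
                    else
                    {
                        inQuotes = false;    // closing quote
                    }
                }
                else
                {
                    current.Append(c);       // commas inside quotes are data
                }
            }
            else if (c == '"')
            {
                inQuotes = true;             // opening quote
            }
            else if (c == ',')
            {
                fields.Add(current.ToString());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        fields.Add(current.ToString());      // last field has no trailing comma
        return fields;
    }
}
```

With this in place, the last-column check becomes `fields[fields.Count - 1] == "True"` on the returned list.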

EDIT 2

If the line lengths vary, you'll need to grab the last field instead of the *n*th field, so I changed the boolean field grabber to show how you would always get the last field instead of, say, the 42nd one.

JYelton
  • BufferedStream does not contain a ReadLine() method – InfoLearner Jun 06 '11 at 17:09
  • I just did `.ReadLine` to indicate that the loop is incrementing lines. It's intended only as pseudocode. – JYelton Jun 06 '11 at 17:13
  • Very naive method of parsing CSV files, but this may work depending on what the data looks like. CSV allows for commas within field values, which can be enclosed by quotes. You can even have carriage returns inside field values if you enclose them in quotes. And of course, if you want quotes in field values, they are usually escaped by having quotes twice. Whether or not these situations actually apply to the person asking the question isn't known, but these are some important things to think about. – Kibbee Jun 06 '11 at 17:15
  • @Kibbee Once again, *pseudocode.* I'm familiar with qualifiers and delimiters being present in actual data. This is not supposed to be a complete solution, it's just a way of showing how to evaluate the OP's condition before adding data to the collection or output. Hopefully KnowledgeSeeker will use appropriate means to filter data accordingly. – JYelton Jun 06 '11 at 17:17
  • The line lengths vary, and the lines also contain special characters, i.e. quote marks, %, #, @, $, and we have to call different methods depending on them. But ignore the line if the last column is true – InfoLearner Jun 06 '11 at 17:24
  • 1
    You may want to explore some CSV parsers and libraries instead of rolling your own: [Reading CSV files in .NET?](http://stackoverflow.com/q/1405038/161052), [Parsing CSV files in C#](http://stackoverflow.com/q/2081418/161052), [Reading CSV files in C#](http://stackoverflow.com/q/1544721/161052) - and many others. – JYelton Jun 06 '11 at 17:29
  • But once the file is parsed, the solution is trivial. Checking if the last column has a value of "true" is a simple if statement. Actually, the question doesn't even make sense. If the file is being read byte by byte, and you want to not read the line if the last column is "true" then this is impossible, since by the time you read the last column, the line has already been read. If you reword it to how do I not process the line, we are left with a simple if statement. The only complexity doing this is the actual parsing of the CSV file. – Kibbee Jun 06 '11 at 18:16
0

Here is a solution using a StreamReader, rather than a BufferedStream:

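// Requires: using System.IO; using System.Text;
public string RemoveTrueRows( string csvFile )
{
    var contentsWithoutTrueRows = new StringBuilder();
    // The using block disposes the reader even if an exception is thrown
    using ( var sr = new StreamReader( csvFile ) )
    {
        string line;
        while ( ( line = sr.ReadLine() ) != null )
        {
            var columns = line.Split( ',' );
            // Keep only the rows whose last column is NOT "True"
            if ( columns[ columns.Length - 1 ].Trim() != "True" )
            {
                contentsWithoutTrueRows.AppendLine( line );
            }
        }
    }

    return contentsWithoutTrueRows.ToString();
}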
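
If the `BufferedStream` requirement from the question really cannot be dropped, the same filter can be sketched byte by byte. This is only a sketch under stated assumptions: the `ByteWiseCsvFilter` name is made up, the file is assumed to be UTF-8, and the last field is assumed not to be quoted. As pointed out in the comments on another answer, the bytes of a line have to be read before its last column is known, so the line is buffered and then either kept or discarded.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class ByteWiseCsvFilter
{
    // Reads csvFile one byte at a time and returns the lines whose last field is not "True".
    // Note: disposing the BufferedStream also closes the underlying stream.
    public static List<string> ReadLinesExceptTrue(Stream csvFile)
    {
        var kept = new List<string>();
        var lineBytes = new MemoryStream();   // bytes of the line currently being read

        using (var stream = new BufferedStream(csvFile, 1000))
        {
            int byteIn;
            while ((byteIn = stream.ReadByte()) != -1)
            {
                if (byteIn == '\n')
                {
                    KeepUnlessTrue(lineBytes, kept);       // end of line reached
                }
                else if (byteIn != '\r')
                {
                    lineBytes.WriteByte((byte)byteIn);     // ignore '\r', buffer the rest
                }
            }

            // Handle a final line that has no trailing newline.
            if (lineBytes.Length > 0) KeepUnlessTrue(lineBytes, kept);
        }

        return kept;
    }

    private static void KeepUnlessTrue(MemoryStream lineBytes, List<string> kept)
    {
        // Assumption: the file is UTF-8 encoded.
        string line = Encoding.UTF8.GetString(lineBytes.ToArray());
        lineBytes.SetLength(0);                            // reset the buffer for the next line

        // Everything after the last comma is the last field (the whole line if there is no comma).
        string lastField = line.Substring(line.LastIndexOf(',') + 1).Trim();
        if (lastField != "True") kept.Add(line);
    }
}
```

Usage would look something like `ByteWiseCsvFilter.ReadLinesExceptTrue(File.OpenRead("data.csv"))`, where the file name is a placeholder.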
danielpops