I have inherited an application from someone who is long gone. The gist of it is that it reads in a .csv file of about 5,800 lines and copies it to another .csv file, which it creates anew each time, after stripping out a few characters (#, ', &). Everything worked great until about a month ago. When I started checking into it, I found that about 131 items are missing from the output spreadsheet. I read someplace that the maximum amount of data a string can hold is over 1,000,000,000 chars, and my spreadsheet is way under that, at around 800,000 chars, but the only cause I can think of is the string object.

Here is the code in question. This piece appears to both read in from the existing file and output to the new file:

StreamReader s = new StreamReader(File);

//Read the rest of the data in the file.
string AllData = s.ReadToEnd();

//Split off each row at the Carriage Return/Line Feed
//Default line ending in most windows exports.
//You may have to edit this to match your particular file.
//This will work for Excel, Access, etc. default exports.
string[] rows = AllData.Split("\r\n".ToCharArray(), System.StringSplitOptions.RemoveEmptyEntries);

//Now add each row to the DataSet
foreach (string r in rows)
{
    //Split the row at the delimiter.

    string[] items = r.Split(delimiter.ToCharArray());

    //Add the item
    result.Rows.Add(items);
}

If anyone can help me, I would really appreciate it. I either need to figure out how to split the data better, or I need to figure out why the last 131 lines are cut when the existing Excel file is copied to the new one.

Mike
  • have you tried debugging it? Frankly, as much as I don't like the way the code is written, I don't see how your code will cause 131 lines to disappear. Are you sure you are not filtering out some important bits of code before or after what you have included here? – sstan Jun 04 '15 at 23:51
  • I don't see any problems in the code you've posted, other than the terrible approach your predecessor took. If you run out of memory you'll get an error. In case of an error - and assuming that the program also writes the entire string at a shot just like it reads - chances are you'd have no file created. Make sure the output file doesn't exist before running the program, in case the missing 131 lines means you're looking at an older file. Beyond that, all I can recommend is stepping through the program line by line. And clean it up if you can - @justin.m.chase's answer is an excellent start. – Ed Gibbs Jun 04 '15 at 23:54
  • Is that all the actual code? If that is the case, I'm with sstan: there is no obvious reason for the missing lines. – Fredou Jun 04 '15 at 23:54
  • possible duplicate of [Parsing CSV files in C#](http://stackoverflow.com/questions/2081418/parsing-csv-files-in-c-sharp) – Fredou Jun 04 '15 at 23:59
  • One thought about the missing lines. It could be that whatever is appending the last lines is using only `"\n"` instead of `"\r\n"` in which case you may want to split on `"\n"` instead and then trim trailing whitespace (`\r` is whitespace). – justin.m.chase Jun 05 '15 at 00:04
  • @justin.m.chase: Notice `"\r\n".ToCharArray()`. It's a bit weird, but it already accomplishes the goal of splitting lines on either character, not only on the combination of them. – sstan Jun 05 '15 at 00:43
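
The point in the last comment can be checked with a small sketch. The class and method names (`SplitDemo`, `SplitLines`) and the sample text are made up for illustration; only the `Split` call is taken from the question.

```csharp
using System;

public static class SplitDemo
{
    // Same call as in the question: the string "\r\n" is turned into a
    // char array, so '\r' and '\n' each act as a separator on their own.
    public static string[] SplitLines(string text)
    {
        return text.Split("\r\n".ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
    }

    public static void Main()
    {
        // Hypothetical sample with mixed line endings: "\r\n" after a, "\n" after b.
        string[] rows = SplitLines("a\r\nb\nc\r\n");

        // The empty entry between '\r' and '\n' is dropped by RemoveEmptyEntries.
        Console.WriteLine(rows.Length);            // 3
        Console.WriteLine(string.Join("|", rows)); // a|b|c
    }
}
```

So a file whose last lines end in a bare `"\n"` would still be split into rows by this call. Note that `RemoveEmptyEntries` also silently drops completely blank lines.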

2 Answers

3

One easier way to do this, since you're using `"\r\n"` for line endings, would be to use the built-in line-reading method, `File.ReadLines(path)`:

foreach(var line in File.ReadLines(path))
{
   var items = line.Split(',');
   result.Rows.Add(items);
}
justin.m.chase
  • or `File.ReadLines` if you're worried about performance and don't want the whole file in memory at once – kaveman Jun 04 '15 at 23:49
  • Do not forget that the "data" could have a comma in it, so you do not want to parse it like that. Use something already built. – Fredou Jun 04 '15 at 23:49
  • `myid1, "hi, my name is abc", someotherdata`, etc. You see the problem here, right? – Fredou Jun 04 '15 at 23:51
  • I do like your code improvement. But I don't see how it answers the question. – sstan Jun 04 '15 at 23:52
  • He's saying, if you have a comma inside the content of one of the cells you may get an error. It really depends on your data. For example: `1,2,"chase, justin",xyz,345`. The presence of the comma in the quoted area would be split and be a problem. If you don't have data like that then don't worry about it. Otherwise you'll need to split on a regex probably. – justin.m.chase Jun 04 '15 at 23:57
  • Regarding the error you mentioned @Mike, it appears that you have locally defined `File` to be the path to the file in question. A couple of things here. That conflicts with the very common class `System.IO.File`, which is what I was referring to so I would recommend two things: 1) rename your variable named `File` to be `path` or `filePath` or something like that. 2) To the top of your file add `using System.IO;`. This will allow you to access the static method `ReadLines` on the `File` class. – justin.m.chase Jun 05 '15 at 00:00
  • OK, so I have commas all over the place in my .csv file, and I'm not sure what to do about those right now, but I can tell you that I have already added `using System.IO;` to my code. This application has been working fine for over a year now. Also, I do not have a variable named `File`, at least not one that I have seen. – Mike Jun 05 '15 at 00:08
  • In the line `StreamReader s = new StreamReader(File);`, `File` is being passed into the `StreamReader` constructor, implying that `File` is defined as a variable or a parameter to the function. Rename it. – justin.m.chase Jun 05 '15 at 14:44
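
The quoted-comma problem raised in the comments above can be seen with a minimal sketch. It reuses the sample line `1,2,"chase, justin",xyz,345` from the discussion; the class and method names (`NaiveSplitDemo`, `NaiveSplit`) are hypothetical.

```csharp
using System;

public static class NaiveSplitDemo
{
    // Plain Split(',') knows nothing about quotes, so a comma inside a
    // quoted field is treated like any other separator.
    public static string[] NaiveSplit(string line)
    {
        return line.Split(',');
    }

    public static void Main()
    {
        // Sample line from the comments: five logical fields.
        string line = "1,2,\"chase, justin\",xyz,345";
        string[] items = NaiveSplit(line);

        Console.WriteLine(items.Length); // 6, not 5
        foreach (string item in items)
        {
            // The quoted field comes out in two pieces: ["chase] and [ justin"]
            Console.WriteLine("[" + item + "]");
        }
    }
}
```

Whether this matters depends on the data: if no field ever contains the delimiter, the naive split gives the expected column count.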
0

You may want to check out the `TextFieldParser` class, which is part of the `Microsoft.VisualBasic.FileIO` namespace (yes, you can use this from C# code).

Something along the lines of:

using(var reader = new TextFieldParser("c:\\path\\to\\file"))
{
    //configure for a delimited file
    reader.TextFieldType = FieldType.Delimited;

    //configure the delimiter character (comma)
    reader.Delimiters = new[] { "," };

    while(!reader.EndOfData)
    {
        string[] row = reader.ReadFields();
        //do stuff
    }
}

This class can help with some of the issues of splitting a line into its fields, when the field may contain the delimiter.
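
As a rough illustration of that point, here is a sketch that parses a single line from a string instead of a file. It assumes the project references the `Microsoft.VisualBasic` assembly; the class and method names (`QuotedCsvDemo`, `ParseLine`) and the sample line are made up for the example.

```csharp
using System;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class QuotedCsvDemo
{
    // Parses one delimited line, honoring double-quoted fields.
    public static string[] ParseLine(string line)
    {
        // TextFieldParser also accepts a TextReader, which makes it easy
        // to try out on an in-memory string.
        using (var parser = new TextFieldParser(new StringReader(line)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            // Treat commas inside double quotes as part of the field.
            parser.HasFieldsEnclosedInQuotes = true;
            return parser.ReadFields();
        }
    }

    public static void Main()
    {
        string[] fields = ParseLine("1,2,\"chase, justin\",xyz,345");

        Console.WriteLine(fields.Length); // 5
        Console.WriteLine(fields[2]);     // chase, justin
    }
}
```

The quoted field stays in one piece and the surrounding quotes are removed, so the row keeps its expected number of columns.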

kaveman
  • Well, I tried to find the assembly for Microsoft.VisualBasic.FileIO, but there doesn't seem to be one. Can I do this using C# 4.0? – Mike Jun 05 '15 at 17:24
  • `Microsoft.VisualBasic.FileIO` is the *namespace*. You can find it inside the *assembly* `Microsoft.VisualBasic`. Yes, this is available with C# 4/.NET 4.5 projects. – kaveman Jun 05 '15 at 19:18