4

I have a CSV file which has a delimiter of "|" to separate the fields.

I am using the code below to read the file and put each field into its own List:

 List<string> list1 = new List<string>();
 List<string> list2 = new List<string>();
 List<string> list3 = new List<string>();
 List<string> list4 = new List<string>();

 // The using block closes the file once reading is finished.
 using (var reader = new StreamReader(File.OpenRead(openFileDialog1.FileName)))
 {
     while (!reader.EndOfStream)
     {
         var line = reader.ReadLine();
         var values = line.Split('|');

         list1.Add(values[0]);
         list2.Add(values[1]);
         list3.Add(values[2]);
         list4.Add(values[3]);
     }
 }

Then I put the values into a DataSet:

DataSet ds = new DataSet();
ds.Tables.Add("barcode");

for (int i = 1; i < list1.Count; i++)
{
    ds.Tables[0].Rows.Add(list1[i], list2[i], list3[i], list4[i]);
}

This works fine if the data looks like this:

373|A0000006-04|EACH|2600003347225  
373|A0000006-04|EACH|9556076004684  
373|A0000006-04|EACH|9556076006374  
373|A0000006-04|PK12|2600003347232  
373|A0000006-04|PK12|9556076004691  

However, some of the data might look like this:

373|A0000029-01|PK12|1899886
6604250
373|A0000029-01|PK12|2652357563394
373|A0000030-01|EACH|2600001
539189
373|A0000030-01|EACH|8998866604284

As you can see, some of the records span 2 lines. Is there any way I can read them as a single row instead of 2 different rows? Or do I have to add a delimiter such as a comma or semicolon in order to identify them as the same row?

Marc Gravell
Digital
  • you probably have a newline in there that `reader.ReadLine()` is identifying, can you check the source data? – Mr. Mr. Mar 26 '13 at 08:30
  • Use a CSV parsing library such as OpenCSV - no point re-inventing the wheel. – RB. Mar 26 '13 at 08:30
  • Multi-line CSV would normally have double-quotes around the data with a carriage return in it, so you know where it ends, i.e. `373|A0000030-01|EACH|"2600001` (on one line) and `539189"` (on the next). Without something like that, it is going to be *really* ambiguous. *With* that - I could suggest a few standard readers that'll handle that without modification. – Marc Gravell Mar 26 '13 at 08:31
  • hmmm... so using the available library is better? I am just trying to find the best solution here... – Digital Mar 26 '13 at 08:31
  • Why reinvent the wheel when there's a "ready" solution available? – jordanhill123 Mar 26 '13 at 08:32
  • In practical terms, if you have a line with only one part, should that single part be appended to the previous line's part[3]? – Steve Mar 26 '13 at 08:33
  • there are no quotes between the data... hmmm... so I guess the best way is to tell the users to include quotes right? – Digital Mar 26 '13 at 08:34
  • @Digital quite simply, what you have at the moment is not really CSV - it is just "some random text data with delimiters". Now, if you *include* quotes you could use a CSV parser to handle it (as long as it allows you to specify the delimiter - most do); however... whether that will actually be the best option depends on more context. You could also tell them **don't put newlines in the middle of fields**. – Marc Gravell Mar 26 '13 at 08:40
  • Is there any possibility that you could read the data as fixed width? The dataset you display appears like it might be an option. – jordanhill123 Mar 26 '13 at 08:45
  • You could append the next line if the previous line does not reach expected length. Not elegant but it all depends on how the file is created.... – jordanhill123 Mar 26 '13 at 08:51
  • @MarcGravell yeah... I guess you could say that it is just some random text. I guess including quotes in the data is the best approach. I'm not sure why the users are putting newlines in some fields and not in others. Some other files have newlines because of addresses. – Digital Mar 26 '13 at 08:53

4 Answers

3

Use a library such as A Fast CSV Reader which supports all the features you need.

Giorgi
2

A List<T> can also be accessed by index, so you could add a lineCounter to your loop and, if a line yields just one part after splitting, append its content to the previous element of list4. (The first line must have all 4 elements.)

int lineCounter = 0;
while (!reader.EndOfStream)
{
     var line = reader.ReadLine();
     var values = line.Split('|');

     if (values.Length == 1)
     {
          // Continuation line: append it to the last field of the previous record.
          list4[lineCounter - 1] += values[0];
     }
     else
     {
          // Complete record: store its four fields and count it.
          list1.Add(values[0]);
          list2.Add(values[1]);
          list3.Add(values[2]);
          list4.Add(values[3]);
          lineCounter++;
     }
}

I have tested this with the sample data provided by the OP, and it seems to work well.

Steve
  • This is the best answer in the question that I can see; it addresses the issue of the multi-line data in the form it occurs in the data. The pending question is: should the newline be retained or discarded in the concatenated data. Only the OP can answer that. – Marc Gravell Mar 26 '13 at 08:43
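
Building on the answer above, here is a minimal sketch that merges continuation lines and then loads the records into a DataTable. The names `BarcodeLoader`, `MergeContinuationLines`, `ToTable`, the `joiner` parameter, and the column names are all hypothetical, and the sketch assumes that any line without a `|` continues the last field of the previous record. The table defines its four columns up front, since rows with four values can only be added to a table that has at least four columns. Passing `""` as `joiner` discards the line break; passing `"\n"` retains it.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class BarcodeLoader
{
    // Merges lines that contain no '|' into the last field of the preceding record.
    // joiner is inserted between the two fragments ("" drops the line break).
    public static List<string[]> MergeContinuationLines(IEnumerable<string> lines, string joiner)
    {
        var records = new List<string[]>();
        foreach (var line in lines)
        {
            var values = line.Split('|');
            if (values.Length == 1 && records.Count > 0)
            {
                // Continuation line: append it to the previous record's last field.
                var previous = records[records.Count - 1];
                previous[previous.Length - 1] += joiner + values[0];
            }
            else
            {
                records.Add(values);
            }
        }
        return records;
    }

    // Copies four-field records into a DataTable; column names are placeholders.
    // Records with any other field count are skipped.
    public static DataTable ToTable(IEnumerable<string[]> records)
    {
        var table = new DataTable("barcode");
        foreach (var name in new[] { "Col1", "Col2", "Col3", "Col4" })
        {
            table.Columns.Add(name, typeof(string));
        }
        foreach (var record in records)
        {
            if (record.Length == 4)
            {
                table.Rows.Add(record[0], record[1], record[2], record[3]);
            }
        }
        return table;
    }
}
```

For example, `BarcodeLoader.ToTable(BarcodeLoader.MergeContinuationLines(File.ReadLines(path), ""))` would give one row per logical record, where `path` stands for the file chosen in the dialog.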
0

According to the CSV specification, each record should be located on a separate line (you can find the spec here: http://www.ietf.org/rfc/rfc4180.txt). So in your case you need some sort of workaround, such as using another separator to mark line breaks.

z a
  • See page 2, bullet #6 of that same specification; multiline is explicitly supported in CSV. Of course, we're not *really* processing formal CSV here - just delimited data. Quite a difference. – Marc Gravell Mar 26 '13 at 08:35
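
To illustrate the quoting rule mentioned in the comment above, here is a minimal sketch of a reader that lets a double-quoted field span several lines, using `|` as the delimiter. `QuotedPipeParser` and `Parse` are hypothetical names. This is not a full RFC 4180 implementation, it assumes records end with `\n` or `\r\n`, and it only helps if whoever produces the file actually wraps multi-line fields in quotes.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class QuotedPipeParser
{
    // Splits text into records and fields. A line break inside a double-quoted
    // field is kept as part of the field; a doubled quote ("") inside quotes
    // yields one literal quote character.
    public static List<List<string>> Parse(string text, char delimiter = '|')
    {
        var records = new List<List<string>>();
        var record = new List<string>();
        var field = new StringBuilder();
        bool inQuotes = false;
        bool hasContent = false; // true once the current record has any content

        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (inQuotes)
            {
                if (c != '"')
                {
                    field.Append(c); // includes line breaks, kept verbatim
                }
                else if (i + 1 < text.Length && text[i + 1] == '"')
                {
                    field.Append('"'); // escaped quote
                    i++;
                }
                else
                {
                    inQuotes = false; // closing quote
                }
            }
            else if (c == '"')
            {
                inQuotes = true;
                hasContent = true;
            }
            else if (c == delimiter)
            {
                record.Add(field.ToString());
                field.Clear();
                hasContent = true;
            }
            else if (c == '\n')
            {
                // End of record; blank lines produce no record.
                if (hasContent)
                {
                    record.Add(field.ToString());
                    records.Add(record);
                }
                record = new List<string>();
                field.Clear();
                hasContent = false;
            }
            else if (c != '\r') // '\r' outside quotes is ignored; '\n' ends the record
            {
                field.Append(c);
                hasContent = true;
            }
        }

        // Flush a final record that is not terminated by a line break.
        if (hasContent)
        {
            record.Add(field.ToString());
            records.Add(record);
        }
        return records;
    }
}
```

With quoted input such as `373|A0000030-01|EACH|"2600001` followed by `539189"` on the next line, the two physical lines come back as one record with four fields, and the line break stays inside the fourth field.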
0

I've used the FileHelpers library for mapping records directly to strongly typed arrays. If you are working with formal CSV, it will work for you.

If it's just delimited data with no formal specification, you might need some other solution.

jordanhill123
  • The data as shown in the question is clearly not standard CSV with escaped multilines; as such, the line that applies here is "you might need some other solution" :( – Marc Gravell Mar 26 '13 at 08:46
  • @MarcGravell Possibly fixed width is another option...`FileHelpers` can handle that http://stackoverflow.com/questions/162727/read-fixed-width-record-from-text-file – jordanhill123 Mar 26 '13 at 08:49