
I am trying to parse a file using regex split. It works well when '\t' is the delimiter, but some lines have a '\t' inside a field rather than between fields.

For example:

G2226   TEST 1  C   29  Internal Head Office    D   Head Office ZZZ Unassigned  10910   10/10/2011  11/10/2011  10/10/2011  11/10/2011  "Test call  Sort the customer out some data. See the customer again tomorrow to talk about Prod     "   Mr ABC          Mr ABC                  Mr  ABC Mr  ABC Credit Requested    BDM Call    Internal Note   10

This part contains 2 tabs that I want to be ignored:

"Test call  Sort the customer out some data. See the customer again tomorrow to talk about Prod\t\t"

The good thing is that they are enclosed in double quotes, but I cannot work out how to ignore them. Any ideas?

Edit:

My goal is to get 36 columns. Some lines produce more than that after Regex.Split(lineString, "\t") because some of their fields contain '\t' characters, and I would like those tabs to be ignored. The line above comes out to 38 columns, which my DataTable rejects because the header has only 36 columns. This is the problem I would like to solve.

sprocket12
  • What are you trying to get as output? And what is your current regex? When you say "ignore" - what do you mean? – Oded Feb 15 '13 at 12:31
  • Rather than using a regex, why not use a library to parse a CSV file? This question has several freely available options for c#: http://stackoverflow.com/questions/1375410/very-simple-c-sharp-csv-reader –  Feb 15 '13 at 12:32
  • Once you have to deal with fields that are surrounded by quotes, a simple regex split approach no longer works. –  Feb 15 '13 at 12:34
  • @Oded I have added a clarification. dan1111, I tried the library but it crashed on the file, and then I thought the task was simple enough not to need a library anyway. – sprocket12 Feb 15 '13 at 12:35
  • @MuhammadA, if you have a simple CSV file, I agree that it is easier to not use a library. But once you have things like quoted fields and separators or newlines within the fields, I think you really need to use a library, and it will be worth the time figuring out how to get one working. –  Feb 15 '13 at 12:38
  • @dan1111 Got it working with a code snippet from your link after a bit of modification. :) Maybe you can make your comments an answer? – sprocket12 Feb 15 '13 at 12:43
  • @MuhammadA, I made an answer. I'm glad you got it working. –  Feb 15 '13 at 12:47

3 Answers

Regex is not the right tool for this.

What you have is basically a CSV format. It is tab-separated rather than comma-separated, but it works exactly the same way. So find a CSV parser and use that; the separator character is usually configurable.
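As one possible illustration of a parser with a configurable separator, here is a minimal sketch using `TextFieldParser` from the `Microsoft.VisualBasic.FileIO` namespace (it lives in the Microsoft.VisualBasic assembly, which may need to be referenced explicitly). The class name `TsvReader` and method name `ReadRows` are made up for this example, and this is not necessarily the library the commenters had in mind.

```csharp
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class TsvReader   // hypothetical helper, not part of any library
{
    // Reads tab-separated text and returns one string[] per record.
    public static List<string[]> ReadRows(TextReader reader)
    {
        var rows = new List<string[]>();
        using (var parser = new TextFieldParser(reader))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters("\t");              // tab instead of comma
            parser.HasFieldsEnclosedInQuotes = true; // tabs inside "..." stay in the field
            parser.TrimWhiteSpace = false;           // keep field contents unchanged

            while (!parser.EndOfData)
            {
                // ReadFields throws MalformedLineException on lines it cannot parse.
                rows.Add(parser.ReadFields());
            }
        }
        return rows;
    }
}
```

Each returned array can then be checked against the 36-column header before it is added to the DataTable, for example with `TsvReader.ReadRows(new StreamReader(path))`, where `path` stands for the input file.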

Fabian Schmengler

If you have a simple CSV file, then regex split is usually the easiest way to process it.

However, if your CSV file contains more complex elements, such as quoted fields that contain separator characters or newlines, then this approach will no longer work. It is not a trivial matter to correctly parse these types of files, so you should use a library when possible.
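To illustrate what such a parser has to keep track of, here is a minimal hand-written sketch that splits a single line while ignoring delimiters inside double quotes. The names `QuotedSplitter` and `SplitLine` are invented for this example. It assumes every record sits on one line and that a doubled quote (`""`) inside a quoted field means a literal quote; it does not handle newlines inside fields, which is one of the reasons a library is usually the safer choice.

```csharp
using System.Collections.Generic;
using System.Text;

public static class QuotedSplitter   // hypothetical helper for illustration only
{
    // Splits one line on the delimiter, ignoring delimiters inside double quotes.
    // The surrounding quotes are dropped from the returned field values.
    public static List<string> SplitLine(string line, char delimiter = '\t')
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (c == '"')
            {
                if (inQuotes && i + 1 < line.Length && line[i + 1] == '"')
                {
                    current.Append('"'); // doubled quote inside a quoted field
                    i++;                 // skip the second quote of the pair
                }
                else
                {
                    inQuotes = !inQuotes; // opening or closing quote
                }
            }
            else if (c == delimiter && !inQuotes)
            {
                fields.Add(current.ToString()); // delimiter outside quotes ends the field
                current.Clear();
            }
            else
            {
                current.Append(c); // ordinary character, or a delimiter inside quotes
            }
        }

        fields.Add(current.ToString()); // last field has no trailing delimiter
        return fields;
    }
}
```

Calling `QuotedSplitter.SplitLine(lineString)` on each line returns the fields with the quoted tabs kept inside their field rather than treated as separators.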

The answers to this question give several options for C# libraries that can read a CSV file: http://stackoverflow.com/questions/1375410/very-simple-c-sharp-csv-reader

Community

If you really need a regular expression, you can try something like this:

(?!\t")\t(?!\t")
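The pattern above only skips tabs that sit directly before a closing quote. A more general variant, which is not part of the original answer, is to split on a tab only when an even number of double quotes follows it on the rest of the line. The sketch below assumes the quotes on each line are balanced and that fields contain no escaped quotes or newlines; `RegexSplitter` is a name made up for this example.

```csharp
using System.Text.RegularExpressions;

public static class RegexSplitter   // hypothetical helper for illustration only
{
    // Matches a tab only when an even number of double quotes follows it
    // up to the end of the line, i.e. the tab is outside any quoted field.
    private static readonly Regex TabOutsideQuotes =
        new Regex("\t(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");

    // Splits the line on unquoted tabs. The surrounding quotes stay in the fields.
    public static string[] Split(string line)
    {
        return TabOutsideQuotes.Split(line);
    }
}
```

Note that the quotes remain part of the field values with this approach, and the lookahead rescans the rest of the line for every tab, so it can be slow on very long lines.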
jerone