I want to import data from a CSV file, but some cells contain commas inside string values. How can I tell which commas are separators and which are part of the cell content?
Duplicate: http://stackoverflow.com/questions/769621/dealing-with-commas-in-a-csv-file – Eric LaForce Nov 09 '11 at 01:21
5 Answers
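Use TextFieldParser. Usage:
using Microsoft.VisualBasic.FileIO; // reference Microsoft.VisualBasic.dll
...
// reader is a TextReader (or a Stream, or a file path) over the CSV data
using (var csvReader = new TextFieldParser(reader))
{
    csvReader.SetDelimiters(new string[] { "," });
    // Commas inside quoted fields are kept as part of the field
    csvReader.HasFieldsEnclosedInQuotes = true;
    while (!csvReader.EndOfData)
    {
        string[] fields = csvReader.ReadFields(); // one row per call
    }
}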
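
As a self-contained illustration of the snippet above, here is a minimal sketch that feeds one in-memory line to TextFieldParser through a StringReader. The class name CsvDemo, the method name ParseLine and the sample data are made up for this example, and it assumes the project can reference Microsoft.VisualBasic.

```csharp
using System.IO;
using Microsoft.VisualBasic.FileIO; // Microsoft.VisualBasic.dll

public static class CsvDemo
{
    // Hypothetical helper: parses a single CSV line held in a string.
    public static string[] ParseLine(string line)
    {
        using (var parser = new TextFieldParser(new StringReader(line)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            // Quoted fields may contain the delimiter; the quotes are removed.
            parser.HasFieldsEnclosedInQuotes = true;
            return parser.ReadFields();
        }
    }
}
```

Calling CsvDemo.ParseLine("1,\"Smith, John\",Paris") should give three fields, with the middle one being Smith, John (comma kept, quotes dropped).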

In general, do not bother writing the import code yourself.
I have had good experiences with the FileHelpers library: http://www.filehelpers.com/
And indeed, I hope your fields are quoted; FileHelpers supports this out of the box.
Otherwise there is not much you can do.

Using FileHelpers is definitely the way to go. They have done a great job building all the logic for you. I had the same issue, where I had to parse a CSV file with commas as part of a field, and this utility did the job very well. All you have to do is put the following attribute on the field:
[FieldQuoted('"', QuoteMode.OptionalForBoth)]
For details, see http://www.filehelpers.com/forums/viewtopic.php?f=12&t=391

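We can also use a regex, as below.
// Requires: using System.Text.RegularExpressions;
// Splits only on commas followed by an even number of quotes, i.e. commas outside quoted fields.
Regex CSVParser = new Regex(",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))");
String[] Fields = CSVParser.Split(Test); // Test holds one CSV line; the enclosing quotes stay in the fields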
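
Because Split leaves the enclosing quotes in each field, a small post-processing step is usually needed. The sketch below wraps the same pattern in a helper that also strips the outer quotes and unescapes doubled quotes. The names RegexCsv and SplitLine are hypothetical, and it assumes each record sits on a single line (no line breaks inside quoted fields).

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexCsv
{
    // Same pattern as above: a comma followed by an even number of quotes.
    private static readonly Regex CsvSplitter =
        new Regex(",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))");

    // Hypothetical helper: splits one line, removes the enclosing quotes
    // and turns doubled quotes ("") back into a single quote.
    public static string[] SplitLine(string line)
    {
        return CsvSplitter.Split(line)
            .Select(f => f.Length >= 2 && f[0] == '"' && f[f.Length - 1] == '"'
                ? f.Substring(1, f.Length - 2).Replace("\"\"", "\"")
                : f)
            .ToArray();
    }
}
```

For the input 1,"Smith, John",Paris this should return the three fields 1, Smith, John and Paris.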

Unless you have quotes around the strings you are pretty much hosed, hence the "quote and comma" delimiter style. If you have control of the export facility, then select "enclose strings in quotes" or change the delimiter to something like a tilde or caret symbol.
If not, then you have to write some code. If you detect "a..z", start counting commas and keep working through the string until you detect [0..9]. Even then this is going to be problematic, since people can put a [0..9] in their text. At best this is going to be a best-efforts process. You're going to have to know when you are in chars and when you are not. I doubt even regex will help you much with this.
The only other thing I can think of is to run through the data and look for commas, then look at what comes before and after each comma. If it is surrounded by chars, replace the comma with an alternate char like the caret "^" or the tilde "~". Then process the file as normal, and afterwards go back and replace the alternate char with a comma.
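
The placeholder idea in the last paragraph could be sketched roughly as follows. This is only a best-effort heuristic, not a real CSV parser. The rule used to spot a "text" comma (a letter before it, a space and a letter after it, as in Smith, John) is an assumption made for this sketch, as are the names PlaceholderCsv and SplitLine, and it assumes the tilde never occurs in the data.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class PlaceholderCsv
{
    // Assumed never to occur in the data; any real '~' would become a comma.
    private const char Placeholder = '~';

    // Assumption for this sketch: a comma preceded by a letter and followed
    // by a space plus a letter belongs to the text, not to the CSV structure.
    private static readonly Regex TextComma =
        new Regex(@"(?<=[A-Za-z]),(?= [A-Za-z])");

    public static string[] SplitLine(string line)
    {
        // 1. Mask the commas that look like part of the text.
        string masked = TextComma.Replace(line, Placeholder.ToString());
        // 2. Split on the remaining commas, then restore the masked ones.
        return masked.Split(',')
            .Select(f => f.Replace(Placeholder, ','))
            .ToArray();
    }
}
```

With the unquoted input 1,Smith, John,Paris this yields 1, then Smith, John, then Paris. A field separator that happens to be followed by a space and a letter would be misread as text, which is why quoting the fields at export time remains the more reliable fix.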
Good luck.

What about string values that contain a "^" or "~" character? – Mehdi Hadjar Nov 09 '11 at 02:15
No solution here is going to be perfect. I have found that a tilde is pretty rare in the Western character set as far as written English goes, while the comma is far more prevalent. My solution could be problematic depending on the triplet indicator, but what is to be done other than work through the data given? – FlyingGuy Nov 09 '11 at 03:56