For example, there is a line:

name, tax, company.

To separate the fields I need a split method.

    string[] text = File.ReadAllLines("file.csv", Encoding.Default);
    foreach (string line in text)
    {
        string[] words = line.Split(',');
        foreach (string word in words)
        {
            Console.WriteLine(word);
        }
    }
    Console.ReadKey();

But how do I split a line when a quoted field itself contains a comma?

name, tax, "company, Ariel";

"name, surname", tax, company;

and so on.

The result should look like this:

    Max | 12.3 | company, Ariel
    Alex, Smith | 13.1 | Oriflame

The input data will not always be in an ideal format (as in the example). For instance, there may be 3 quotes in a row, or a line without any commas. The program must not crash in any case. If a line cannot be parsed, it should report that with a message.

vamp123
  • Hi, if you want to write your own parser (there is a CSV parser somewhere in the .NET Framework), you cannot use Split, because of the problem you stated. Instead, you have to check each character for the delimiter and keep track of whether you are inside an open " or not. Not really tricky. – nabuchodonossor Jan 16 '19 at 12:46
  • It is in Microsoft.VisualBasic.FileIO.TextFieldParser (you can use it from C# ...) – nabuchodonossor Jan 16 '19 at 12:49
  • I would recommend not trying to write your own parser; there are decent ones out there like CsvHelper which will save you headaches – NDJ Jan 16 '19 at 12:49
  • @NDJ: Not being aware of the Framework field parser, I one day wrote my own parser ... it was really easy, no problem. No headaches, and more flexible than a "standard" tool (at least when I had some ... strange ideas ... of how the CSV could also be formatted) – nabuchodonossor Jan 16 '19 at 12:51
  • Don't use `Microsoft.VisualBasic.FileIO.TextFieldParser` either. It offers very basic functionality and isn't available in .NET Core. CSV files are supposed to be simple but often aren't. A text field may contain newlines. There may be multiple header or *footer* lines. – Panagiotis Kanavos Jan 16 '19 at 12:51
  • Possible duplicate of [How can I Split(',') a string while ignore commas in between quotes?](https://stackoverflow.com/questions/21342949/how-can-i-split-a-string-while-ignore-commas-in-between-quotes) – Guilherme de Jesus Santos Jan 16 '19 at 12:52
  • @nabuchodonossor did you try it with real files? Files with multiple headers for example, or footers? Newlines in the text fields? All those are things that can appear in a CSV file – Panagiotis Kanavos Jan 16 '19 at 12:52
  • @PanagiotisKanavos: I used it for production code ... The CSV files were produced by another program ... no extra headers, and newlines encoded with escape chars (like in a C string ...) – nabuchodonossor Jan 16 '19 at 12:53
  • @nabuchodonossor which is why you didn't encounter problems with a simple implementation. All of those things *do* appear in "csv" files, which is why there are libraries and skip line options. – Panagiotis Kanavos Jan 16 '19 at 12:54
  • Use a CSV library. https://joshclose.github.io/CsvHelper/ – Fred Jan 16 '19 at 12:55
  • @PanagiotisKanavos which means you have the "wrong" customers: when they ask for a CSV, I tell them if their files are garbage or not. But sure, it's much easier to use existing tools. On the other hand: if this is something like "homework", it's much better to implement it one time to understand how to do it AND THEN use a tool ... – nabuchodonossor Jan 16 '19 at 12:57
  • @nabuchodonossor Airlines. Banks. Factories. Credit card payments. The [RFC 4180](https://tools.ietf.org/html/rfc4180#page-2) used as a "specification" says: `Due to lack of a single specification, there are considerable differences among implementations. Implementors should "be conservative in what you do, be liberal in what you accept from others"`. And `While numerous private specifications exist for various programs and systems, there is no single "master" specification for this format.` – Panagiotis Kanavos Jan 16 '19 at 13:09
  • @PanagiotisKanavos: Thank you for this information, but again: If the customer provides garbage, I tell him .... and bill him more for extra coding. – nabuchodonossor Jan 16 '19 at 13:20
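
The character-by-character approach described in the comments can be sketched roughly as follows. This is only an illustration under stated assumptions, not a full CSV implementation: `CsvLine.TryParse` is a hypothetical helper name, `""` inside a quoted field is treated as an escaped quote, unquoted fields are trimmed, a quote is only allowed at the start of a field, and each line is parsed on its own (so newlines inside quoted fields are not supported). Instead of throwing, it returns `false` with a message when a line cannot be parsed.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class CsvLine
{
    // Hypothetical helper (not from the thread): splits one line into fields.
    // Returns false and fills 'error' instead of throwing on malformed input.
    public static bool TryParse(string line, out List<string> fields, out string error)
    {
        fields = new List<string>();
        error = null;
        var current = new StringBuilder();
        bool inQuotes = false;   // currently inside an open quoted field
        bool closed = false;     // a quoted field was just closed

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c != '"')
                {
                    current.Append(c);        // commas are literal inside quotes
                }
                else if (i + 1 < line.Length && line[i + 1] == '"')
                {
                    current.Append('"');      // assumption: "" is an escaped quote
                    i++;
                }
                else
                {
                    inQuotes = false;         // closing quote
                    closed = true;
                }
            }
            else if (c == ',')
            {
                // Quoted fields are kept verbatim, unquoted ones are trimmed.
                fields.Add(closed ? current.ToString() : current.ToString().Trim());
                current.Clear();
                closed = false;
            }
            else if (closed)
            {
                if (!char.IsWhiteSpace(c))
                {
                    error = "Unexpected text after closing quote at position " + i;
                    return false;
                }
            }
            else if (c == '"')
            {
                if (current.ToString().Trim().Length > 0)
                {
                    error = "Unexpected quote at position " + i;
                    return false;
                }
                current.Clear();              // drop whitespace before the opening quote
                inQuotes = true;
            }
            else
            {
                current.Append(c);
            }
        }

        if (inQuotes)
        {
            error = "Unterminated quoted field";
            return false;
        }
        fields.Add(closed ? current.ToString() : current.ToString().Trim());
        return true;
    }
}
```

With this sketch, `CsvLine.TryParse("Max, 12.3, \"company, Ariel\"", out var fields, out var error)` returns `true`, and `string.Join(" | ", fields)` gives `Max | 12.3 | company, Ariel`. A line consisting of three quotes in a row returns `false` with the "Unterminated quoted field" message rather than throwing.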

3 Answers

Split on the double quote first, then split the first (unquoted) piece on the comma.
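
One way this idea might look in code is sketched below. It generalizes the suggestion slightly: after splitting on `"`, every odd-indexed segment was inside quotes and is kept whole, while every even-indexed segment is split on commas. `QuoteSplit.Split` is a hypothetical name, and the sketch has clear limits: it drops empty unquoted fields, does not detect unbalanced quotes, and breaks on escaped quotes such as `""`.

```csharp
using System;
using System.Collections.Generic;

public static class QuoteSplit
{
    // Hypothetical helper: split on quotes first, then on commas outside quotes.
    public static List<string> Split(string line)
    {
        var fields = new List<string>();
        string[] segments = line.Split('"');

        for (int i = 0; i < segments.Length; i++)
        {
            if (i % 2 == 1)
            {
                // Odd index: the text was between two quotes, keep it whole.
                fields.Add(segments[i]);
                continue;
            }

            // Even index: text outside quotes, split on commas.
            foreach (string part in segments[i].Split(','))
            {
                string trimmed = part.Trim();
                if (trimmed.Length > 0)
                {
                    fields.Add(trimmed);   // limitation: empty fields are dropped
                }
            }
        }
        return fields;
    }
}
```

For the line `name, tax, "company, Ariel"` this yields the three fields `name`, `tax` and `company, Ariel`.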

subject-q
  • This will not work if you ALSO have a " within a string, escaped as "" or with another kind of escape char (like \" ...) – nabuchodonossor Jan 16 '19 at 12:54
  • Agreed, for a general-purpose parser it would not always work. But judging from the sample data, that does not look like a likely failure case here. – subject-q Jan 16 '19 at 12:56

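You can use TextFieldParser from Microsoft.VisualBasic.FileIO:

using System.Collections.Generic;
using System.Globalization;
using Microsoft.VisualBasic.FileIO;

var list = new List<Data>();
var isHeader = true;
using (TextFieldParser parser = new TextFieldParser(filePath))
{
    parser.Delimiters = new string[] { "," };
    while (true)
    {
        // ReadFields returns null at the end of the file and throws
        // MalformedLineException for a line it cannot parse.
        string[] parts = parser.ReadFields();
        if (parts == null)
            break;

        // Skip the header row.
        if (isHeader)
        {
            isHeader = false;
            continue;
        }

        list.Add(new Data
        {
            People = parts[0],
            // Invariant culture so "12.3" parses regardless of the machine's locale.
            Tax = Double.Parse(parts[1], CultureInfo.InvariantCulture),
            Company = parts[2]
        });
    }
}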

Where Data is defined as

public class Data
{
    public string People{get;set;}
    public double Tax{get;set;}
    public string Company{get;set;}
}

Please note that you need to reference the Microsoft.VisualBasic assembly and add a using directive for Microsoft.VisualBasic.FileIO.

Example data:

Name,Tax,Company
Max,12.3,"company, Ariel"
Ariel,13.1,"company, Oriflame"

Anu Viswan
  • Thanks, but I cannot use third-party libraries for parsing CSV. Or is Microsoft.VisualBasic.FileIO not a third-party library? – vamp123 Jan 16 '19 at 13:34
  • @vamp123 That's from Microsoft, I suppose. You can find more details here: https://learn.microsoft.com/en-us/dotnet/api/microsoft.visualbasic.fileio.textfieldparser?view=netframework-4.7.2 – Anu Viswan Jan 16 '19 at 13:35

Here's a bit of code that might help. It is not the most efficient, but I use it to 'see' what is going on with the parsing when a particular line is giving trouble.

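// Requires: using System; using System.IO; using System.Linq; using System.Text;
string[] text = File.ReadAllLines("file.csv", Encoding.Default);
string[] datArr;
string tmpStr;
foreach (string line in text)
{
  // "!@@@@!" is a temporary delimiter that is assumed not to occur in the data.
  ParseString(line, ",", "!@@@@!", out datArr, out tmpStr);
  foreach(string s in datArr)
  {
    Console.WriteLine(s);
  }
}
Console.ReadKey();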

private static void ParseString(string inputString, string origDelim, string newDelim, out string[] retArr, out string retStr)
{
    string tmpStr = inputString;
    retArr = new[] {""};
    retStr = "";

    if (!string.IsNullOrWhiteSpace(tmpStr))
    {
        //If there is only one Quote character in the line, ignore/remove it:
        if (tmpStr.Count(f => f == '"') == 1)
            tmpStr = tmpStr.Replace("\"", "");

        string[] tmpArr = tmpStr.Split(new[] {origDelim}, StringSplitOptions.None);
        var inQuote = 0;

        StringBuilder lineToWrite = new StringBuilder();
        foreach (var s in tmpArr)
        {
            if (s.Contains("\""))
                inQuote++;

            switch (inQuote)
            {
                case 1:
                    //Begin quoted text
                    lineToWrite.Append(lineToWrite.Length > 0
                        ? newDelim + s.Replace("\"", "")
                        : s.Replace("\"", ""));

                    if (s.Length > 4 && s.Substring(0, 2) == "\"\"" && s.Substring(s.Length - 2, 2) != "\"\"")
                    {
                        //if string has two quotes at the beginning and is > 4 characters and the last two characters are NOT quotes,
                        //inquote needs to be incremented.
                        inQuote++;
                    }
                    else if ((s.Substring(0, 1) == "\"" && s.Substring(s.Length - 1, 1) == "\"" &&
                              s.Length > 1) || (s.Count(x => x == '\"') % 2 == 0))
                    {
                        //if string has more than one character and both begins and ends with a quote, then it's ok and counter should be reset.
                        //if string has an EVEN number of quotes, it should be ok and counter should be reset.
                        inQuote = 0;
                    }
                    else
                    {
                        inQuote++;
                    }

                    break;
                case 2:
                    //text between the quotes
                    //If we are here the origDelim value was found between the quotes
                    //include origDelim so there is no data loss.
                    //Example quoted text: "Dr. Mario, Sr, MD";
                    //      ", Sr" would be handled here
                    //      ", MD" would be handled in case 3 end of quoted text.
                    lineToWrite.Append(origDelim + s);
                    break;
                case 3:
                    //End quoted text
                    //If we are here the origDelim value was found between the quotes
                    //and we are at the end of the quoted text
                    //include origDelim so there is no data loss.
                    //Example quoted text: "Dr. Mario, MD"
                    //      ", MD" would be handled here.
                    lineToWrite.Append(origDelim + s.Replace("\"", ""));
                    inQuote = 0;
                    break;
                default:
                    lineToWrite.Append(lineToWrite.Length > 0 ? newDelim + s : s);
                    break;

            }

        }

        if (lineToWrite.Length > 0)
        {
            retStr = lineToWrite.ToString();
            //Split the rebuilt line on the temporary delimiter to get the fields.
            retArr = retStr.Split(new[] {newDelim}, StringSplitOptions.None);

        }

    }
}
Andrew