How would you split this row into a string array?

The problem is the value "Rutois, a.s.", which itself contains a comma, so you cannot simply split on the ',' separator.

543472,"36743721","Rutois, a.s.","151","some name","01341",55,"112",1

Thanks.

theSpyCry

6 Answers

I would recommend using a CSV parser instead of rolling your own.

FileHelpers is a nice library for this job.
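FileHelpers' own API isn't shown in this answer, so as a stand-in, here is a minimal sketch using a different, built-in parser: Microsoft.VisualBasic.FileIO.TextFieldParser, which ships with the .NET Framework and with .NET Core 3.0 and later. The class and method names (BuiltInCsv, ParseLine) are hypothetical.

```csharp
using System.IO;
using Microsoft.VisualBasic.FileIO; // TextFieldParser lives in the Microsoft.VisualBasic assembly

static class BuiltInCsv
{
    // Parses a single CSV record with the framework's TextFieldParser,
    // which treats commas inside double-quoted fields as part of the value.
    public static string[] ParseLine(string line)
    {
        using var parser = new TextFieldParser(new StringReader(line));
        parser.TextFieldType = FieldType.Delimited;
        parser.SetDelimiters(",");
        parser.HasFieldsEnclosedInQuotes = true;
        return parser.ReadFields() ?? new string[0];
    }
}
```

To read a whole file, construct the parser from a file path instead and call ReadFields in a loop while EndOfData is false.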

Darin Dimitrov
  • It's actually a pretty simple finite state machine to make. I wrote one a few years back because ADO felt like overkill for this task. – Ilia G Aug 21 '10 at 15:24
  • +1. http://www.codeproject.com/KB/database/CsvReader.aspx is a nice lightweight one. @liho1eye: it may be simple, but reinventing the wheel is not creating value for your customer. – TrueWill Aug 21 '10 at 15:45
  • @TrueWill That's kinda circular logic. Besides, I just looked at your link and that seems almost identical to my solution... possibly much better polished, but (looking at the revision log) younger than mine by at least 2 years. Not that I am trying to claim rights to this implementation. It just goes to prove that a CSV parser is pretty easy to make. – Ilia G Aug 21 '10 at 16:02
  • @liho1eye: If you have an alternate implementation, great! Post it as open source (if you own the copyright) and let others benefit from your efforts. What I'm saying is that there is no reason for anyone else to write a CSV parser other than (a) as a learning exercise, (b) none is available on their platform, or (c) existing parsers do not meet their business requirements. – TrueWill Aug 21 '10 at 19:52
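The finite state machine mentioned in the comments can be sketched as below. This is a minimal illustration, not the commenter's actual code: it assumes RFC 4180-style quoting (a doubled quote inside a quoted field is a literal quote) and handles one record at a time. CsvStateMachine and ParseRecord are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Text;

static class CsvStateMachine
{
    // Two states: outside a quoted field, or inside one.
    public static List<string> ParseRecord(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"')
                {
                    // A doubled quote ("") inside a quoted field is a literal quote.
                    if (i + 1 < line.Length && line[i + 1] == '"')
                    {
                        current.Append('"');
                        i++;
                    }
                    else
                    {
                        inQuotes = false; // closing quote
                    }
                }
                else
                {
                    current.Append(c); // commas are ordinary characters here
                }
            }
            else if (c == '"')
            {
                inQuotes = true; // opening quote
            }
            else if (c == ',')
            {
                fields.Add(current.ToString()); // separator ends the field
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        fields.Add(current.ToString()); // the last field has no trailing comma
        return fields;
    }
}
```

Unlike a plain Split, this keeps empty fields and never needs a placeholder token for embedded commas.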

You can use a regular expression to pick out the values from the line:

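using System;
using System.Text.RegularExpressions;

string line = "543472,\"36743721\",\"Rutois, a.s.\",\"151\",\"some name\",\"01341\",55,\"112\",1";
// A match is either a quoted value (group "m" captures it without the quotes)
// or a run of non-comma characters. Note: empty fields (,,) produce no match.
var values = Regex.Matches(line, "(?:\"(?<m>[^\"]*)\")|(?<m>[^,]+)");
foreach (Match value in values) {
  Console.WriteLine(value.Groups["m"].Value);
}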

Output:

543472
36743721
Rutois, a.s.
151
some name
01341
55
112
1

This, of course, assumes that you actually have the complete CSV record in the string. Note that values in a CSV record can contain line breaks, so you cannot get the records from a CSV file by simply splitting it on line breaks.
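Because a quoted value can span several physical lines, complete records have to be gathered before the regex above is applied. A minimal sketch, assuming quotes in the file are balanced and that a line break inside a field may be restored as "\n"; CsvRecordReader and ReadRecords are hypothetical names:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

static class CsvRecordReader
{
    // Yields complete CSV records. A physical line is joined with the following
    // line(s) while a quoted field is still open, i.e. while the record so far
    // contains an odd number of double quotes.
    public static IEnumerable<string> ReadRecords(TextReader reader)
    {
        var record = new StringBuilder();
        bool inQuotes = false;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // Assumption: the line break inside a field is restored as "\n",
            // even if the file originally used "\r\n".
            if (inQuotes)
                record.Append('\n');
            record.Append(line);

            // Every quote toggles the state; an escaped "" toggles it twice.
            foreach (char c in line)
                if (c == '"') inQuotes = !inQuotes;

            if (!inQuotes)
            {
                yield return record.ToString();
                record.Clear();
            }
        }

        // Unterminated quoted field at end of input: return what was collected.
        if (record.Length > 0)
            yield return record.ToString();
    }
}
```

Each record returned can then be passed to the Regex.Matches code above.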

Guffa
  • I've checked your regex and it fails in this case: "text1,\"text2,\"text3". It should return the values text1 | "text2 | "text3, but it returns text1 | text2, | text3. – Bronek Jun 20 '13 at 09:24
  • @Bronek: Why do you think that it should do that? – Guffa Jun 21 '13 at 07:28
  • ...because it should not split this data after the second quotation mark. In this case the comma is assumed to be the delimiter. – Bronek Jun 21 '13 at 11:04
  • @Bronek: That input is invalid, so the expected result isn't defined. – Guffa Jun 22 '13 at 07:52

You can connect to the file using ODBC; check this link (if the link does not help much, just google "connecting csv files with odbc").

If you also have problems with ODBC, I guess the file is not a valid CSV file.

mehmet6parmak
  • That would be great... the problem is that the CSV file is not valid! Thanks to the file creator... – theSpyCry Aug 21 '10 at 14:54
  • @PaN1C_Showt1Me: The example CSV is valid. The field containing a comma is enclosed in double quotes. See http://en.wikipedia.org/wiki/Comma-separated_values – TrueWill Aug 21 '10 at 15:48

I'd be tempted to swap out the commas that occur inside the quoted strings for a placeholder token, then use Split, and finally put the commas back. This works for the sample row:

        // Requires: using System.Text; (for StringBuilder)
        string csv = "543472,\"36743721\",\"Rutois, a.s.\",\"151\",\"some name\",\"01341\",55,\"112\",1";

        const string COMMA_TOKEN = "[COMMA]";
        string[] values;
        bool inQuotes = false;

        StringBuilder cleanedCsv = new StringBuilder();
        foreach (char c in csv)
        {
            if (c == '\"')
                inQuotes = !inQuotes;  //work out if inside a quoted string or not
            else
            {
                //Replace commas in quotes with a token
                if (inQuotes && c == ',')
                    cleanedCsv.Append(COMMA_TOKEN);
                else
                    cleanedCsv.Append(c);
            }
        }

        values = cleanedCsv.ToString().Split(',');

        //Put the commas back
        for (int i = 0; i < values.Length; i++)
            values[i] = values[i].Replace(COMMA_TOKEN, ",");
user427261
  • I've checked your solution and it fails in this case: "text1,\"text2,\"text3". It should return the values text1 | "text2 | "text3, but it returns text1 | text2,text3. – Bronek Jun 20 '13 at 09:34
  • While the above code is not useful where unpaired double quotes are used, it was very helpful to me: it allowed me to read over 30 text files with different formats that sometimes had commas inside double quotes. I was able to parse the fields correctly and then create Excel files using EPPlus. One thing I had to add was removing an equals sign at the beginning of fields in numeric columns that contained leading zeros, to preserve those zeros, e.g. ="00003322". – code-it Oct 07 '21 at 16:05
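The leading-equals-sign cleanup described in the last comment could look roughly like this. It is a hypothetical helper, not the commenter's code, and the rule it applies (strip the "=" only when the rest of the field is all digits, with or without its quotes) is an assumption:

```csharp
static class ExcelTextCleanup
{
    // Some exporters write ="00003322" so Excel keeps the leading zeros.
    // After the quote-stripping approach above the field reads =00003322.
    // Drop the '=' (and any remaining quotes) only when what is left is digits.
    public static string StripFormulaPrefix(string field)
    {
        if (field.Length > 1 && field[0] == '=')
        {
            string rest = field.Substring(1).Trim('"');
            foreach (char c in rest)
                if (!char.IsDigit(c)) return field; // a real formula: leave it alone
            return rest;
        }
        return field;
    }
}
```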

The other RegEx answer will fail if the first character is a quote.

This is the correct regular expression:

string[] columns = Regex.Split(inputRow, ",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))");
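Regex.Split leaves the surrounding quotes on each column. A minimal sketch, with hypothetical names RegexCsvSplit and SplitRow, that wraps the expression above and trims those quotes. The lookahead only accepts a comma when an even number of double quotes follows it up to the end of the row, which means the comma lies outside any quoted field.

```csharp
using System.Text.RegularExpressions;

static class RegexCsvSplit
{
    // Matches a comma only if it is followed by zero or more pairs of quotes
    // and then no further quote, i.e. the comma is not inside a quoted field.
    static readonly Regex Separator =
        new Regex(",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))");

    public static string[] SplitRow(string inputRow)
    {
        string[] columns = Separator.Split(inputRow);
        // Regex.Split keeps the quotes around quoted values; trim them off.
        for (int i = 0; i < columns.Length; i++)
            columns[i] = columns[i].Trim('"');
        return columns;
    }
}
```

Unlike the Regex.Matches approach, Split keeps empty fields (a,,b yields three columns). Doubled quotes ("") inside a field are not unescaped by this sketch.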
Ryan Loggerythm

I guess you want something like this:

string csv = "543472,\"36743721\",\"Rutois, a.s.\",\"151\",\"some name\",\"01341\",55,\"112\",1";
string[] values;
// Caveat: a plain split on ',' also breaks "Rutois, a.s." into two values.
values = csv.Split(',');
for(int i = 0; i<values.Length; i++)
{
    values[i] = values[i].Replace("\"", "");
}

Hope this helps.

Ash Burlaczenko