0

I am reading in a CSV file in the format:

10009279,D002158,,"Mount Robinson deposit",38.1139,-105.34557,NA,"United States",Colorado,Custer,B,"Aluminum, Potassium",,

I would like to write out a new CSV file based on only the selected columns from the initial CSV file, so the resulting structure would look like:

-105.34557,38.1139,"Mount Robinson deposit","Custer "Aluminum, Potassium"

I have tried:

StreamWriter writer = new StreamWriter(@textBox2.Text);
string[] lines = File.ReadAllLines(@textBox1.Text);
foreach (string line in lines)
{
    string[] fields = line.Split(',');
    writer.WriteLine(string.Format("{0},{1},{2},{3}", fields[LONcomboBox.SelectedIndex], fields[LATcomboBox.SelectedIndex], fields[NAMEcomboBox.SelectedIndex], fields[10 + 13]));
}
writer.Close();

Which works with the following problems:

  1. commas embedded in double quotes don't seem to be handled correctly
  2. concatonating multiple fileds seems to fail (this is resolved thanks for the answer)

any suggestions would be appreciated!

I have also attempted to use FileHelpers but can't seem to get it to work using the index of the column.

Fred
  • 1
  • 2
  • What do you mean in point A? Plus what is just the output for string.Format(...) line. Debug and see it in the watch window. – Zenwalker Aug 17 '11 at 03:44
  • in point A, When a field in the csv is like ,"field, two", I need the parser to disregard the comma between the sets of double quotes, seeing the embedded comma as part of the string instead of a field seperator. – Fred Aug 17 '11 at 05:28

5 Answers5

1

Ok.. short answer for something quick and dirty I would just use Linq and Regex.

    private void processCSV(string inputFileName)
    {
        Regex regexObj = new Regex(@"\s*(?:""(?<val>""[^""]*(""""[^""]*)*"")\s*|(?<val>[^,]*))(?:,|$)");
        List<List<Match>> elements = File.ReadAllLines(inputFileName)
            .Select<string,List<Match>>(x=>regexObj.Matches(x).Cast<Match>().ToList()).ToList();            
        List<string> newLines = elements.Select(y=>y.Select(z=>z.Groups["val"].Value).ToList())
                                        .Select(z=>string.Format("{0},{1},{2},{3}",z[0],z[1],z[2],z[4]+z[5]))
                                        .ToList();

         //Write newlines somewhere
    }

But for production level code, you really should use a CSV Parser library like http://www.codeproject.com/KB/database/CsvReader.aspx

I'm sure there are better ones out that was just the first I saw.

ExCodeCowboy
  • 870
  • 7
  • 10
0

Regarding (a), Do you have any control over the format of your csv input file? If so then an easy option would be to use a different delimiter to comma. Perhaps | instead?

To answer (b),

Instead of fields[10 + 13]

Try fields[10] + fields[13]

Otherwise all you are doing is attempting to find field 23, which probably doesn't exist.

Scott Ferguson
  • 7,690
  • 7
  • 41
  • 64
  • As per him, , are embedded b/w "" i wonder how that can happen in CSV unless , itself a value of on of the column – Zenwalker Aug 17 '11 at 04:12
  • I don't have any real control over the structure of the CSV file, so I have to work around the odd structure... Thanks for the point on how to concatonate the two fields. I will give it a shot. – Fred Aug 17 '11 at 04:39
  • Good luck. If the answers are helpful, don't forget to upvote them. :) – Scott Ferguson Aug 17 '11 at 04:43
0

You probably want to look for a csv parser, there's some good suggestions here: High scoring answer on StackOverflow

Don't you got an exception with that last parameter?

I think you want something more like:

writer.WriteLine(string.Format("{0},{1},{2},{3}", fields[LONcomboBox.SelectedIndex], fields[LATcomboBox.SelectedIndex], fields[NAMEcomboBox.SelectedIndex], fields[10] + fields[13]));

However I would probably be furious with anyone who didn't check the array indexes so be sure you validate that there are at least 14 columns before referencing fields[13].

Community
  • 1
  • 1
Doug Boone
  • 365
  • 1
  • 12
  • Thank everyone for the answers so far, I will try them and respond once I have had a chance to try them out. – Fred Aug 17 '11 at 04:37
  • I ended up using the csv parser from code project, after trying FileHelpers then attempting to implment the code from Davendra above. CSV parser from Code Project [A Fast CSV Reader](http://www.codeproject.com/KB/database/CsvReader.aspx) works great and allows me to select the field index where FileHelpers did not (or at least I couldn't figure out how to do so). – Fred Aug 19 '11 at 15:23
0

The following code implements the parser as well as the csv writer,

private void ProcessCSV(string sourceCsvFilePath, string destCsvFilePath)
{
    // Read contents of source file
    var lines = File.ReadAllLines(sourceCsvFilePath, Encoding.Default);

    // Process the old file contents
    var table = new List<List<string>>();
    foreach (var line in lines)
    {
        var cells = new List<string>();
        if (line[0] == ',')
        {
            cells.Add(string.Empty);
        }

        for (int i = 0; i < line.Length; i++)
        {
            if (line[i] == '\"')
            {
                var cellBuilder = new StringBuilder(line[i].ToString());
                i++;
                while (i < line.Length && line[i] != '\"')
                {
                    cellBuilder.Append(line[i].ToString());
                    i++;
                }
                cells.Add(cellBuilder.ToString().Trim('\"'));
            }
            else if (line[i] != ',')
            {
                var cellBuilder = new StringBuilder(line[i].ToString());
                i++;
                while (i < line.Length && line[i] != ',')
                {
                    cellBuilder.Append(line[i].ToString());
                    i++;
                }

                cells.Add(cellBuilder.ToString().Trim('\"'));
            }
            else if ( i > 0 && line[i - 1] == ',' && line[i] == ',')
            {
                cells.Add(string.Empty);
            }
        }

        if(line[line.Length - 1] == ',')
        {
            cells.Add(string.Empty);
        }

        table.Add(cells);
    }

    // Create a new table in the order: OldTable.Col2, OldTable.Col4, OldTable.Col0, "OldTable.Col1 OldTable.Col5 OldTable.Col6"
    var newTable = new List<List<string>>();

    foreach (var row in table)
    {
        var cells = new List<string>();
        cells.Add(row[2].Contains(',') ? string.Concat("\"", row[2], "\"") : row[2]);
        cells.Add(row[4].Contains(',') ? string.Concat("\"", row[4], "\"") : row[2]);
        cells.Add(row[0].Contains(',') ? string.Concat("\"", row[0], "\"") : row[2]);

        string str = string.Format("{0} {1} {2}", row[1], row[5], row[6]);
        cells.Add(str.Contains(',') ? string.Concat("\"", str, "\"") : str);

        newTable.Add(cells);
    }

    // Prepare the file contents
    var linesToWrite = new string[newTable.Count];
    int lineCounter = 0;
    foreach (var row in newTable)
    {
        StringBuilder rowBuilder = new StringBuilder();
        foreach (var cell in row)
        {
            rowBuilder.AppendFormat("{0},", cell);
        }

        linesToWrite[lineCounter++] = rowBuilder.ToString().Trim(',');
    }

    // Write the contents to CSV
    File.WriteAllLines(destCsvFilePath, linesToWrite, Encoding.Default);
}

You will need to add the checks for File.Exists, and index checks (to ensure that the column exists before you access it). If you interested in a more concise approach, then you can try parsing using Regular expressions with the Regex class.

Devendra D. Chavan
  • 8,871
  • 4
  • 31
  • 35
  • So far this seems to be working fairly well -- changed the cells.Add ---> row [2]); at the end needs to be changed with each row. I tried: cells.Add(row[LONcomboBox.SelectedIndex].Contains(',') ? string.Concat("\"", row[LONcomboBox.SelectedIndex], "\"") : row[LONcomboBox.SelectedIndex]); This way I can use the selected value from the combobox. The real data inbound from the csv file is: 10009279,D002158,,"Mount Robinson deposit",38.1139,-105.34557,NA,"United States",Colorado,Custer,B,"Aluminum, Potassium",, If you notice it has empty fields, how do I handle those? – Fred Aug 17 '11 at 05:22
  • sorry, that was quite messy, I edited the original question to display some real data I am parsing, including the empty fields. – Fred Aug 17 '11 at 05:34
  • I have updated the method, it now handles empty cells. You may have to update column indices according to the values you want to see in the output. – Devendra D. Chavan Aug 17 '11 at 13:55
  • I ended up using a parser from Code Project to do this, but your code would have worked except for the emtpy cells. I haven't tested the edited version, but would suppose it would work. – Fred Aug 19 '11 at 15:27
  • Nice to hear that you got the solution. – Devendra D. Chavan Aug 19 '11 at 15:35
0

Based on the recommendation from Doug I reviewed some csv parsers. I first tried FileHelpers however I really couldn't get that to work while using a header line. The references to the columns always ended up being by column name, and not index. I really needed to reference my columns by the index of the column since that matches my selection both a listbox and checkedlistbox. I reviewed A FAST CSV READER which actually did the trick for me. Here is how I implemnted it:

    private void button2_Click(object sender, EventArgs e)
    {

        using (CsvReader csv = new CsvReader(new StreamReader(@textBox1.Text), true))
        {
            int fieldCount = csv.FieldCount;
            string[] headers = csv.GetFieldHeaders();

            /// Evaluate the checkedlistbox
            string comment = "";
            List<Int32> comment_indices = new List<Int32>();
            List<String> lines = new List<String>();
            for (int x = 0; x <= checkedListBox1.CheckedItems.Count - 1; x++)
            {
                // add selected item's index to list
                comment_indices.Add(checkedListBox1.CheckedIndices[x]);
            }

            while (csv.ReadNextRecord())
            {

                ///  Use the SelectedIndex to match the header and column
                string base_string = csv[LONcomboBox.SelectedIndex] + "," + csv[LATcomboBox.SelectedIndex] + "," + csv[NAMEcomboBox.SelectedIndex] + ",";
                //MessageBox.Show(base_string);

                ///  Try to get the row value -- this is the row count - starting at 0 excluding headers I think
                //MessageBox.Show("Is this the row count?" + csv.CurrentRecordIndex);
                comment = "";
                ///  Get the comment
                foreach (Int32 indices in comment_indices)
                {
                    comment = comment + csv[indices] + " ";
                }
                //MessageBox.Show(base_string + '"' + comment + '"');
                string completed_string = base_string + '"' + comment + '"';
                lines.Add(completed_string);
            }

            StreamWriter writer = new StreamWriter(@textBox2.Text);
            foreach (string line in lines)
            {
                writer.WriteLine(line);
            }
            writer.Close(); 
        }          
        MessageBox.Show(" Finished Writing file " + "\n" + "\n" + " " + textBox2.Text);
    }

There are good solid examples on the Code Project site, and it solved my problem. Hope this helps someone else!

Fred
  • 1
  • 2