44

I have to create a CSV file from webservice output and the CSV file uses quoted strings with comma separator. I cannot change the format...

So if I have a string it becomes a "string"... If the value has quotes already they are replaced with double quotes. For example a str"ing becomes "str""ing"...

However, lately my import has been failing because of the following

  • original input string is: "","word1,word2,..."
  • every single quote is replaced by double resulting in: """",""word1,word2,...""
  • then its prefixed and suffixed with quote before written to CVS file: """"",""word1,word2,..."""

As you can see the final result is this:

""""",""word1,word2,..."""

which breaks my import (is sees it as another field)... I think the issue is appereance of "," in the original input string.

Is there a CVS escape sequence for this scenario?

Update

The reason why above breaks is due to BCP mapping file (BCP utility is used to load CSV file into SQL db) which has terminator defined as "," . So instead of seeing 1 field it sees 2...But I cannot change the mapping file...

zam6ak
  • 7,229
  • 11
  • 46
  • 84
  • Different applications tend to have different CSV specifications. What application are you importing in? It might be worth checking what it outputs for your specific usecase. – Martijn Jun 16 '11 at 19:34
  • 3
    If `"","word1,word2,..."` is the value then `""""",""word1,word2,..."""` is the escaped CSV field. What exactly is breaking? – dtb Jun 16 '11 at 19:35
  • @dtb the process that imports the file uses BCP utility to import it into SQL datbase...the terminator is declared as '","' which interprets above as 2 fields instead of 1 – zam6ak Jun 16 '11 at 19:50
  • 3
    So that utility is broken. That's awesome! ...not. Workaround: Replace all `"` with something else, import, perform SQL query to replace something else back to `"`. – dtb Jun 16 '11 at 19:55
  • I am not sure BCP works with CSV input. If it purports to, it isn't reading it correctly. – Ed Bayiates Jun 16 '11 at 19:55
  • @dtb I think that may be my only option... – zam6ak Jun 16 '11 at 20:00

6 Answers6

126

I use this code and it has always worked:

/// <summary>
/// Turn a string into a CSV cell output
/// </summary>
/// <param name="str">String to output</param>
/// <returns>The CSV cell formatted string</returns>
public static string StringToCSVCell(string str)
{
    bool mustQuote = (str.Contains(",") || str.Contains("\"") || str.Contains("\r") || str.Contains("\n"));
    if (mustQuote)
    {
        StringBuilder sb = new StringBuilder();
        sb.Append("\"");
        foreach (char nextChar in str)
        {
            sb.Append(nextChar);
            if (nextChar == '"')
                sb.Append("\"");
        }
        sb.Append("\"");
        return sb.ToString();
    }

    return str;
}
Ed Bayiates
  • 11,060
  • 4
  • 43
  • 62
  • 1
    I've read CSV from the above code into Excel without issues. There may be a particular CSV reader that is broken if my code gets the same output as yours. – Ed Bayiates Jun 16 '11 at 19:51
  • I get the same output: """"",""word1,word2,...""" from my code and it reads into Excel as "","word1,word2,..." as expected. If your code produces that output, your code is correct already. – Ed Bayiates Jun 16 '11 at 19:54
  • I believe my problem is the import utility terminator (as explained in question update), so I was wondering if there is a way to add additional escape sequence to make it work – zam6ak Jun 16 '11 at 19:56
  • sweet! Was looking for the exact same thing. – Bohn Oct 20 '15 at 13:03
  • 1
    This throws an exception when `str == null` – kavun Nov 07 '16 at 21:02
  • 4
    @kavun yes, that's obvious, and done purposely. Null checks are often inappropriate in functions that are called many times. If you adopt this code then how you deal with bad data is up to you. – Ed Bayiates Nov 10 '16 at 18:28
  • C# extension method version: `public static string ToCSVCell(this string str)` – Kugan Kumar Jul 18 '19 at 09:47
18

Based on Ed Bayiates' answer:

/// <summary>
/// Turn a string into a CSV cell output
/// </summary>
/// <param name="value">String to output</param>
/// <returns>The CSV cell formatted string</returns>
private string ConvertToCsvCell(string value)
{
    var mustQuote = value.Any(x => x == ',' || x == '\"' || x == '\r' || x == '\n');

    if (!mustQuote)
    {
        return value;
    }

    value = value.Replace("\"", "\"\"");

    return string.Format("\"{0}\"", value);
}
Lenin
  • 679
  • 2
  • 10
  • 15
  • Easiest solution with simple replace. – zdarsky.peter Mar 30 '17 at 15:42
  • 6
    But computationally much, much more expensive! There is a good reason Ed's code is using StringBuilder. Performance will be orders of magnitude worse for large CSV files! Easy is relative. If you like spending hours tracing down performance bottlenecks in your code, copy this snippet. If not, use Ed's code. – Stijn de Witt Sep 21 '17 at 20:00
  • 1
    Use `Regex.IsMatch(value, "[,\"\\r\\n]");` for the `mustQuote` expression. Faster. – Sellorio Sep 23 '17 at 07:52
  • 1
    Why would this be computationally much more expensive? Both functions work at the Cell level, rather than on the whole file, and iterating across every character in a string and adding to a `StringBuilder` is surely not more expensive than a single `.Replace`, as `Replace` uses a `Span` internally, as does `StringBuilder`, so in both cases, we take a string, convert it to a `Span` of `Char` then in one we continually add to that `Span`, in the other the final `Span` size is pre-calculated. If this were processing an whole file, agreed, a `StringBuilder` is far better than a looped `Replace`. – Andy Wynn Feb 08 '23 at 11:00
  • Proposed one liner: `private string ConvertToCsvCell(string value) => Regex.IsMatch(value, "[,\"\\r\\n]") ? $"\"{value.Replace("\"", "\"\"")}\"" : value;` – Andy Wynn Feb 08 '23 at 11:03
  • Alternatively, if you don't like regex but do like pattern matching: `private string ConvertToCsvCell(string value) => value.Any(x=> x is ',' or '"' or '\r' or '\n') ? $"\"{value.Replace("\"", "\"\"")}\"" : value;` – Andy Wynn Feb 08 '23 at 11:06
4

My penny thought:

String[] lines = new String[] { "\"\",\"word\",word,word2,1,34,5,2,\"details\"" };
for (int j = 0; j < lines.Length; j++)
{
    String[] fields=lines[j].Split(',');
    for (int i =0; i<fields.Length; i++)
    {
        if (fields[i].StartsWith("\"") && fields[i].EndsWith("\""))
        {
            char[] tmp = new char[fields[i].Length-2];
            fields[i].CopyTo(1,tmp,0,fields[i].Length-2);
            fields[i] =tmp.ToString();
            fields[i] = "\""+fields[i].Replace("\"","\"\"")+"\"";
        }
        else
            fields[i] = fields[i].Replace("\"","\"\"");

    }
    lines[j]=String.Join(",",fields);

}

BugFinder
  • 17,474
  • 4
  • 36
  • 51
3

Based on contribution of "Ed Bayiates" here's an helpful class to buid csv document:

/// <summary>
/// helpful class to build csv document
/// </summary>
public class CsvBuilder
{
    /// <summary>
    /// create the csv builder
    /// </summary>
    public CsvBuilder(char csvSeparator)
    {
        m_csvSeparator = csvSeparator;
    }

    /// <summary>
    /// append a cell
    /// </summary>
    public void appendCell(string strCellValue)
    {
        if (m_nCurrentColumnIndex > 0) m_strBuilder.Append(m_csvSeparator);

        bool mustQuote = (strCellValue.Contains(m_csvSeparator)
                        || strCellValue.Contains('\"') 
                        || strCellValue.Contains('\r') 
                        || strCellValue.Contains('\n'));

        if (mustQuote)
        {
            m_strBuilder.Append('\"');
            foreach (char nextChar in strCellValue)
            {
                m_strBuilder.Append(nextChar);
                if (nextChar == '"') m_strBuilder.Append('\"');
            }
            m_strBuilder.Append('\"');
        }
        else
        {
            m_strBuilder.Append(strCellValue);
        }
        m_nCurrentColumnIndex++;
    }

    /// <summary>
    /// end of line, new line
    /// </summary>
    public void appendNewLine()
    {
        m_strBuilder.Append(Environment.NewLine);
        m_nCurrentColumnIndex = 0;
    }

    /// <summary>
    /// Create the CSV file
    /// </summary>
    /// <param name="path"></param>
    public void save(string path )
    {
        File.WriteAllText(path, ToString());
    }

    public override string ToString()
    {
        return m_strBuilder.ToString();
    }

    private StringBuilder m_strBuilder = new StringBuilder();
    private char m_csvSeparator;
    private int m_nCurrentColumnIndex = 0;

}

How to use it:

void exportAsCsv( string strFileName )
{
    CsvBuilder csvStringBuilder = new CsvBuilder(';');
    csvStringBuilder.appendCell("#Header col 1 : Name");
    csvStringBuilder.appendCell("col 2 : Value");
    csvStringBuilder.appendNewLine();
    foreach (Data data in m_dataSet)
    {
        csvStringBuilder.appendCell(data.getName());
        csvStringBuilder.appendCell(data.getValue());
        csvStringBuilder.appendNewLine();
    }
    csvStringBuilder.save(strFileName);
}
Patrice I
  • 41
  • 3
1

the first step in parsing this is removing the extra added " 's around your string. Once you do this, you should be able to deal with the embedded " as well as the ,'s.

Mike
  • 3,462
  • 22
  • 25
  • The original string has to appear in db as is - I wish I could just remove empty quotes from it... – zam6ak Jun 16 '11 at 19:57
0

After much deliberation, it was decided that import utility format was needed to be fixed. The escaping of the string was correct (as users indicated) but the format file that import utility used was incorrect and was causing it to break import.

Thanks all and special thanks to @dbt (up vote)

zam6ak
  • 7,229
  • 11
  • 46
  • 84