1

Is there an easier way to turn a string into a CSV compatible value, eg. escaping, quoting as necessary.

Currently I have this:

public static object ToCsv(object obj, CultureInfo cultureInfo, string delimiter)
{
    CsvConfiguration config = new(cultureInfo) { Encoding = Encoding.UTF8, Delimiter = delimiter, HasHeaderRecord = false, ShouldQuote = (_) => true };

    using var memoryStream = new MemoryStream();
    using var streamWriter = new StreamWriter(memoryStream);
    using var csvWriter = new CsvWriter(streamWriter, config);
    csvWriter.WriteField(obj);
    csvWriter.Flush();

    memoryStream.Position = 0;
    return Encoding.UTF8.GetString(memoryStream.ToArray());
}

Not only does this code seem like an overkill, I am also concerend about the performance.

The code is used in a copy&paste event where a DataGrid fires an event for each individual cell, and I need to parse each individual cell, depending on the amount of rows/columns the user has selected this piece of code could be called thousand of times. (for each cell)

CsvWriter and CsvConfiguration are part of the CsvHelper library: https://github.com/JoshClose/CsvHelper

Rand Random
  • 7,300
  • 10
  • 40
  • 88
  • Do you have any issues, like an actual performance issue, with the code or are you just guessing that it might cause one? What time does this method need for one execution? – Tim Schmelter Oct 06 '22 at 12:44
  • @TimSchmelter - as an user I would say yes, because the amount of time that I have to wait for the copy to finish does feel to much | as an developer, because I know what is going on and know the reason for the slowness, I would say no – Rand Random Oct 06 '22 at 12:46
  • Have you had a look at this: https://stackoverflow.com/questions/6377454/escaping-tricky-string-to-csv-format – Magnus Oct 06 '22 at 12:48
  • @TimSchmelter - here performance data: `1.61` seconds for `163,110 calls` - without calling the method: `119` milliseconds – Rand Random Oct 06 '22 at 13:33
  • @Magnus - no didn't see the SO was focused on finding a solution with the `CsvHelper` library – Rand Random Oct 06 '22 at 13:34

2 Answers2

1

This method is NOT designed to be used to build each field in a possible CSV file cell by cell. It is a once off to CSV parse a single value. It is a good implementation of this as it uses the same logic that a full file approach would use, but it would be a very poor implementation to use to write a whole file, or to use recursively for many fields in a file.

For the readers at home, if you need to use this library to write a file please consult the documentation: Writing a CV file

using (var writer = new StreamWriter("path\\to\\file.csv"))
using (var csv = new CsvWriter(writer, CultureInfo.InvariantCulture))
{
    csv.WriteRecords(records);
}

To adequately judge/assess performance, it would be worth showing your code that is calling this method. As with the code above, to build a file you would setup the streams and config once for the file, then use WriteRecords which will iterate over your objects in the list and internally call csvWriter.WriteField(value); for each property in those objects.

Side note: that method really should declare the returning type a string and not an object.

The code is used in a copy & paste event where a DataGrid fires an event for each individual cell, and I need to parse each individual cell, depending on the amount of rows/columns the user has selected this piece of code could be called thousand of times. (for each cell)

If performance is a problem, do not try to handle this on a cell by cell basis, of alternatively give your user an alternate way to paste in a larger set of data that you can parse into a CSV file and then programmaticallyy assign to the underlying data.

As you are using 3rd party libraries (Telerik and CsvHelper) it is worth consulting their forums for specific advice on how to intercept a paste event for a bulk paste without being forced to handle the cells individually.

That being said, we can improve the performance by taking some of the internals from CsvHelper, not that you have specified that all fields should be quoted with the ShouldQuote = (_) => true so we can simply to this:

public static string ToCsv(object obj, CultureInfo cultureInfo, string delimiter, string escapedDelimiter, bool alwaysQuote = true)
{
    var field = String.Format(cultureInfo, "{0}", obj);
    if (alwaysQuote || field.Contains(delimiter))
    {
        field = field. Replace(delimiter, escapedDelimiter);
        return delimiter + field + delimiter;
    }
    return field;
}

At this level, when we are only dealing with 1 individual value at a time, simple string replace is likely to be the same or more efficient than a Regular Expression solution.


This code was de-constructed from CsvHelper.WriteField

Chris Schaller
  • 13,704
  • 3
  • 43
  • 81
  • About the side note, my bad didn't notice it. Yes, it should be string. about the question how I call it, I am listening to the event `OnCopying` of the gridview - https://docs.telerik.com/devtools/wpf/controls/radgridview/clipboard/copying - and set `args.Value = ToCsv(args.Value);` the copying event fires for each cell - notice: I know I could suppress the gridviews copying mechanism and write my own and as you have shown write everything at once – Rand Random Oct 06 '22 at 13:09
  • Another note, normally we wouldn't bother parsing the fields at all, when you save that data to CSV, at that point the CSV writer does it for us, and it is very efficient at doing that, so there is little value in delimiting the grid cells at all. – Chris Schaller Oct 06 '22 at 13:39
  • If I am not mistaken this line `return delimiter + field + delimiter;` should be `return quote + field + quote;` and the method is missing a `quote` argument? – Rand Random Oct 06 '22 at 13:45
  • `delimiter` _is_ `quote` they are the same concept, in CSV the _delimiter_ IS a quote. I tried to make this code match your code. Ironically to avoid this type of discussion ;) – Chris Schaller Oct 06 '22 at 14:34
1

I know you mention CsvHelper, but here is a method I put together to build a csv "cell" using StringBuilder

    /// <summary>
    /// StringBuilder Extension method - Escape cells, as they may potentially contain reserved characters
    /// </summary>
    /// <param name="sb">StringBuilder that is assembling the csv string</param>
    /// <param name="val">Value string to be persisted to the cell</param>
    /// <returns>StringBuilder, with the escaped data cell appended</returns>
    internal static StringBuilder EscapeCell(this StringBuilder sb, string val)
    {
        if (string.IsNullOrWhiteSpace(val)) return sb;

        //no need to escape if does not contain , " \r or \n
        if (!val.Contains(",") && !val.Contains("\"") && !val.Contains("\r") && !val.Contains("\n"))
        {
            sb.Append(val);
            return sb;
        }

        //surround in quotes + any internal quotes need to be doubled -> ex.,"this is an ""example"" of an escaped cell",
        string escaped = val[0] == '\"'
            ? val.Substring(1, val.Length - 2).Replace("\"", "\"\"")
            : val.Replace("\"", "\"\"");
        
        sb.Append('\"').Append(escaped).Append('\"');
        return sb;
    }

The idea is that you want to escape the entire cell if it has a special char that may break CSV structure, and any internal " needs to be normalized as ""

Using StringBuilder throughout means building the CSV string is as fast as can be. Then write the CSV string to file as needed

Narish
  • 607
  • 4
  • 18