Is there a way to remove non-ASCII characters through CsvHelper configuration instead of writing the conversion in application code?

I saved an Excel file as CSV and found some values like AbsMarketValue�������������. I would like to get rid of the non-ASCII characters.

Setting csv.Configuration.Encoding = Encoding.ASCII did not work.

Following the question "How can you strip non-ASCII characters from a string? (in C#)", I can strip them with a regex:

string s = "søme string";
s = Regex.Replace(s, @"[^\u0000-\u007F]+", string.Empty);

This approach works for me, but I want to avoid it because it requires adding this kind of code to the application for every text field.

I tried to do this in the conversion map, but that did not work.

Anand

1 Answer

Using a type converter, you can make every string property contain only ASCII characters when the file is read.

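using System.Globalization;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;
using CsvHelper;
using CsvHelper.Configuration;
using CsvHelper.TypeConversion;

void Main()
{
    using (var reader = new StringReader("Id,Name\n1,AbsMarketValue�������������"))
    using (var csv = new CsvReader(reader, CultureInfo.InvariantCulture))
    {
        // Register the converter for every string member before reading.
        csv.Context.TypeConverterCache.AddConverter<string>(new AsciiOnlyConverter());

        // GetRecords is lazy, so materialize the records while the reader is still open.
        var records = csv.GetRecords<Foo>().ToList();
    }
}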

public class Foo
{
    public int Id { get; set; }
    public string Name { get; set; }
}


public class AsciiOnlyConverter : StringConverter
{
    public override object ConvertFromString(string text, IReaderRow row, MemberMapData memberMapData)
    {
        var ascii = Regex.Replace(text, @"[^\u0000-\u007F]+", string.Empty);
        
        return base.ConvertFromString(ascii, row, memberMapData);
    }
}
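
If only some columns need cleaning, the same converter can probably be attached to a single member through a class map instead of being registered for every string. The following is a minimal sketch, assuming CsvHelper 20 or later (where `csv.Context` is available). `FooMap` and `AsciiMapDemo` are hypothetical names that are not part of the answer above, and the null guard in the converter is an added precaution.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;
using CsvHelper;
using CsvHelper.Configuration;
using CsvHelper.TypeConversion;

public class Foo
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// Same converter as above, repeated so the sketch compiles on its own.
// The null guard is an addition, not part of the original answer.
public class AsciiOnlyConverter : StringConverter
{
    public override object ConvertFromString(string text, IReaderRow row, MemberMapData memberMapData)
    {
        var ascii = Regex.Replace(text ?? string.Empty, @"[^\u0000-\u007F]+", string.Empty);
        return base.ConvertFromString(ascii, row, memberMapData);
    }
}

// Hypothetical map: only the Name column goes through the converter.
public sealed class FooMap : ClassMap<Foo>
{
    public FooMap()
    {
        Map(m => m.Id);
        Map(m => m.Name).TypeConverter<AsciiOnlyConverter>();
    }
}

// Hypothetical helper that reads CSV text using the map above.
public static class AsciiMapDemo
{
    public static List<Foo> Read(string csvText)
    {
        using (var reader = new StringReader(csvText))
        using (var csv = new CsvReader(reader, CultureInfo.InvariantCulture))
        {
            csv.Context.RegisterClassMap<FooMap>();
            // Materialize before the reader is disposed, since GetRecords is lazy.
            return csv.GetRecords<Foo>().ToList();
        }
    }
}
```

With this map, other string columns are left untouched, which may be preferable when some fields legitimately contain non-ASCII text.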
David Specht