I am importing a CSV file, and while reading it, special characters like '�' appear in the strings that are read. How can I avoid these Unicode characters?

I am using TextFieldParser to parse the data, but while reading, the space between two words in a sentence is replaced with the character '�'. I tried searching each string for the character and replacing it, but the special character might be something different later.

Encoding DefaultEncoding = Encoding.UTF8;

public IList<string[]> ReadCsvData()
{
    using (var reader = ReadBase64File())
    {
        return CsvParser.ReadCsvData(reader);
    }
}

TextReader ReadBase64File()
{
    var bytes = Convert.FromBase64String(base64File);
    return new StreamReader(new MemoryStream(bytes), DefaultEncoding, true);
}    

public static IList<string[]> ReadCsvData(TextReader reader)
{
    IList<string[]> csvData = new List<string[]>();
    using (Microsoft.VisualBasic.FileIO.TextFieldParser parser = new Microsoft.VisualBasic.FileIO.TextFieldParser(reader))
    {
        parser.SetDelimiters(",");
        parser.TrimWhiteSpace = true;

        try
        {
            while (!parser.EndOfData)
            {
                csvData.Add(parser.ReadFields());
            }
        }
        catch (Microsoft.VisualBasic.FileIO.MalformedLineException ex)
        {
            throw new FormatException($"Invalid format found when importing the CSV data (line {parser.ErrorLineNumber}).", ex);
        }
    }

    return csvData;
}
stuartd
Hanumesh
  • Can you provide a short example of text which reproduces the issue? – stuartd Sep 11 '19 at 14:48
  • Sample text looks like: GOOD�MORNING�EVERYONE,�HAVE,�GOOD,�DAY – Hanumesh Sep 11 '19 at 15:24
  • Looks more like an issue with how this CSV is *generated*. I would suggest you put this string in an editor like PSPad or similar that allows you to view the hex codes. Tell us the hex of that character. Also provide more info on how this string is generated. – LocEngineer Sep 11 '19 at 15:35
  • Which character encoding was used to write the file? You have to use that to read it. Please give sample bytes (in hexadecimal) from the file, first 10 for text that you gave earlier. – Tom Blodget Sep 13 '19 at 10:07
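
The hex bytes requested in the comments can also be obtained without a hex editor by dumping the start of the decoded Base64 payload. The following is only a minimal sketch: `HexDump` and `FirstBytes` are hypothetical names that do not appear in the question, and the `base64File` parameter stands in for the question's field of the same name.

```csharp
using System;

public static class HexDump
{
    // Hypothetical helper (not part of the question's code): returns the first
    // `count` bytes of a Base64 payload as upper-case hex pairs separated by '-'.
    public static string FirstBytes(string base64File, int count)
    {
        byte[] bytes = Convert.FromBase64String(base64File);

        // Never read past the end of a short payload.
        int length = Math.Min(count, bytes.Length);

        return BitConverter.ToString(bytes, 0, length);
    }
}
```

Calling `HexDump.FirstBytes(base64File, 10)` returns the first ten bytes of the file. Any byte above `7F` that sits where a space is expected is a strong hint about the encoding the file was written in.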

1 Answer

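If the file was written as Windows-1252 rather than UTF-8, decode it with Encoding.GetEncoding(1252). Note that the TextFieldParser constructors that accept an Encoding take a Stream or a file path, not a TextReader. Because the question already wraps the bytes in a StreamReader, set the encoding there instead, i.e. replace:

new StreamReader(new MemoryStream(bytes), DefaultEncoding, true)

with:

new StreamReader(new MemoryStream(bytes), Encoding.GetEncoding(1252), true)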
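
To illustrate why the encoding matters, here is a minimal sketch that decodes the same bytes both ways. It assumes the offending byte is 0xA0 (the non-breaking space in Windows-1252), which is a guess consistent with a space being replaced; the question never confirms the actual byte. `EncodingSketch`, `Decode`, and `Windows1252` are hypothetical names. The decoding goes through a StreamReader, as in the question's ReadBase64File.

```csharp
using System;
using System.IO;
using System.Text;

public static class EncodingSketch
{
    // Hypothetical helper: decodes raw bytes the same way the question's
    // ReadBase64File does, but with a caller-supplied encoding.
    public static string Decode(byte[] bytes, Encoding encoding)
    {
        using (var reader = new StreamReader(new MemoryStream(bytes), encoding, true))
        {
            return reader.ReadToEnd();
        }
    }

    public static Encoding Windows1252()
    {
        // On .NET Core / .NET 5+ code page 1252 is only available after the
        // code-pages provider is registered. On .NET Framework the encoding is
        // built in and this line can be dropped.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(1252);
    }
}
```

With the bytes `47 4F 4F 44 A0 44 41 59`, `Decode(bytes, Encoding.UTF8)` yields `GOOD�DAY`, because a lone 0xA0 is not valid UTF-8 and is replaced with U+FFFD. `Decode(bytes, EncodingSketch.Windows1252())` yields `GOOD`, a non-breaking space (U+00A0), then `DAY`. If a regular space is wanted, the decoded fields can be passed through `Replace('\u00A0', ' ')`.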