Encoding of character in text field doesn't show correctly in export using ASP.NET MVC

Question

Someone put in this apostrophe when typing a height, probably by copying and pasting from, say, a Word document. It displays fine on the web, but in an export the text shows garbled characters. Is there something I can do, either when saving the input or when exporting, to fix this problem without causing issues throughout the site? I am using ASP.NET MVC and .NET 4.8.

Text Input

6’2

CSV Export

6â€™2

Try https://stackoverflow.com/a/55247163/177416 – Alex, Apr 01 '21 at 20:38

score 0 · Answer 1 · answered Apr 01 '21 at 17:41

Depending on how you save the file, the easiest fix might be to ensure it is written out as Unicode. Assuming you can use a StreamWriter, you could do something like this:

using System.Data;
using System.IO;

class Program
{
    static void Main()
    {
        var dt = new DataTable();
        dt.Columns.Add("C1");
        dt.Columns.Add("C2");

        var r = dt.NewRow();
        r[0] = "test";
        r[1] = "6’2";
        dt.Rows.Add(r);

        // StreamWriter defaults to UTF-8 without a byte order mark (BOM).
        // The using block flushes and closes the file when it ends.
        using (var streamWriter = new StreamWriter("D:\\test.csv"))
        {
            foreach (DataRow row in dt.Rows)
            {
                // One CSV line per row, with no trailing comma.
                streamWriter.WriteLine(string.Join(",", row.ItemArray));
            }
        }
    }
}

score 0 · Answer 2 · answered Apr 02 '21 at 09:57

Your text seems to be written out as Unicode (specifically UTF-8) just fine.

Remember that a CSV file is nothing but bytes, so reading it correctly requires writer and reader to agree on encoding.

If you write a UTF-8-encoded character such as ’ (U+2019), the reader must also understand that encoding and agree to use it. If not, you will get "mojibake". This doesn't mean the content is wrong (or changed). It is just encoded in a format your reader does not expect.

Judging by the output, your reader is using a Western character set (Windows-1252). If your computer were set to, say, Russian, the same bytes would display as different characters.

Can you tell us what you use to read the exported file? It may help to write a byte order mark (BOM) at the start of the file, as suggested in a comment above. This tells readers that understand UTF-8 that the file is in fact UTF-8. Note: readers that don't understand the BOM may display it as unreadable characters.
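As a minimal sketch of the BOM idea, the export bytes could be prefixed with the UTF-8 preamble before they are sent to the browser. The class and method names here (`CsvExport`, `ToUtf8WithBom`) are hypothetical and not from the question; only `Encoding.UTF8.GetPreamble()` and `Encoding.UTF8.GetBytes()` are standard .NET calls.

```csharp
using System.Linq;
using System.Text;

public static class CsvExport
{
    // Hypothetical helper: encodes the CSV text as UTF-8 and prepends
    // the UTF-8 byte order mark (EF BB BF).
    public static byte[] ToUtf8WithBom(string csv)
    {
        byte[] bom = Encoding.UTF8.GetPreamble();
        byte[] body = Encoding.UTF8.GetBytes(csv);
        return bom.Concat(body).ToArray();
    }
}
```

In an MVC controller action, the result could then be returned with something like `return File(CsvExport.ToUtf8WithBom(csv), "text/csv", "export.csv");`, assuming the CSV text is already built in a string named `csv`.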

If you have to use a legacy reader for this file, one solution may be to convert the file to the encoding that reader expects with iconv or a similar tool.
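If an external tool is not an option, a similar conversion could be sketched in C# with `Encoding.Convert` instead of iconv. This assumes the legacy reader expects Windows-1252 and that the code runs on .NET Framework, where code page 1252 is available without extra setup. The `Transcoder` class name is hypothetical.

```csharp
using System.Text;

public static class Transcoder
{
    // Hypothetical helper: re-encodes UTF-8 bytes as Windows-1252.
    // Characters that have no Windows-1252 equivalent become '?'.
    public static byte[] Utf8ToWindows1252(byte[] utf8Bytes)
    {
        Encoding windows1252 = Encoding.GetEncoding(1252);
        return Encoding.Convert(Encoding.UTF8, windows1252, utf8Bytes);
    }
}
```

The apostrophe U+2019 does exist in Windows-1252 (as the single byte 92 hex), so the text from the question survives this conversion.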

    ’
    https://unicode-table.com/en/2019/
    
    UTF8 encoded:
        E2 80 99        hex
        226 128 153     decimal
    
    Read as Windows 1252
        6â€™2
    
      â = E2
      € = 80
      ™ = 99
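
The breakdown above can be reproduced in a few lines of C#. This is only an illustrative sketch: the `MojibakeDemo` class is hypothetical, and it assumes .NET Framework, where `Encoding.GetEncoding(1252)` works without registering an extra encoding provider.

```csharp
using System.Text;

public static class MojibakeDemo
{
    // Encodes the text as UTF-8, then deliberately decodes those bytes
    // with the wrong encoding (Windows-1252), as a mismatched reader would.
    public static string ReadUtf8AsWindows1252(string text)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
        return Encoding.GetEncoding(1252).GetString(utf8Bytes);
    }
}
```

Calling `MojibakeDemo.ReadUtf8AsWindows1252("6’2")` returns `6â€™2`, the same text seen in the CSV export.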

score 0 · Answer 3 · edited Apr 07 '21 at 17:31

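Try this to repair text that has already been garbled. It turns the string back into its original bytes using the system ANSI code page, then decodes those bytes as UTF-8:

// On .NET Framework, Encoding.Default is the system ANSI code page
// (Windows-1252 on Western systems), so the result is machine-dependent.
string utf8_String = "dayâ€™s";
byte[] bytes = Encoding.Default.GetBytes(utf8_String);
utf8_String = Encoding.UTF8.GetString(bytes); // "day’s" when the ANSI code page is Windows-1252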

answered Apr 07 '21 at 08:51 by Supholchai Pothong · edited Apr 07 '21 at 17:31 by General Grievance
