-1

I am working on a report to move data from table to reports in CSV.

Below is my SQL Server database that has column values as below, in non-english characters such as ü, ç ,é. And i have provided CSV output generated. The code which I am using also I have pasted below. 1st image is database data. 2nd image is Excel report data.

Database screeenshot of non-english character

Below is the CSV output

I have below code written.

public static byte[] GetMemoryStream<T>(List<T> data) where T : class
{
    Logger.Debug("Inside GetMemoryStream(), step 7");
    using (var mem = new MemoryStream())
    using (var writer = new StreamWriter(mem))
    using (var csvWriter = new CsvWriter(writer, System.Threading.Thread.CurrentThread.CurrentCulture)) // System.Threading.Thread.CurrentThread.CurrentCulture
    {                
        var options = new TypeConverterOptions { Formats = new[] { "yyyy/MM/dd HH:mm:ss" } };
        csvWriter.Configuration.TypeConverterOptionsCache.AddOptions<DateTime>(options);
        csvWriter.Configuration.Delimiter = ",";

        csvWriter.WriteHeader<T>();
        csvWriter.NextRecord();
        csvWriter.WriteRecords(data);

        writer.Flush();
        var result = Encoding.Unicode.GetString(mem.ToArray());
        //Console.WriteLine(result);
        return Encoding.Unicode.GetBytes(result);
        // return mem.ToArray();
    }
Dale K
  • 25,246
  • 15
  • 42
  • 71
Abhijeet Sinha
  • 161
  • 2
  • 13
  • 1
    You need to tell us the datatypes of the columns, and the collation of your database (or the columns if individually collated). – Dale K Aug 18 '20 at 06:18
  • 2
    Your code seems to maybe assume that `StreamWriter` will be writing using `Encoding.Unicode`. It uses `Encoding.UTF8` by default. The bigger question though is: why when `mem.ToArray()` returns an array of bytes, are you converting it to a string (using the wrong encoding) and back to bytes? Are you trying to convert the encoding system used? Why not just pass `Encoding.Unicode` to the `StreamWriter` constructor? – ProgrammingLlama Aug 18 '20 at 06:18
  • 1
    @DaleK That's a red herring, I think. – ProgrammingLlama Aug 18 '20 at 06:18

2 Answers2

2

I'm not really sure what you're trying to do, but I think I can write an answer that covers all the bases.

First I'll explain what your current code is doing:

using (var mem = new MemoryStream())
using (var writer = new StreamWriter(mem))

var result = Encoding.Unicode.GetString(mem.ToArray());
return Encoding.Unicode.GetBytes(result);

The StreamWriter constructor you're using will set the writer up to encode content with UTF8 encoding:

Initializes a new instance of the StreamWriter class for the specified stream by using UTF-8 encoding and the default buffer size.

mem.ToArray() will return the contents of the memory stream as a byte array (byte[]).

You then take that byte array and try to convert it to a string using Unicode encoding. See What is the difference between UTF-8 and Unicode? Clearly, being different encoding systems, you can't expect the resulting string to be true to the original data you wrote.

Finally, you convert that string into a Unicode-encoded byte[]. Clearly the data here will still be wrong because we started with a bad copy.


Looking at your code I'm assuming one of two things has happened:

  1. You have mistakenly assumed that Unicode == UTF8 (it doesn't).
  2. You're unaware that you can control the encoding used by StreamWriter and you're trying to convert the UTF8-encoded data it generates into a Unicode-encoded file.

Assuming you want UTF8-encoded text, the fix for 1 would be to simply return mem.ToArray() without the conversion via result:

using (var mem = new MemoryStream())
using (var writer = new StreamWriter(mem))

return mem.ToArray();

The fix for 2 would be to simply initialize StreamWriter with the correct encoding (assuming you want Unicode) and then return mem.ToArray():

using (var mem = new MemoryStream())
using (var writer = new StreamWriter(mem, Encoding.Unicode))

return mem.ToArray();

If you want to do a conversion from UTF8 to Unicode, you need to decode the string using UTF8 and then encode it using Unicode:

using (var mem = new MemoryStream())
using (var writer = new StreamWriter(mem, Encoding.UTF8)) // UTF8 is default, but for clarity I've included it explicitly here.

var result = Encoding.UTF8.GetString(mem.ToArray());
return Encoding.Unicode.GetBytes(result);

This will decode the string from the byte[] using UTF8 encoding, and then re-encode it to a byte[] using Unicode encoding.

ProgrammingLlama
  • 36,677
  • 7
  • 67
  • 86
  • Below 3 lines of code as suggested by you have halepd me resolve my issue using (var writer = new StreamWriter(mem, Encoding.UTF8)) var result = Encoding.Unicode.GetString(mem.ToArray()); return Encoding.Unicode.GetBytes(result); – Abhijeet Sinha Aug 21 '20 at 10:47
-1

Below is the piece of code that worked for me

(var writer = new StreamWriter(mem, Encoding.UTF8)) 
var result = Encoding.Unicode.GetString(mem.ToArray()); 
return Encoding.Unicode.GetBytes(result);

So now i am not getting any ?'s in excel sheet or other unknown characters which i was getting previously, below is comparison image of my previous code versus my current code. and current code is great

enter image description here

Hans Kesting
  • 38,117
  • 9
  • 79
  • 111
Abhijeet Sinha
  • 161
  • 2
  • 13