I have an input file that is UTF-8 encoded. I need to use some of its content and create an ISO-8859-15 encoded CSV file from it.
The problem is that Unicode has several double-quote-like characters, and these are automatically replaced with the character " (Quotation Mark, U+0022) when the CSV file is written to disk.
The ones we found are:
- Left Double Quotation Mark U+201C
- Right Double Quotation Mark U+201D
- Double Low-9 Quotation Mark U+201E
- Modifier Letter Double Prime U+02BA
- Combining Double Vertical Line Above U+030E
- Fullwidth Quotation Mark U+FF02
The conversion happens automatically when I write to the CSV file like this:
    using (StreamWriter sw = new StreamWriter(workDir + "/files/vehicles.csv", append: false, encoding: Encoding.GetEncoding("ISO-8859-15")))
    {
        foreach (ad vehicle in vehicles)
        {
            sw.WriteLine(convertVehicleToCsv(vehicle));
        }
    }
The method convertVehicleToCsv escapes double quotes and other special characters in the data, but it does not escape the Unicode double-quote variants listed above. Because those variants are replaced with a plain " only after escaping, at the moment the text is encoded, the resulting CSV contains unescaped quotes. It is no longer RFC 4180 compliant, and reading it with our CSV library fails.
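One workaround I am considering is to normalize the look-alike characters myself before escaping, so that the escaping step sees the same quotes the encoder will eventually write. The following is only a sketch: `CsvQuoteSketch` and `EscapeCsvField` are hypothetical names (not our real convertVehicleToCsv), the character set is just the six we found and is probably incomplete, and it always quotes every field.

```csharp
using System;
using System.Text;

static class CsvQuoteSketch
{
    // The six look-alike characters listed above; assumed incomplete.
    private const string QuoteLookAlikes = "\u201C\u201D\u201E\u02BA\u030E\uFF02";

    // Hypothetical helper: map look-alikes to U+0022 first,
    // then quote the field and double embedded quotes (RFC 4180 style).
    public static string EscapeCsvField(string value)
    {
        var sb = new StringBuilder(value.Length + 2);
        sb.Append('"');
        foreach (char c in value)
        {
            char normalized = QuoteLookAlikes.IndexOf(c) >= 0 ? '"' : c;
            if (normalized == '"')
            {
                sb.Append('"'); // a quote inside a quoted field is written twice
            }
            sb.Append(normalized);
        }
        sb.Append('"');
        return sb.ToString();
    }
}
```

This only helps if the list of affected characters is complete, which is exactly what I do not know.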
So the question is:
What other Unicode characters are automatically replaced with the "normal" " character when converting to ISO-8859-15? Is this documented somewhere? Or am I doing something wrong here?
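To narrow this down empirically, a small probe along these lines could list every character that the encoder turns into the byte 0x22. It is a rough sketch, not a definitive answer: `QuoteProbe` and `FindCharsEncodedAsQuote` are hypothetical names, it only covers the Basic Multilingual Plane, and the result may differ between runtimes and framework versions.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class QuoteProbe
{
    // Returns every BMP character other than U+0022 that the given
    // encoding writes as the single byte 0x22.
    public static List<char> FindCharsEncodedAsQuote(Encoding encoding)
    {
        var hits = new List<char>();
        var one = new char[1];
        for (int cp = 0; cp <= 0xFFFF; cp++)
        {
            // Skip the real quotation mark and lone surrogates.
            if (cp == 0x22 || (cp >= 0xD800 && cp <= 0xDFFF)) continue;

            one[0] = (char)cp;
            byte[] bytes;
            try
            {
                bytes = encoding.GetBytes(one);
            }
            catch (EncoderFallbackException)
            {
                continue; // encoding is configured to throw on unmappable characters
            }

            if (bytes.Length == 1 && bytes[0] == 0x22)
            {
                hits.Add((char)cp);
            }
        }
        return hits;
    }
}
```

I would call it as `QuoteProbe.FindCharsEncodedAsQuote(Encoding.GetEncoding("ISO-8859-15"))` and print each hit as `U+XXXX`. Separately, creating the writer's encoding with `Encoding.GetEncoding("ISO-8859-15", EncoderFallback.ExceptionFallback, DecoderFallback.ExceptionFallback)` should make unmappable characters throw an `EncoderFallbackException` instead of being replaced silently, which would at least make the problem visible.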