0

I'm working on a database import/export process in VB.NET which writes data from a MySQL (5.5) database to a plain text file. The application reads the data to a DataTable, then goes through the rows/columns to actually write the data to the OutputFile (System.IO.StreamWriter object). The encoding on the tables in this database is Latin1. There is a MediumBlob field in one of the tables I've been using for testing which contains image files stored as a byte array.

In my attempts to validate the output from my application, I've exported the data directly from the database using the MySQL Workbench, then compared that with the results I get when I write the same data from my application. In the direct export from MySQL Workbench, I see some of these bytes are exported with the backslash. When I read the data through my application, however, this escape character does not appear. Viewed through Notepad++, it clearly shows some distinct differences between the two output results (see screenshot).

Notepad++ comparison of output results

Obviously, while apparently very similar, the two are not completely identical. My application is not including the backslashes for escaped characters, and some characters such as NULL are coming out differently altogether. My code for writing this field to the file is:

OutputFile.Write("'" & System.Text.Encoding.GetEncoding(28591).GetString(CType(COPYRow(ColumnIndex), Byte())) & "'")

There doesn't appear to be an overload for the GetString method that allows me to specify an escape character, so I'm wondering if there's another way that, using this method, I can ensure the characters are correctly encoded, including escape characters.

I'm "assuming" that this method should also work in general when I start working with my PostgreSQL database, but with possibly a different encoding. I'm trying to build things as "generic" as possible, but I'll have to worry about specifying encodings at run-time instead of hard-coding them later.


EDIT

I just ran across another SO question, which might point me in the right direction: Convert a Unicode string to an escaped ASCII string. Obviously, it might take a bit more work to get it right, but this looks like the closest thing to what I'm trying to accomplish.

Community
  • 1
  • 1
G_Hosa_Phat
  • 976
  • 2
  • 18
  • 38
  • It's not clear, to me, what you're trying to do. Are you trying to conver an Image bytes to a US-ASCII representation? Or `iso-8859-1` encoded strings to US-ASCII, preserving the non-convertible CodePoints to the escaped Unicode representation as ASCII text? What's the use of these? – Jimi Nov 27 '18 at 17:21
  • MySQL Workbench is showing you a "debugging" output (probably of its own invention). Why do you want your text file to be anything but a straightforward text file with a specific character encoding? (Or, why not a clean JSON file [which would use UTF-8]?). – Tom Blodget Nov 27 '18 at 17:22
  • For binary data, ideally, it wouldn't come out of your data access layer as a .NET String. A String is a text data type. If it must be a String can you convert the data on the server side to a Base64 string? – Tom Blodget Nov 27 '18 at 17:26
  • Perhaps I'm over-thinking things (or thinking about it incorrectly), but the intent is to be able to export the data in a format that is usable for importing back into the database at potentially some point in the future. To that end, I am comparing the output produced by my application to the output produced by MySQL Workbench and trying to reproduce the latter as closely as possible. Additionally, I ***may*** need to be able to transfer this data from one RDBMS to another (MySQL -> PostgreSQL or vice-versa), so I want to ensure the data is as "clean" as possible. – G_Hosa_Phat Nov 27 '18 at 17:31
  • 1
    You are over-thinking it. For Text fields, [read the Collation value](https://dev.mysql.com/doc/refman/8.0/en/charset-mysql.html) and reproduce the Encoding when saving the Text to a file. Or, convert everything to UTF-8, if possible. Byte Arrays representing Bitmaps must remain as they are. You might, if really required, [convert the array to a Base64 representation](https://learn.microsoft.com/en-us/dotnet/api/system.convert.tobase64string). – Jimi Nov 27 '18 at 18:14
  • Might that be Arabic (`د؈`)? And your charset is `macroman`? – Rick James Nov 28 '18 at 05:24

0 Answers0