0

For encodings issues, I have some manipulations before storing data into the database.

To do so I'm making use of the DUMP and CHR sql methods to safely keep all ma data intact when saving/retrieving them.

Here are my steps when I save information into the database :

  • Getting the strings, mostly cyrillic in my case, things like АЛЕКСЕЕВИЧ
  • I convert them in c# into Windows-1252 encoding, the strings look like this ÀËÅÊÑÅÅÂÈ×

At this point, in most cases I can register the data as is using a DBParameter. But for some reasons there are some troublesome characters that looses information (they are all transformed into the same "?") when saving them from c# to the database. That's the case with the "×" just above.

  • So when this happens, I convert them into bytes and into SQL format : CHR(192) || CHR(203) || CHR(197) || CHR(202) || CHR(209) || CHR(197) || CHR(197) || CHR(194) || CHR(200) || CHR(215)
  • Then using regex I'm replacing the part where I should usually use a DBParameter with the sql code I just generated. It should still be safe to injections.

There I have a working code, only for cases where I have the SQL request within my code. The complicated part is when I have a stored procedure I can't modify. I can't create a parameter that will be interpreted since that's the exact purpose of DBParameter, to block all sql injection. I'm blank with ideas at this point.

Qrom
  • 487
  • 5
  • 20
  • 1
    This sounds like an overly complicated solution for storing unicode characters in a database. I'm so confused. – Dan Wilson Apr 20 '18 at 13:46
  • 1
    it sounds like you simply have the column encoding defined incorrectly; I'm not an Oracle person, but if this was SQL Server, I'd guess "oh, you've created the column as `varchar(len)` in some codepage, rather than `nvarchar(len)`" – Marc Gravell Apr 20 '18 at 13:47
  • Agreed. Your application, database connection (connection string), and table must all be set up to handle unicode. Then just insert your cyrillic. You are almost guaranteed to always have edge cases going the way you are going. – JNevill Apr 20 '18 at 13:49
  • @Dan Wilson Yes indeed, I'm working for an old company that has a lot of old databases with old encodings. There are still old application from the 80s running and using those databases, we can't alter the existing so the new applications need to adapt. – Qrom Apr 20 '18 at 13:49
  • 1
    That's horseshit. "I need to store unicode in the table"... "Sorry, kid, we haven't updated this table since the 80s". Sounds like a good place to not work. – JNevill Apr 20 '18 at 13:51
  • I agree with @JNevill. It's your choice whether you want to continue finding solutions to this "unicode" problem, push back on your coworkers to change, or find another employer. – Dan Wilson Apr 20 '18 at 13:57
  • Honestly, I think my push back here would be "I am receiving unicode input. If the table isn't updated to allow us to store this, then the data will be lost." and stop trying to build a workaround. If you are absolutely stuck having to build this as a workaround, consider converting the unicode to base64 and store the base64 string. It will look like garbage in the database and will have to be converted back costing precious CPU cycles and slower data retreival, but it's something. Unfortunately, I have no idea how this relates to your stored procedure issue. That part of the Q is unclear. – JNevill Apr 20 '18 at 14:02
  • [Here is a question where they convert UTF-8 encoded unicode to Base64](https://stackoverflow.com/questions/13938137/convert-utf-8-to-base64-string) – JNevill Apr 20 '18 at 14:03
  • If this is being stored next to ansii, non-base64-encoded-data, then you may want to tuck a character in at the front of the base64 that is ascii, but isn't used in the data like `§` or `¦` so you know what to decode and what to leave as-is. *shudder* – JNevill Apr 20 '18 at 14:07
  • Thanks for the advices guys, I will try to make them adopt the base64 thing if they really can't -or won't- change the database's encoding – Qrom Apr 20 '18 at 14:09
  • 2
    unless you can't efford downtime for index rebuild, I see 0 issue with changing varchar to nvarchar. This has 0 impact beside that tiny extra storage – Steve Apr 20 '18 at 14:15

0 Answers0