I have a COM server app (App_A) that only supports native data types. I send the parameters over the COM server to a C# app (App_B) that then sends on the data as a web request.

My problem is that the String data read by App_A is Unicode, but App_A's COM String values cannot carry it (they only handle 8-bit characters), so the data has to be sent as a byte array or char array.

If I use the byte array, the generic App_B is now broken as I now have to handle this single data update differently to all the others (and I fear there will be more), so I would like to keep the App_B handling of values generic (obj.ToString).

If I hard code an App_B C# String as a literal, e.g. "\u5f90", the String contains a Unicode character and the HttpUtility.UrlEncode call in App_B works exactly as expected. If the String is passed in as a value (obj.ToString() = "\u5f90") the '\' is escaped and the UrlEncode does not UTF-8-encode a Unicode character as the '\u' escape sequence is lost.
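The difference between the two cases can be reproduced in isolation. The following is only a minimal sketch, assuming a console project that references System.Web (or System.Web.HttpUtility on newer .NET); the class name `EscapeDemo` is made up for illustration and is not part of App_B.

```csharp
using System;
using System.Web;

public static class EscapeDemo
{
    // The compiler resolves the escape: this string holds ONE character, U+5F90.
    public const string Literal = "\u5f90";

    // What arrives at run time from App_A (assumed): SIX characters,
    // '\', 'u', '5', 'f', '9', '0'. Nothing ever interprets the backslash.
    public const string Runtime = @"\u5f90";

    public static void Main()
    {
        Console.WriteLine(Literal.Length);   // 1
        Console.WriteLine(Runtime.Length);   // 6

        // The literal is percent-encoded from its UTF-8 bytes (E5 BE 90);
        // the run-time value only has its backslash percent-encoded.
        Console.WriteLine(HttpUtility.UrlEncode(Literal));
        Console.WriteLine(HttpUtility.UrlEncode(Runtime));
    }
}
```

In other words, the `\u` escape is a feature of the C# compiler, not of the String type, so a value received at run time is never unescaped automatically.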

I guess my question comes down to this: so far I have manipulated the byte array in App_A to replace each Unicode value (xxxx) with '\uxxxx'. Is there any way I can use a String variable as a format string in the C# App_B, so that the escape sequences are interpreted as they would be in a literal?
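One possible way to interpret the escape sequences at run time is to decode them explicitly before calling UrlEncode. This is a rough sketch rather than a definitive implementation: the `UnicodeEscapes` class and its `Decode` method are hypothetical names, and it assumes App_A emits exactly four hex digits after each `\u`.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;
using System.Web;

public static class UnicodeEscapes
{
    // Matches a backslash, 'u', and exactly four hex digits.
    private static readonly Regex EscapePattern =
        new Regex(@"\\u([0-9A-Fa-f]{4})", RegexOptions.Compiled);

    // Replaces each \uXXXX sequence with the UTF-16 code unit it names.
    // Text without such sequences is returned unchanged.
    public static string Decode(string value)
    {
        return EscapePattern.Replace(value, m =>
            ((char)int.Parse(m.Groups[1].Value, NumberStyles.HexNumber)).ToString());
    }

    public static void Main()
    {
        object o = @"\u5f90";                    // value as received over COM (assumed)
        string decoded = Decode(o.ToString());   // now a single character, U+5F90
        Console.WriteLine(decoded.Length);       // 1
        Console.WriteLine(HttpUtility.UrlEncode(decoded));
    }
}
```

Ints and plain ASCII strings pass through `Decode` untouched, so the generic `o.ToString()` handling could stay as it is. The caveat is that any ASCII value that legitimately contains `\u` followed by four hex digits would be altered as well.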

Alternatively, if I'm going about this the wrong way, what would anyone suggest?

Please bear in mind that I have approx 300 data value updates that all use a generic o.ToString for part of the UrlEncode argument and I would like to keep this if possible.

Nalaka526
  • Do I understand correctly that your COM server returns UTF-8 strings instead of Unicode strings? That would be rather unusual, as COM "naturally" works with Unicode... Why not let the COM server provide Unicode strings instead of UTF-8? –  Sep 09 '15 at 09:01
  • App_A (COM publisher) only supports char as an 8-bit data type. This is why the Unicode string is being sent as either a 'Data' type (byte[]) or 'String' type (char[]). It is stripping ALL char values > 127 (ASCII). – Jason Etherton Sep 09 '15 at 09:17
  • If you cannot make the COM server emit Unicode strings, then I would rather keep it emitting UTF-8 strings/byte arrays and do the UTF-8 conversion client-side (App_B). Your idea of inserting Unicode escape sequences into the string will not save you the conversion process on the client: it just exchanges one conversion method (UTF-8 byte array -> string) for another (escaped string -> decoded string). In comparison, your idea looks worse to me: it requires specific changes in both the server and the client(s), whereas with a proper UTF-8 conversion on the client, the server can remain as is. –  Sep 09 '15 at 09:34
  • On the other hand, since you are apparently able to modify the code for the COM server, what would stop you from expanding it so that it can emit Unicode strings? I do not really mean this as a question or an invitation to a discussion, but rather as a nudge towards rethinking the COM server (as much or as little as your situation allows, of course...) –  Sep 09 '15 at 09:36
  • App_A is bought off-the-shelf and the COM server is pretty limited, but it is using hardware to read Unicode strings from a network. What we have is the ability to change the contents of the values passed through App_A's COM server, but not the types of data it supports. – Jason Etherton Sep 09 '15 at 09:51
  • All other values are ints or ASCII strings so the UrlEncode works perfectly with the obj.ToString. – Jason Etherton Sep 09 '15 at 09:53
  • With your current approach, the problem you have is that '\', 'u', '5', 'f', '9', '0' are literal characters. This character sequence would need to be decoded and replaced with the respective Unicode character. UrlEncode/UrlDecode will not help you, because `\u` escape sequences are not part of the encoding scheme used by URLs. –  Sep 09 '15 at 10:02
  • Conversion of UTF-8 byte arrays is [very simple](https://msdn.microsoft.com/en-us/library/9d876whe%28v=vs.110%29.aspx). I am not sure what you want to use HttpUtility.UrlEncode() (or HttpUtility.UrlDecode()) for... .NET has no built-in function to decode strings containing '\','u',... escape character sequences as far as I know. You could [roll your own](http://stackoverflow.com/questions/183907/how-do-convert-unicode-escape-sequences-to-unicode-characters-in-a-net-string), though... –  Sep 09 '15 at 10:07
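The client-side UTF-8 conversion suggested in the comments could be folded into one helper so that the roughly 300 call sites stay generic. This is a sketch under assumptions: that the COM 'Data' value reaches App_B as a `byte[]` holding UTF-8, and that swapping `o.ToString()` for a helper is acceptable. `ComValues.ToText` is a hypothetical name, not an existing API.

```csharp
using System;
using System.Text;
using System.Web;

public static class ComValues
{
    // Hypothetical helper: turns any COM value into the text to be URL-encoded.
    // Byte arrays are decoded as UTF-8; everything else keeps using ToString().
    public static string ToText(object o)
    {
        byte[] bytes = o as byte[];
        return bytes != null ? Encoding.UTF8.GetString(bytes) : o.ToString();
    }

    public static void Main()
    {
        object fromCom = new byte[] { 0xE5, 0xBE, 0x90 };   // UTF-8 bytes of U+5F90
        Console.WriteLine(HttpUtility.UrlEncode(ToText(fromCom)));
        Console.WriteLine(HttpUtility.UrlEncode(ToText(42)));   // 42
    }
}
```

With this shape the server keeps sending raw bytes, and the only client change is replacing `o.ToString()` with `ComValues.ToText(o)` wherever the UrlEncode argument is built.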

1 Answer

Is it an option for you to support different encodings in your deserialization of the byte arrays in App_B? I'd suggest modifying App_A so that each sent string has an additional first byte which defines the encoding, which then has to be respected by App_B. That way it doesn't matter which encoding you use, as long as both apps support it.
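The encoding-tag idea might look roughly like this on the App_B side. It is a sketch only: the tag values, the `TaggedString` class, and the choice of supported encodings are all assumptions that both apps would have to agree on.

```csharp
using System;
using System.Text;

public static class TaggedString
{
    // Hypothetical tag values for the leading byte.
    public const byte TagAscii = 0;
    public const byte TagUtf8 = 1;
    public const byte TagUtf16Le = 2;

    // Reads the first byte as the encoding tag and decodes the remainder.
    public static string Decode(byte[] payload)
    {
        if (payload == null || payload.Length == 0)
            throw new ArgumentException("Payload must contain at least the tag byte.");

        Encoding encoding;
        switch (payload[0])
        {
            case TagAscii: encoding = Encoding.ASCII; break;
            case TagUtf8: encoding = Encoding.UTF8; break;
            case TagUtf16Le: encoding = Encoding.Unicode; break;
            default: throw new NotSupportedException("Unknown encoding tag: " + payload[0]);
        }

        // Skip the tag byte and decode the rest with the selected encoding.
        return encoding.GetString(payload, 1, payload.Length - 1);
    }

    public static void Main()
    {
        byte[] payload = { TagUtf8, 0xE5, 0xBE, 0x90 };   // tag + UTF-8 bytes of U+5F90
        Console.WriteLine(Decode(payload).Length);        // 1
    }
}
```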

I'd strongly suggest not modifying the strings as you've described by preceding each value with \u. That's just going to be a mess of code later on, which will need to be documented well and understood all over again whenever you come back to it.

Daniel Schmid
  • Hi, I understand but this is the only value currently that requires special handling and it is easier for us to change the value of one COM parameter than implement what you suggest for approx. 300 other parameters. – Jason Etherton Sep 09 '15 at 09:57
  • We only pass int or string types through the COM server, all of which are UrlEncoded by their .ToString() return value. If I pass the literal string in with the '\u', that too works, and given the constraint of the App_A COM server, this is the simplest (I think). – Jason Etherton Sep 09 '15 at 09:58