
Let me try to explain my problem. I have to receive a message from a server (written in Delphi) and do some things with that message on the client side (which is the side I program, in C++).

  1. Let's say that the message is "Hello €". As far as I understand, that means I have to work with std::wstring, because the euro sign needs 2 bytes instead of 1. Knowing that, I have done all my work with wstrings, and if I set the message myself it works fine. Now I have to receive the real one from the server, and here comes the problem.

  2. The person on the server side is sending that message as a string. He uses an EncodeString() function in Delphi and says he is not going to change it. So my question is: if I decode that string into a string in C++ and then convert it into a wstring, will it work? Or will I have problems and end up with a different message in my string variable instead of "Hello €"?

  3. If yes, that is, if I can receive that string with no problem, then I have another problem. The function that I have to use to decode the string is void DecodeString(char *buffer, int length);

so normally if you receive a text, you do something like:

char Text[255];
DecodeString(Text, length); // length is a number decoded before

So... can I decode it with no problem and end up with the "Hello €" message in Text? With that, I would just need to convert it to get the wstring.

Thank you

EDIT:

I'll add another example. If I know that the server is always going to send me a text of at most 30 characters, on the server they do something like:

EncodeByte(lengthText);
EncodeString(text)

and in the client you do:

 int length;
 char myText[30];

 DecodeByte(length);
 DecodeString(myText,length);

and then you can work with myText as a string later.

Hope that helps a little more. I'm sorry for not having more information, but I'm new at this job and I don't know much more about the server.

EDIT 2

Trying to summarize... I have to receive a message and do something with it, and I have to decode it with the tool I mentioned. Since DecodeString() needs a char buffer and I need a wstring, I just need a way to take the data received from the server, decode it with DecodeString() and get it into a wstring. I don't really know if that's possible, and if it is, I'm not sure how to do it or what types of variables to use.

EDIT 3

Finally! I know which code pages are being used. It seems that the client uses the ANSI ones and the server doesn't, so I'll have to tell the person who does that part to change it to the ANSI ones. Thanks everybody for helping me with my big, big ignorance about the existence of code pages.
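
Once the code page is known, the bytes that DecodeString() leaves in the char buffer can be mapped to a wstring one by one. The following is only a minimal sketch: it assumes the server turns out to use Windows-1252 (the common Western European ANSI code page, which is not confirmed anywhere in this question), and `cp1252_to_wstring` is a hypothetical helper name, not something from the existing code base.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: converts Windows-1252 bytes to a std::wstring.
// Assumption: the server encodes its text as Windows-1252.
std::wstring cp1252_to_wstring(const char* buffer, std::size_t length) {
    // Unicode code points for bytes 0x80..0x9F. 0xFFFD marks the five
    // byte values that Windows-1252 leaves undefined.
    static const wchar_t high[32] = {
        0x20AC, 0xFFFD, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
        0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0xFFFD, 0x017D, 0xFFFD,
        0xFFFD, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
        0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0xFFFD, 0x017E, 0x0178
    };
    std::wstring out;
    out.reserve(length);
    for (std::size_t i = 0; i < length; ++i) {
        const unsigned char byte = static_cast<unsigned char>(buffer[i]);
        if (byte >= 0x80 && byte <= 0x9F) {
            out.push_back(high[byte - 0x80]);
        } else {
            // 0x00..0x7F is ASCII and 0xA0..0xFF matches Latin-1,
            // so the byte value is already the code point.
            out.push_back(static_cast<wchar_t>(byte));
        }
    }
    return out;
}
```

With the buffer from the example above, this would be called right after `DecodeString(myText, length);` as `std::wstring text = cp1252_to_wstring(myText, length);`. If the server uses a different code page, only the table changes.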

Megasa3
  • If you want to use single-byte chars (e.g. in C and C++), you can use those and simply encode the strings in UTF-8, or use Ansi with a codepage that has a Euro sign. I would vote for UTF-8. – Rudy Velthuis Sep 08 '15 at 10:17
  • @RudyVelthuis If I understand you correctly, you mean I should change my DecodeString() function. I forgot to say that I'm supposed to work with it without changing that function (my boss' decision :( ) – Megasa3 Sep 08 '15 at 10:19
  • You've got to work out how the text is encoded. Do you know? – David Heffernan Sep 08 '15 at 10:19
  • @DavidHeffernan I know that they are sending it the way they always have with any other text, that's why I added that little piece of code showing how it is supposed to be decoded. – Megasa3 Sep 08 '15 at 10:20
  • No, you are missing the vital information. How is the text encoded? If you don't know what a text encoding is, you need to do some research before you can make progress. – David Heffernan Sep 08 '15 at 10:22
  • Can you just try to receive a message from the server as a byte array (i.e. no decoding at all) and see how the euro sign is encoded? You can also post this byte array here so that we can advise you. – Petr Sep 08 '15 at 10:26
  • Yes, that would be a start. You'll probably discover that the Delphi code uses the system locale and so long as you know the ANSI code page of the server, you'll be fine. Whoever is taking the decisions in this project needs to have their head examined! – David Heffernan Sep 08 '15 at 10:27
  • @DavidHeffernan I know what encoding and decoding are, I have just added an edit to my post. The thing is that I'm new in the job and they just told me I have to use DecodeString, which needs the char* buffer and the length, and the workmate who programmed the server side said his part is OK and nothing more – Megasa3 Sep 08 '15 at 10:27
  • @Petr I don't know how to see it but I'll try, give me some time to see.... – Megasa3 Sep 08 '15 at 10:28
  • Take it from me, you need to learn about text encoding. As you said, you are new, and you don't know this. Well, I know what you need to know. You need to learn about and understand text encodings. That's your next task. – David Heffernan Sep 08 '15 at 10:28
  • *Why* do you want to convert Unicode data to an ASCII codepage? It's just asking for trouble - unless your application uses a specific, hard-coded codepage for input/output, it's almost guaranteed that the data will be mangled at some point. Besides, which codepages would you use? – Panagiotis Kanavos Sep 08 '15 at 10:29
  • Is not your `char Text[255];` exactly the byte message received? Just print each element as `int`, not `char`. – Petr Sep 08 '15 at 10:30
  • PS - The euro sign was [added in 1998](https://en.wikipedia.org/wiki/Code_page_858). Older codepages like 437 and 850 do *not* have this character. E.g. 858 was introduced to replace 850, with € taking the place of ı. You need to make sure the appropriate codepage is used. – Panagiotis Kanavos Sep 08 '15 at 10:34
  • See? The codepage was something I didn't know about. I've just learnt something new! Ty! Trying to summarize... I have to receive a message and do something with it, and I have to decode it with the tool I mentioned. Since DecodeString() needs a char buffer and I need a wstring, I just need a way to take the data received from the server, decode it with DecodeString() and get it into a wstring – Megasa3 Sep 08 '15 at 10:39
  • @Petr no, `char Text[255];` is a variable declared in a .h so that I can use the DecodeString function in the .cpp. It is not the received data yet – Megasa3 Sep 08 '15 at 10:41
  • You aren't listening. You cannot decode the binary to text without knowledge of the code page. Ask whoever is sending the data to tell you what code page is involved. If they won't tell you, consider looking for a better job. – David Heffernan Sep 08 '15 at 10:45
  • @DavidHeffernan I was trying to get those answers. I've asked them and they don't know. They just say that if the code is working, then it's OK and I just have to use that DecodeString(), get the char array and work with it.... u.u I hate that logic of "if it works, don't ask, don't try to learn or understand, just make it work" ... – Megasa3 Sep 08 '15 at 10:51
  • OK, I've said all that I'm going to say here. It's over to you now. – David Heffernan Sep 08 '15 at 10:52
  • @DavidHeffernan OK, ty for everything, I'll try to understand all that better. Just a little question: are you upset? :S Because I didn't mean to upset you, believe me, I'm trying to do my best... – Megasa3 Sep 08 '15 at 10:57
  • I'm not at all upset. I sympathise with you. But time is limited. I've said what I feel needs to be said, and that's fine. Good luck. Make sure your CV is in good shape in case you spot a job opportunity with more sane developers! – David Heffernan Sep 08 '15 at 10:58
  • OK, thanks @DavidHeffernan! I know, I know, hope I find another one haha. Now I'm reading about code pages to see whether I can just suppose that they are using the same one or not, etc... – Megasa3 Sep 08 '15 at 11:03
  • @PanagiotisKanavos I'm trying to understand what code pages are exactly but... do you need just a number or the region? I mean... I'm working with Spanish letters (ñáéíïÜ ...), does that help or do I need to figure out the exact number? – Megasa3 Sep 08 '15 at 11:10
  • You should be using Unicode in order to handle all characters - not just the euro symbol - correctly. Your C++ app should use wstring, and the Delphi app should use `WideString` (if it's made with an ancient version) or `string` / `UnicodeString` if made in D2009 or later. However, it's possible the Delphi app is sending UTF8 (a Unicode encoding) or an ANSI, code-page encoded string, which as others said above means you need to know the codepage in order to decode it to your Unicode wstring correctly. Short answer: ask the devs again, accept only a thorough answer. And upgrade their Delphi ;) – David Sep 08 '15 at 11:12
  • Also, start with this - a good introduction: http://www.joelonsoftware.com/articles/Unicode.html – David Sep 08 '15 at 11:12
  • As far as I know, all current OSes have a function to convert between charsets, so why not just use it? And € is not exactly 2 bytes. It depends on the encoding and the width of each character. For example, if `sizeof(wchar_t)` is 4 bytes (most probably on Linux/Unix) then € takes 4 bytes excluding header, etc. If it's stored as UTF-8, it takes 3 bytes, `E2 82 AC`. You can convert it to 1 byte if there's a single-byte charset that contains that character, but that is unlikely to be easy and would make compatibility with other applications painful, as all OSes are using Unicode already – phuclv Sep 08 '15 at 11:13
  • read [Joel on Software's The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)](http://www.joelonsoftware.com/articles/Unicode.html) and [Unicode, UTF-8 and character encodings: What every developer should know](http://www.teknically-speaking.com/2014/02/unicode-utf-8-and-character-encodings_23.html) – phuclv Sep 08 '15 at 11:14
  • @DavidM, suppose that it is UTF-8 (while they don't give me more info), then what should I do now? If I do something like `std::wstring str = L"Hello €";` and then look for the euro letter by letter, comparing with `L'\x20AC'`, it works (dunno if that helps with anything). I got the 20AC from here: http://www.fileformat.info/info/unicode/char/20ac/index.htm – Megasa3 Sep 08 '15 at 11:40
  • You need to decode the utf8 encoded binary to text. That's been covered many many times here. For instance http://stackoverflow.com/questions/14601413/cross-platform-way-to-convert-utf8-to-stdwstrin But don't you have any experienced colleagues that can help you? Don't you have libraries that your code works with. Are you really the only programmer on your team? – David Heffernan Sep 08 '15 at 12:17
  • @DavidHeffernan no, I'm not the only one, but the ones who know that info are not here today. It seems I'll have to wait until tomorrow to ask them again and get a correct answer. – Megasa3 Sep 08 '15 at 13:04
  • I suspect that if you proceed, you'll learn something, but that you might end up reimplementing (possibly incorrectly) something that already exists in your code base. – David Heffernan Sep 08 '15 at 13:11
  • Which ANSI codepage? There are many. – David Heffernan Sep 08 '15 at 16:37
  • @DavidHeffernan I'm trying to find out which one... he asked me whether it could be 0409 in hex ... English from the US (which I hope has things like á é ï ñ ...) – Megasa3 Sep 09 '15 at 07:14
  • I'd expect a codepage like 1252 which is the English ANSI codepage. It's 2015. Nobody should be using ANSI. We did that because we had no choice in Windows 98. – David Heffernan Sep 09 '15 at 07:18
  • @DavidHeffernan I still don't know which code page the server is using (I'll know tomorrow or Monday) BUT as it is now, if I receive the message " Hello €áñ" what I get is: "Hello \200 á " ... do you know which code page it may be? I've looked through many code pages to see whether position 200 is € but found nothing... (ofc I don't know if that \200 means that € is number 200 of the codepage, but I supposed so) – Megasa3 Sep 09 '15 at 07:34
  • We could make a guess if you'd stop trying to decode the text and just showed us the binary. – David Heffernan Sep 09 '15 at 07:49
  • Please show the hex values of the entire string. Also, if it is UTF8 - great! Decode it into a wstring and you're set. – David Sep 10 '15 at 14:42
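
Showing the binary, as asked for several times in the comments above, takes only a few lines. A minimal sketch, where `to_hex` is a hypothetical helper name and the buffer is assumed to be whatever DecodeString() filled in:

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: formats each byte of a buffer as two hex digits
// followed by a space, e.g. "48 65 6C ".
std::string to_hex(const char* buffer, std::size_t length) {
    std::string out;
    char digits[4];  // two hex digits, a space and the terminating '\0'
    for (std::size_t i = 0; i < length; ++i) {
        // Go through unsigned char so bytes >= 0x80 are not sign-extended.
        const unsigned value = static_cast<unsigned char>(buffer[i]);
        std::snprintf(digits, sizeof digits, "%02X ", value);
        out += digits;
    }
    return out;
}
```

What the euro sign turns into is a strong hint about the encoding: `E2 82 AC` is UTF-8, `AC 20` is little-endian UTF-16, and a single `80` is where Windows-1252 puts it. The `\200` reported above is an octal escape for that same byte 0x80, so it is consistent with Windows-1252, though a full dump of the message would be needed to be sure.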

1 Answer


Since you're using wstring, I guess that you are on Windows (wstring isn't popular on *nix).

If so, you need the Delphi app to send you UTF-16, which you can use in the wstring constructor. Example:

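#include <cstring>
#include <string>
const char input[] = "\xAC\x20"; // UTF-16LE bytes of the euro sign (U+20AC)
wchar_t wc = 0;
std::memcpy(&wc, input, 2);      // copy instead of reinterpret_cast: no misaligned or unterminated read
std::wstring ws(1, wc);          // assumes a little-endian host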

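If you're on Linux/Mac, etc., you need to receive UTF-32, because wchar_t is normally 4 bytes there.

This method is far from perfect though. There are pitfalls and edge cases for code points beyond 0xFFFF (emoji, rare Chinese characters, etc.), which UTF-16 has to encode as surrogate pairs. Supporting that probably requires a PhD.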
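
For what it's worth, the step for code points beyond 0xFFFF is mechanical once the input is known to be UTF-16. A minimal sketch, assuming C++11 and input already split into 16-bit code units in host byte order; `utf16_to_utf32` is a hypothetical helper name, not an existing library function:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decodes UTF-16 code units into UTF-32 code points.
// Unpaired surrogates are replaced with U+FFFD.
std::u32string utf16_to_utf32(const std::u16string& in) {
    std::u32string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        const char32_t unit = in[i];
        const bool is_high = unit >= 0xD800 && unit <= 0xDBFF;
        const bool is_low = unit >= 0xDC00 && unit <= 0xDFFF;
        if (is_high && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // Combine the high and low surrogate into one code point.
            const char32_t low = in[i + 1];
            out.push_back(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00));
            ++i;  // the low surrogate has been consumed as well
        } else if (is_high || is_low) {
            out.push_back(0xFFFD);  // unpaired surrogate
        } else {
            out.push_back(unit);    // ordinary BMP character such as U+20AC
        }
    }
    return out;
}
```

On a platform where wchar_t is 4 bytes, each resulting code point can then be stored in one wchar_t.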

  • Mmm, I didn't know that wstring was for Windows... I'm working on a Unix OS, in Eclipse. I use wstring because I had another problem before with the € sign and discovered that I couldn't use string with it, but wstring was the solution. My main problem was to decode it (which I'm still trying to do) – Megasa3 Sep 08 '15 at 15:01
  • wstring is supported on *nix, it's just not very popular because many API methods take UTF-8 (char* or std::string) as input. But simply ask the Delphi guy to give you UTF-32 and the above should work. – Lasse Reinhold Sep 08 '15 at 15:04
  • Well, the problem is that, as I said in the comments on my question, that 'guy' refuses to make any change, so I had to see if I can make this work in any way.... Many times I have to demonstrate that I have tried every sane and insane method before they believe me that they have to make a change :( – Megasa3 Sep 08 '15 at 16:30