
I am trying to write a server that will communicate with any standard client that can make socket connections (e.g. a telnet client).

It started out as an echo server, which of course did not need to worry about network byte ordering.

I am familiar with the ntohs, ntohl, htons, and htonl functions. These would be great by themselves if I were transferring either 16- or 32-bit ints, or if the characters in the string being sent were multiples of 2 or 4 bytes.
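For the fixed-size integer case, a minimal sketch of how these functions are normally used: convert to network order just before copying a value into an outgoing byte buffer, and back to host order just after reading it. The helper names `pack_header` and `unpack_header` and the 2-byte-type plus 4-byte-length header layout are hypothetical, chosen only for illustration.

```c
#include <arpa/inet.h>  /* htons, htonl, ntohs, ntohl (POSIX) */
#include <stdint.h>
#include <string.h>     /* memcpy */

/* Hypothetical 6-byte header: a 16-bit message type followed by a
 * 32-bit payload length, both sent in network (big-endian) order. */
void pack_header(unsigned char out[6], uint16_t type, uint32_t length)
{
    uint16_t net_type = htons(type);
    uint32_t net_len  = htonl(length);
    memcpy(out, &net_type, 2);
    memcpy(out + 2, &net_len, 4);
}

void unpack_header(const unsigned char in[6], uint16_t *type, uint32_t *length)
{
    uint16_t net_type;
    uint32_t net_len;
    memcpy(&net_type, in, 2);      /* memcpy avoids unaligned access */
    memcpy(&net_len, in + 2, 4);
    *type   = ntohs(net_type);
    *length = ntohl(net_len);
}
```

The six bytes in `out` come out the same on every host; only the intermediate `uint16_t`/`uint32_t` values differ between little- and big-endian machines.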

I'd like to create a function that operates on strings, such as:

str_ntoh(char* net_str, char* host_str, int len)
{
    uint32_t* netp, hostp;
    netp = (uint32_t*)&net_str;
    for(i=0; i < len/4; i++){
         hostp[i] = ntoh(netp[i]);
    }
}

Or something similar. The above assumes that the word size is 32 bits. We can't be sure that the word size on the sending machine is not 16 or 64 bits, right?
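The native word size does not have to enter into it: if the protocol defines each field's width and byte order, both ends can encode and decode with shifts on fixed-width types. A minimal sketch, assuming the platform provides `uint32_t`; the helper names `put_be32` and `get_be32` are hypothetical. It writes the same four bytes on 16-, 32-, or 64-bit hosts of either endianness.

```c
#include <stdint.h>

/* Write v into p[0..3], most significant byte first (big-endian,
 * i.e. network order). Uses only shifts, so host layout is irrelevant. */
void put_be32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}

/* Read a big-endian 32-bit value from p[0..3]. */
uint32_t get_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}
```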

Client programs such as telnet must be using hton* before they send and ntoh* after they receive data, correct?

EDIT: For the people who think that, because a char is one byte, endianness doesn't matter:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint32_t a = 0x01020304;
    char* c = (char*)&a;
    printf("%x %x %x %x\n", c[0], c[1], c[2], c[3]);
    return 0;
}

Run this snippet of code. The output for me is as follows:

$ ./a.out
4 3 2 1

Those on PowerPC chips should get '1 2 3 4', but those of us on Intel chips should, for the most part, see what I got above.
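The snippet above is really a test of how the host lays out a multi-byte integer in memory. The same idea can be wrapped in a small helper; `host_is_little_endian` is a hypothetical name, and the sketch copies the integer's bytes into an `unsigned char` array, which C explicitly allows for examining an object's representation.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if the least significant byte of a uint32_t is stored
 * first in memory (little-endian, e.g. x86), 0 otherwise. */
int host_is_little_endian(void)
{
    uint32_t probe = 0x01020304u;
    unsigned char bytes[sizeof probe];
    memcpy(bytes, &probe, sizeof probe);
    return bytes[0] == 0x04;
}
```

On an Intel machine this should return 1 (matching the `4 3 2 1` output), and on a big-endian PowerPC it should return 0. Portable network code rarely needs this check, since `htonl`/`ntohl` or shift-based encoding already hide the difference.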

Derrick
  • I assume it depends on the size of each character. You don't need to concern yourself about it if only one byte is used per character. – Skurmedel Dec 19 '09 at 21:24
  • Related: http://stackoverflow.com/questions/526030/byte-order-with-a-large-array-of-characters-in-c http://stackoverflow.com/questions/1568057/ascii-strings-and-endianness – dmckee --- ex-moderator kitten Dec 19 '09 at 21:30
  • Thanks James, gonna fix that right now. – Derrick Dec 20 '09 at 00:05
  • If you cast a uint32_t* into a char* then yes the order matters. What "we who think endianness doesn't matter for char*" are saying is if you declare an array of chars, and work only with chars and never larger types, you don't have to swap, it's only multi-byte integers for which it matters (uint32_t is a multi-byte integer so that's why your example behaves the way it does). – asveikau Dec 20 '09 at 00:22
  • That's the point. There is more than one char, more than one byte. So byte ordering matters. If your argument is that byte order doesn't matter for THIS application, I beg to differ. Have you written at least an echo client or server in C? – Derrick Dec 20 '09 at 00:44
  • Your example code does not use a string, it uses the memory representation of a `uint32_t`. There's a difference. If you instead do `char *c = "foo";` then the output is 66, 6f, 6f, 0. No matter what the endian-ness of the machine. I haven't written an echo client/server, but I have written an HTTP client and server in C. You do not need to byte-swap strings. – Steve Jessop Dec 20 '09 at 01:08
  • @Derrick: Stop being insulting. Instead why don't you try it? char foo[] = "Hello, World"; write(fd, foo, sizeof(foo)); -- Do the write() on an Intel machine and the read() on a PowerPC machine and witness how it works. Otherwise I suggest you re-read some of the comments here. – asveikau Dec 20 '09 at 01:09
  • @asveikau My apologies if it came off that way; I don't want to seem insulting. I do realize that everyone is obviously here to help. I'd love to do that, but I have no access to a PowerPC machine. @Steve please note that using a (char*) to point to a uint32_t does not change what's inside of memory. I've read the comments, and they all say don't worry about it. At this point, for my assignment, I will ignore byte order. I'm sure I won't run into any problems by following this advice, but that is because all of it runs on Intel machines. – Derrick Dec 20 '09 at 03:08
  • @Steve the "have you written an echo server" question was to see if people had actually had experience with it, so writing an HTTP server means you definitely do. I am still curious whether anyone here has been able to test sending and receiving char arrays without conversion between systems of different endianness. – Derrick Dec 20 '09 at 03:14
  • @Derrick: a sample server: http://stackoverflow.com. It returns strings and different clients don't need to do anything special to detect what endianness the server had. – ysth Dec 20 '09 at 06:26
  • @Derrick: If you examine what your reasoning implies, then you will (incorrectly) come to the conclusion that LE machines deal with strings that are in the reverse order of strings that BE machines deal with. This is obviously wrong. You may want to read the "Misconceptions" section at: http://www.cs.umd.edu/class/sum2003/cmsc311/Notes/Data/endian.html. By the way, your arrogance is hilarious. –  Dec 20 '09 at 19:33
  • I upvoted this. It really is a good question, and well explained, even if it's based on a bit of a misconception, and even if the author seems overly skeptical. But on the other hand, answers have been wrong before, so it is probably good to be a bit skeptical. – Thomas Padron-McCarthy Dec 24 '09 at 16:13
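Following asveikau's suggestion to simply try it, here is a minimal local sketch that writes a char array to one end of a socket and reads it back from the other. It uses a POSIX `socketpair` on a single machine, so by itself it cannot demonstrate cross-endian behaviour; what it does show is that `write`/`read` move bytes in order with no conversion step. The helper name `echo_through_socket` is hypothetical.

```c
#include <stddef.h>
#include <sys/socket.h>  /* socketpair, AF_UNIX */
#include <sys/types.h>   /* ssize_t */
#include <unistd.h>      /* read, write, close */

/* Sends len bytes of msg through a connected socket pair and reads
 * them back into out. Returns the number of bytes received, or -1. */
ssize_t echo_through_socket(const char *msg, char *out, size_t len)
{
    int fds[2];
    size_t got = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) == -1)
        return -1;
    /* No htons/htonl here: a char buffer is already a byte sequence. */
    if (write(fds[0], msg, len) != (ssize_t)len) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    while (got < len) {  /* a stream socket may deliver data in pieces */
        ssize_t n = read(fds[1], out + got, len - got);
        if (n <= 0)
            break;
        got += (size_t)n;
    }
    close(fds[0]);
    close(fds[1]);
    return (ssize_t)got;
}
```

Across a real network the kernel and the wire carry the same byte sequence, which is why text in a one-byte-per-character encoding arrives in the same order on either kind of host.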

4 Answers


Maybe I'm missing something here, but are you sending strings, that is, sequences of characters? Then you don't need to worry about byte order. That only matters for the bit pattern of multi-byte integers. The characters in a string are always in the "right" order.

EDIT:

Derrick, to address your code example, I've run the following (slightly expanded) version of your program on an Intel i7 (little-endian) and on an old Sun Sparc (big-endian):

#include <stdio.h>
#include <stdint.h> 

int main(void)
{
    uint32_t a = 0x01020304;
    char* c = (char*)&a;
    char d[] = { 1, 2, 3, 4 };
    printf("The integer: %x %x %x %x\n", c[0], c[1], c[2], c[3]);
    printf("The string:  %x %x %x %x\n", d[0], d[1], d[2], d[3]);
    return 0;
}

As you can see, I've added a real char array to your print-out of an integer.

The output from the little-endian Intel i7:

The integer: 4 3 2 1
The string:  1 2 3 4

And the output from the big-endian Sun:

The integer: 1 2 3 4
The string:  1 2 3 4

Your multi-byte integer is indeed stored in different byte order on the two machines, but the characters in the char array have the same order.

Thomas Padron-McCarthy
  • Strings are sequences of characters, yes. Sending this data over the network between two computers of the same endianness would not matter. However, if you did not do any byte-order conversions, and did something like char* str = "abcd"; and sent it from a little-endian machine, then received it on a big-endian one, when you addressed str[0] it would be d, and not a. http://stackoverflow.com/questions/526030/byte-order-with-a-large-array-of-characters-in-c – Derrick Dec 19 '09 at 21:35
  • @Derrick: No, that's wrong. With strings, the first character will always be in the first position, and so on. It's not like multi-byte integers. – Thomas Padron-McCarthy Dec 19 '09 at 21:45
  • @Derrick: To illustrate Thomas's point... Let's say you had an array of integers, { 0xaabb, 0xccdd }. Take this to a different endianness and the order of bytes within the integer get warped into 0xbbaa, 0xddcc. However the order of the integers within the array doesn't change. So it's { 0xbbaa, 0xddcc } and not { 0xddcc, 0xbbaa }. Now imagine these are 8 bit integers instead of 16. If you had an array {0xaa, 0xbb, 0xcc, 0xdd}, within an array element (0xaa) there are no bytes to swap, it is a single byte. And you wouldn't swap the individual bytes because that changes the order of the array. – asveikau Dec 19 '09 at 22:03
  • really? Run this: int main(void) { uint32_t a = 0x01020304; char* c = (char*)&a; printf("%x %x %x %x\n", c[0], c[1], c[2], c[3]); } When I run it I get 4 3 2 1 – Derrick Dec 20 '09 at 00:04
  • @Derrick - That's a 32-bit integer, not an array of bytes. If you declare: char a[] = {1,2,3,4}; it will always be in the same order. – asveikau Dec 20 '09 at 00:18
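asveikau's array illustration can be checked directly. The sketch below, with a hypothetical helper `swap16_each`, reverses the two bytes inside each 16-bit element, which is how the raw bytes would be interpreted on a host of the opposite endianness. The element order is untouched, and for an array of single bytes there would be nothing to swap at all.

```c
#include <stddef.h>
#include <stdint.h>

/* Swap the two bytes inside each element; element order is preserved. */
void swap16_each(uint16_t *a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = (uint16_t)((a[i] >> 8) | (a[i] << 8));
}
```

Applied to `{ 0xaabb, 0xccdd }` it yields `{ 0xbbaa, 0xddcc }`, not `{ 0xddcc, 0xbbaa }`. Reversing a char array instead would scramble the string, which is why strings are not byte-swapped.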

With your function signature as posted you don't have to worry about byte order. It accepts a char*, which can only hold 8-bit characters. With one byte per character, you cannot have a byte order problem.

You'd only run into a byte order problem if you send Unicode in UTF-16 or UTF-32 encoding and the endianness of the sending machine doesn't match that of the receiving machine. The simple solution is to use UTF-8 encoding, which is what most text is sent as across networks; being byte-oriented, it has no byte order issue either. Alternatively, you could send a BOM (byte order mark).
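To make the UTF-16 versus UTF-8 distinction concrete, a minimal sketch: `utf8_encode` (a hypothetical name) turns a code point into a byte sequence whose order is fixed by the encoding, while `utf16be_unit` and `utf16le_unit` (also hypothetical) show that one UTF-16 code unit has two possible byte layouts. It assumes valid, non-surrogate code points up to U+10FFFF and, for the UTF-16 helpers, Basic Multilingual Plane characters only (no surrogate pairs).

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one code point (<= 0x10FFFF, not a surrogate) as UTF-8.
 * Returns the number of bytes written to out (1 to 4). The byte
 * order is defined by the encoding, so it is the same on every host. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* One UTF-16 code unit written in big-endian byte order. */
void utf16be_unit(uint16_t u, unsigned char out[2])
{
    out[0] = (unsigned char)(u >> 8);
    out[1] = (unsigned char)u;
}

/* The same code unit written in little-endian byte order. */
void utf16le_unit(uint16_t u, unsigned char out[2])
{
    out[0] = (unsigned char)u;
    out[1] = (unsigned char)(u >> 8);
}
```

For example, "é" (U+00E9) is always `C3 A9` in UTF-8, but `00 E9` or `E9 00` in UTF-16 depending on byte order, which is the ambiguity a BOM resolves.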

Hans Passant

If you'd like to send them in an 8-bit encoding (the fact that you're using char implies this is what you want), there's no need to byte swap. However, on the unrelated issue of non-ASCII characters: so that a character above 127 appears the same on both ends of the connection, I would suggest that you send the data in something like UTF-8, which can represent all Unicode characters and can be safely treated as ASCII strings. How you get UTF-8 text from the default encoding varies by platform and by the set of libraries you're using.

If you're sending a 16-bit or 32-bit encoding, you can include a byte order mark as the first character, which the other end can use to determine the endianness of the text. Or, you can assume network byte order and use htons() or htonl() as you suggest. But if you'd like to use char, please see the previous paragraph. :-)
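A minimal sketch of the byte-order-mark approach for UTF-16, assuming the sender prefixes the text with U+FEFF: the receiver inspects the first two bytes, then assembles each code unit in the order they indicate. The names `utf16_detect_bom` and `utf16_read_unit` are hypothetical, and falling back to big-endian when no BOM is present follows the recommendation in RFC 2781 for unmarked UTF-16.

```c
#include <stddef.h>
#include <stdint.h>

enum utf16_order { UTF16_BE, UTF16_LE };

/* Inspect a possible byte order mark (U+FEFF) at the start of buf.
 * Sets *order and returns how many bytes the BOM occupies (0 or 2).
 * With no BOM, big-endian is assumed. */
size_t utf16_detect_bom(const unsigned char *buf, size_t len,
                        enum utf16_order *order)
{
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF) {
        *order = UTF16_BE;
        return 2;
    }
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE) {
        *order = UTF16_LE;
        return 2;
    }
    *order = UTF16_BE;
    return 0;
}

/* Read the 16-bit code unit at p using the detected byte order. */
uint16_t utf16_read_unit(const unsigned char *p, enum utf16_order order)
{
    if (order == UTF16_BE)
        return (uint16_t)((p[0] << 8) | p[1]);
    return (uint16_t)((p[1] << 8) | p[0]);
}
```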

asveikau

It seems to me that the function prototype doesn't match its behavior. You're passing in a char *, but you're then casting it to uint32_t *. And, looking more closely, you're casting the address of the pointer, rather than the contents, so I'm concerned that you'll get unexpected results. Perhaps the following would work better:

#include <arpa/inet.h>  /* ntohl */
#include <stdint.h>

void arr_ntoh(const uint32_t* netp, uint32_t* hostp, int len)
{
    for (int i = 0; i < len; i++)
        hostp[i] = ntohl(netp[i]);
}

I'm basing this on the assumption that what you've really got is an array of uint32_t and you want to run ntohl() on all of them.

I hope this is helpful.
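For the original `str_ntoh` idea, when a received byte buffer really does hold 32-bit network-order integers, one way to avoid both the `&net_str` mistake and the alignment and aliasing problems of casting a `char *` to `uint32_t *` is to copy each 4-byte group out with `memcpy` before calling `ntohl`. A minimal sketch; the name `words_ntoh` is hypothetical, and it assumes the useful data is a multiple of 4 bytes (trailing bytes are ignored).

```c
#include <arpa/inet.h>  /* ntohl */
#include <stddef.h>
#include <stdint.h>
#include <string.h>     /* memcpy */

/* Convert len bytes of network-order 32-bit words in buf into host-order
 * values in out (which must hold len / 4 elements). Returns the count. */
size_t words_ntoh(const unsigned char *buf, size_t len, uint32_t *out)
{
    size_t count = len / 4;
    for (size_t i = 0; i < count; i++) {
        uint32_t net;
        memcpy(&net, buf + 4 * i, sizeof net);  /* safe for any alignment */
        out[i] = ntohl(net);
    }
    return count;
}
```

This is only appropriate for genuinely integer data; text in a single-byte encoding or in UTF-8 should be sent and received as-is.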