
wmemcpy appears to perform the same operations as memcpy but accepts wchar_t* instead of void*. How is its existence justified if these two code snippets should have the same behaviour? Does it have a useful purpose?

memcpy(dest, wchar_array, sizeof(wchar_array));

And

wmemcpy(dest, wchar_array, sizeof(wchar_array) / sizeof(wchar_t));
Jordan Melo
  • because human beings like to create stuff... not necessarily good stuff... But for some, just building a huge junk collection gives them self worth. – AbstractDissonance Apr 07 '16 at 21:20
  • A weak reason may be that when you have an array of `wchar_t`, `wmemcpy` may be more efficient because it can assume suitable alignment. – Kerrek SB Apr 07 '16 at 21:22
  • @Lashane `printf("%zu\n", sizeof(wchar_t));` prints 4 on my Debian virtual machine. – jdarthenay Apr 07 '16 at 21:32
  • @jdarthenay yep, the size of `wchar_t` is implementation-defined; MSVC uses 2 bytes, though – Iłya Bursov Apr 07 '16 at 21:33
  • @Lashane: The term "word" is not a synonym for "2 bytes", just as "1 byte" is not a synonym for 8 bits. – too honest for this site Apr 07 '16 at 21:35
  • @Olaf I know this, but 99% of the time a word is 2 or 4 bytes, and a byte is 8 bits. It's much easier to live in the real world without nitpicking every word. – Iłya Bursov Apr 07 '16 at 21:41
  • @Lashane: There is little use in using wrong terms without need. The word "octet" is only one letter more to type. And stating "word == 2 bytes" was just a bad idea (as you apparently noticed yourself). "I know this": the problem is that this is mostly read by beginners, who take such statements as universally valid. Just see the questions here with errors resulting from users making wrong assumptions about sizes. – too honest for this site Apr 07 '16 at 21:48
  • @JordanMelo I'd never heard of wmemcpy until you asked about it just now. I agree that its existence seems unjustified. Me, I'd never use it. – Steve Summit Apr 07 '16 at 21:50
  • @Olaf You're fighting a losing battle on "byte". It's a word with multiple meanings (just like "hacker"). For at least 99% of speakers, in common usage, "byte" means "8 bits". – Steve Summit Apr 07 '16 at 21:53
  • @SteveSummit can you point to a single counter-example where "byte" means anything other than 8 bits? I don't think the uncommon CPUs that use non-8-bit smallest addressable units use the term "byte" for that. – Mark Ransom Apr 07 '16 at 21:55
  • @Olaf the "W" in this case stands for *wide*, not *word*. – Mark Ransom Apr 07 '16 at 21:57
  • @MarkRansom http://stackoverflow.com/questions/5516044/system-where-1-byte-8-bit – Iłya Bursov Apr 07 '16 at 21:58
  • @MarkRansom: The original comment was deleted, you missed context. – too honest for this site Apr 07 '16 at 21:59
  • @SteveSummit: Yes, that's because people who should know better keep giving the impression that "byte <=> 8 bits". And now you tell me correcting this is wrong, because of what? Call me old-fashioned, but I honestly think using precise terminology is vital in engineering at least (and could help in other fields, too). I'm apparently lucky that my customers seem to appreciate my pedantry. – too honest for this site Apr 07 '16 at 22:04
  • @Lashane I know there are lots of CPUs where a *char* isn't 8 bits, but I don't recall the term *byte* being used in that context. My opinion is that references to "byte" in that answer are misrepresentations. – Mark Ransom Apr 07 '16 at 22:04
  • @MarkRansom I have it on good authority that, once upon a time, there were indeed machines with 7- and 9-bit bytes. But my point was that, today, making the assumption that bytes are 8 bits is so popular that (to a descriptivist like me) it's accurate, per common usage, no matter how much the pedants rail against it. – Steve Summit Apr 07 '16 at 22:05
  • @MarkRansom one of the answers states `The size of a byte was at first selected to be a multiple of existing teletypewriter codes, particularly the 6-bit codes used by the U.S. Army (Fieldata) and Navy. ` – Iłya Bursov Apr 07 '16 at 22:05
  • @MarkRansom: See the C standard. The terms "byte" and `char` are used as synonyms. There is a reason the network standards use "octets". – too honest for this site Apr 07 '16 at 22:05
  • @Lashane: IIRC, TTYs used a 5 bit code with shift-codes (which is one reason older UARTs allowed for symbol data length of 5..n bits). But that was before my time, so it is just hearsay. – too honest for this site Apr 07 '16 at 22:07
  • @Lashane again, my argument isn't about the size of a character, or the size of the smallest addressable unit, it's about the definition of the word "byte". I don't think the term was historically used to refer to any quantity other than 8 bits. – Mark Ransom Apr 07 '16 at 22:08
  • @MarkRansom The IBM 7030 Stretch had variable-length bytes, so a byte was 1-8 bits – Iłya Bursov Apr 07 '16 at 22:12
  • @Lashane your comment led me to [COMPUTER USAGE COMMUNIQUÉ Vol. 2 No. 3](http://archive.computerhistory.org/resources/text/Computer_Usage_Company/cuc.communique_vol2no3.1963.102651922.pdf) from 1963 which indeed uses the term "byte" in the context you describe. I'll shut up now. Thank you very much for the concrete example. – Mark Ransom Apr 07 '16 at 22:44

2 Answers


I guess it's largely about API symmetry, but it also makes it easier to write code that can work with both wide-character and normal strings (switched by a preprocessor define or similar).

Essentially, if you want your code to work with char, you #define your copy function as memcpy. For wchar_t, you define it as wmemcpy instead. Your size argument is just the number of characters (either char or wchar_t); remember that the argument isn't necessarily a fixed size array, so using sizeof isn't always an option.
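
A minimal sketch of that switch, in case it helps. The names `USE_WIDE`, `char_type`, `STR`, `copy_chars`, `str_len` and `copy_string` are all made up for this illustration; only `memcpy`, `wmemcpy`, `strlen` and `wcslen` come from the standard library. The point is that the calling code counts in characters either way:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* USE_WIDE and every name defined below are hypothetical, chosen for this sketch. */
#ifdef USE_WIDE
typedef wchar_t char_type;
#define STR(s) L##s
#define copy_chars wmemcpy
#define str_len wcslen
#else
typedef char char_type;
#define STR(s) s
#define copy_chars memcpy
#define str_len strlen
#endif

/* Copies src, including its terminator, into dest if it fits.
   Returns the number of characters copied (not bytes), or 0 if it does not fit. */
static size_t copy_string(char_type *dest, size_t dest_len, const char_type *src)
{
    assert(dest != NULL && src != NULL);

    size_t n = str_len(src) + 1;  /* counted in characters, terminator included */
    if (n > dest_len)
        return 0;

    /* memcpy counts bytes and wmemcpy counts wchar_t elements; because
       sizeof(char) is 1, the same character count n is correct for both. */
    copy_chars(dest, src, n);
    return n;
}
```

Compiling with `-DUSE_WIDE` (or without it) selects the variant; a call such as `copy_string(buf, 8, STR("hello"))` reads the same in both builds.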

The Win32 API for instance makes use of a similar strategy: If you define the UNICODE preprocessor symbol, most functions resolve to their wide-character version (suffixed with W) but otherwise they resolve to the "narrow" character version (suffixed with A); TCHAR is defined as either char or wchar_t accordingly; etc. The upshot is you can write code which works with either wide or regular characters fairly easily.

Of course, this is in no way necessary; but then, the standard C library isn't necessarily supposed to be absolutely minimal. You could argue that calloc is superfluous since you can always use malloc and then memset, for instance; it still exists, however.

davmac
  • I'm not sure this makes sense, because I often need to `memcpy` things like integers or floats. Why not have an `imemcpy()` or `fmemcpy()`? Why does `wchar_t` get its own function? `memcpy` works on the basis that `char` is a single byte, not that it's a character; `wchar_t` doesn't have that special property, though. – Cornstalks Apr 07 '16 at 21:28
  • @Cornstalks but ints and floats aren't (generally) used to hold characters. `char` and `wchar_t` both are; the difference between them is mainly their width/range. Because of this, it makes sense that you might want to write code that works with either (or at least, which is switchable to support either). – davmac Apr 07 '16 at 21:33
  • I guess what I'm trying to say is that I don't think `memcpy` is geared towards characters/strings. If it was, I'd expect it to take a `char*` instead of a `void*`. – Cornstalks Apr 07 '16 at 21:38
  • @Cornstalks it's not _geared_ towards strings, but you can still use it to copy strings; you can pass it a `char *` without issue. I'm not suggesting this should generally be done, but it can be done, and probably has been done. – davmac Apr 07 '16 at 21:44
  • I could easily `#define` my copy function to something like `#define copy(dest,src,size) memcpy(dest,src,(size)*sizeof(wchar_t))` instead of `wmemcpy(dest,src,size)` for `wchar_t` and get the same effect. It's a fairly trivial transformation. But that's a good point for why someone might have wanted it. – Jordan Melo Apr 07 '16 at 22:28
  • @JordanMelo sure, you could. I don't think the existence of `wmemcpy` is really necessary; it mainly just rounds out the w* functions so that most of the regular mem* functions have wide-character equivalents. But it's certainly not causing a problem by existing. :) – davmac Apr 07 '16 at 22:54

In addition to davmac's answer about API symmetry and the size of an array not always being available through sizeof, it should be emphasized that the third argument of wmemcpy is the number of wchar_t elements to copy, not the number of bytes.
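
As a rough illustration of that difference (a sketch only; `COUNT` and `copies_match` are made-up names), the two calls below copy the same data, one given a byte count and the other an element count:

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

#define COUNT 6  /* hypothetical array length: "hello" plus the terminator */

/* Copies the same array once with each function and returns 1 if the two
   destinations end up byte-for-byte identical, 0 otherwise. */
static int copies_match(void)
{
    const wchar_t src[COUNT] = L"hello";
    wchar_t via_memcpy[COUNT];
    wchar_t via_wmemcpy[COUNT];

    /* The byte count is always the element count times sizeof(wchar_t). */
    assert(sizeof src == COUNT * sizeof(wchar_t));

    memcpy(via_memcpy, src, sizeof src);  /* third argument: bytes */
    wmemcpy(via_wmemcpy, src, COUNT);     /* third argument: wchar_t elements */

    return memcmp(via_memcpy, via_wmemcpy, sizeof src) == 0;
}
```

Passing `sizeof src` to `wmemcpy` by mistake would ask for `sizeof(wchar_t)` times too many elements and run past the end of both arrays.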

If you work with wchar_t objects and handle them with other functions from <wchar.h>, it may facilitate matters. wcslen for instance returns the C wide string length in terms of wchar_t elements, and wcschr and wcsrchr return wchar_t *, thus using them to do some pointer arithmetic also keeps you in the "realm" of number-of-elements.
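
For instance, here is a small sketch of staying in element counts from search to copy. `copy_prefix` is a hypothetical helper written for this illustration; `wcschr`, `wcslen` and `wmemcpy` are the standard `<wchar.h>` functions:

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Copies the part of src that precedes the first occurrence of sep into dest
   and terminates it, truncating if dest is too small. Returns the number of
   wchar_t elements copied, excluding the terminator. */
static size_t copy_prefix(wchar_t *dest, size_t dest_len,
                          const wchar_t *src, wchar_t sep)
{
    assert(dest != NULL && src != NULL);
    if (dest_len == 0)
        return 0;

    const wchar_t *end = wcschr(src, sep);

    /* Subtracting two wchar_t pointers already yields a count of elements,
       which is exactly the unit wmemcpy expects. */
    size_t n = end ? (size_t)(end - src) : wcslen(src);
    if (n >= dest_len)
        n = dest_len - 1;  /* leave room for the terminator */

    wmemcpy(dest, src, n);
    dest[n] = L'\0';
    return n;
}
```

With `memcpy`, the same code would need an explicit `n * sizeof(wchar_t)` at the call site.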

P.S. If the size of the wchar_t array is given as a constant, as implied in your example, using wmemcpy may result in more elegant code than the sizeof(wchar_array) expression you used:

#define SIZE 40
wchar_t wchar_array[SIZE];
// ...
wmemcpy(dest, wchar_array, SIZE);
R.G.