Upon decompiling various programs (which I do not have the source for), I have found some interesting sequences of code. A program has a C string (str) defined in the DATA section. In some function in the TEXT section, part of that string is set by moving a hexadecimal number to a position in the string (simplified Intel assembly: MOV str,0x006f6c6c6568). Here is a snippet in C:

#include <stdio.h>

static char str[16];

int main(void)
{
    *(long *)str = 0x006f6c6c6568;
    printf("%s\n", str);
    return 0;
}

I am running macOS, which uses little endian, so 0x006f6c6c6568 translates to hello. The program compiles with no errors or warnings, and when run, prints out hello as expected. I calculated 0x006f6c6c6568 by hand, but I was wondering if C could do it for me. Something like this is what I mean:

#include <stdio.h>

static char str[16];

int main(void)
{
    // *(long *)str = 0x006f6c6c6568;
    *(str+0) = "hello";
    printf("%s\n", str);
    return 0;
}

Now, I would not like "hello" to be treated as a string literal. Instead, it might be treated like this for little-endian:

    *(long *)str = (long)(((long)'h') |
                          ((long)'e' << 8) |
                          ((long)'l' << 16) |
                          ((long)'l' << 24) |
                          ((long)'o' << 32) |
                          ((long)0 << 40));

Or, if compiled for a big-endian target, this:

    *(long *)str = (long)(((long) 0  << 16) |
                          ((long)'o' << 24) |
                          ((long)'l' << 32) |
                          ((long)'l' << 40) |
                          ((long)'e' << 48) |
                          ((long)'h' << 56));

Thoughts?

ramkaz99
  • The value of a string literal is a pointer to the first element, not the contents of a string. You need to use `strcpy(str, "hello")` to copy the contents. – Barmar Apr 23 '22 at 23:22
  • @Barmar is it possible to tell C to not treat `"hello"` as a string literal and instead translate its representation into an `int` or `long`? – ramkaz99 Apr 23 '22 at 23:43
  • I don't think so. – Barmar Apr 23 '22 at 23:59
  • @ramkaz99 - what are you really trying to do? – selbie Apr 24 '22 at 00:01
  • @selbie to put it in new words, is there some built-in C function/method/preprocessor function/operator/etc. that can convert an 8 character string into its raw hexadecimal representation of `long` type? – ramkaz99 Apr 24 '22 at 00:08
  • https://godbolt.org/z/3T14vqaos Can't think how to get closer than those. – Erik Eidt Apr 24 '22 at 00:35
  • @ErikEidt That makes sense. I was wondering if there was a way such that the compiler could compute the right representation for the target endianness at compile time instead of having to change the sequence manually if compiling the code for different endianness targets. – ramkaz99 Apr 24 '22 at 00:54
  • I don't think so. String data is naturally endian-free, but integers are naturally endian, so there's a bit of a mismatch that I don't know a way around. – Erik Eidt Apr 24 '22 at 01:04
  • Note that `*(long *)str = 0x006f6c6c6568;` violates the strict aliasing rule and invokes UB. Besides, the string isn't necessarily aligned, and that also invokes UB – phuclv Apr 24 '22 at 02:58

3 Answers

is there some built-in C function/method/preprocessor function/operator/etc. that can convert an 8 character string into its raw hexadecimal representation of long type

I see you've already accepted an answer, but I think this solution is easier to understand and probably what you want.

Copying the string bytes into a 64-bit integer type is all that's needed. I'm going to use uint64_t instead of long as that's guaranteed to be 8 bytes on all platforms. long is often only 4 bytes.

#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <inttypes.h>

uint64_t packString(const char* str) {
    uint64_t value = 0;
    size_t copy = str ? strnlen(str, sizeof(value)) : 0; // copy over at most 8 bytes

    memcpy(&value, str, copy);
    return value;
}

Example:

int main() {
    printf("0x%" PRIx64 "\n", packString("hello"));
    return 0;
}

Then build and run:

$:~/code/packString$ g++ main.cpp -o main

$:~/code/packString$ ./main

0x6f6c6c6568

selbie
  • instead of `char value[8]` you should use `uint64_t value` and that'll avoid strict aliasing and misalignment issues – phuclv Apr 24 '22 at 03:14
  • @phuclv - good suggestion. Fixed. – selbie Apr 24 '22 at 03:16
  • Interesting use of [`strncpy`](https://en.cppreference.com/w/c/string/byte/strncpy) to fill `value` with zeros out to 8 bytes even if the input string is shorter. Normally this makes `strncpy` total garbage for performance (unnecessary stores out to the end of a large buffer), but it's actually a feature here. – Peter Cordes Apr 24 '22 at 05:32
  • If you don't need that, though, `memcpy(&value, str, sizeof(value))` will reliably inline to a single 8-byte load instruction (or `mov`-immediate from a constant source here), at least on ISAs that support unaligned loads. https://godbolt.org/z/rn6P6ar7q shows both ways: with a string literal arg, `packString` still inlines to mov-immediate to put a constant in a register, no actual loading or function calls. – Peter Cordes Apr 24 '22 at 05:33
  • Your size arg should be `sizeof(value)` - `uint64_t` is guaranteed to be 64 bits exactly with no padding (if it exists at all), but `CHAR_BIT` is *not* guaranteed to be `8`. A DSP with 16-bit `char` would copy too many bytes past the end of `value`. (And hopefully give a compiler warning.) `sizeof(value)` also lets you change to a different size type without having to change the hard-coded size to match. – Peter Cordes Apr 24 '22 at 05:35
  • @PeterCordes - I appreciate the feedback. While `strncpy` may not be as efficient as `memcpy`, it's simpler to demonstrate how to copy *at most* 8 bytes with it - and doesn't create undefined behavior if the caller passed in a string that is not exactly 8 chars in length. – selbie Apr 24 '22 at 05:58
  • @PeterCordes - We live in an 8-bit byte world. And while there are legacy platforms and possibly future platforms that will have a different word size, trying to account for that would only muddy up the answer. – selbie Apr 24 '22 at 06:00
  • Right, that's why it's a good answer that I upvoted! It's fine, and compiles efficiently with short string literals so it's great for that use-case. It's worth *mentioning* memcpy as an option, at least in comments. – Peter Cordes Apr 24 '22 at 06:00
  • Part of the argument for `sizeof(value)` is to avoid hard-coding the size, in case you want to change it to `uint32_t` for 4 characters at some point. Also, I'm not worried about 9-bit-byte machines, I'm worried about modern DSPs that aren't byte-addressable at all, so C implementations on them use a `char` as wide as `int`, like 16, 24, or maybe 32 bits. (In that case other things may start to break down, like how string literals are packed, IDK, but at least you avoid UB). I don't think it's harder to read, and you can/should leave the *comment* describing it as "copy at most 8 bytes." – Peter Cordes Apr 24 '22 at 06:06
  • Or to put it another way, `sizeof` and `strncpy` use units of C `char`, not necessarily bytes. – Peter Cordes Apr 24 '22 at 06:08
  • @PeterCordes - Updated the answer in a way that I think works for both of us. – selbie Apr 24 '22 at 06:59
  • This seems less readable. I don't think this is better, and wasn't arguing for it. I was saying the use-case for `memcpy` is when you can just load a full 8 bytes without needing a length check. This does still work, thanks to the zero-init of `value`, and remarkably still optimizes down to `movabs rax, 8031924123371070824` for a string literal (with gcc and clang -O2 or -O3: https://godbolt.org/z/o4x3Gar61), but for a non-constant arg it expands into way larger asm for GCC, I guess inlining the various possible memcpy sizes – Peter Cordes Apr 24 '22 at 07:07
  • @PeterCordes - I made a few final tweaks to simplify. I'm done. – selbie Apr 24 '22 at 07:07
  • I liked the `strncpy` version; the only change I was suggesting to it was `sizeof(value)`, which you've done. You've also added a check for `str == NULL`. This version is more readable than the middle version, using `strnlen` is a good idea, but I still liked the `strncpy` version better. Adding an `if(!str) return 0;` to it is fine if you want that: https://godbolt.org/z/aEK5x4hha – Peter Cordes Apr 24 '22 at 07:11
  • I liked the strncpy idea enough to post my own answer using it (and a bunch of discussion of bytes and stuff.) Hopefully some future readers will find each of our answers useful. – Peter Cordes Apr 24 '22 at 10:55

TL:DR: you want strncpy into a uint64_t. This answer is long in an attempt to explain the concepts and how to think about memory from C vs. asm perspectives, and whole integers vs. individual chars / bytes. (i.e. if it's obvious that strlen/memcpy or strncpy would do what you want, just skip to the code.)


If you want to copy exactly 8 bytes of string data into an integer, use memcpy. The object-representation of the integer will then be those string bytes.

Strings always have the first char at the lowest address: a string is a sequence of char elements, so endianness isn't a factor because there's no addressing within a char. Integers are different: which end holds the least-significant byte depends on endianness.

Storing this integer into memory will have the same byte order as the original string, just like if you'd done memcpy to a char tmp[8] array instead of a uint64_t tmp. (C itself doesn't have any notion of memory vs. register; every object has an address except when optimization via the as-if rule allows, but assigning to some array elements can get a real compiler to use store instructions instead of just putting the constant in a register. So you could then look at those bytes with a debugger and see they were in the right order. Or pass a pointer to fwrite or puts or whatever.)
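To make the "same byte order as the original string" point concrete, here is a minimal sketch that dumps an integer's bytes lowest address first. The helper name bytes_in_memory_order is hypothetical, and the expected hex digits assume an ASCII execution character set and CHAR_BIT == 8.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Load exactly 8 bytes of string data into an integer (same idea as in the text). */
static uint64_t load8(const char *s)
{
    uint64_t v;
    memcpy(&v, s, sizeof(v));
    return v;
}

/* Hypothetical helper: write the bytes of v as hex, lowest address first.
   out must have room for 2 * sizeof(v) + 1 chars. */
static void bytes_in_memory_order(uint64_t v, char *out)
{
    unsigned char raw[sizeof(v)];
    memcpy(raw, &v, sizeof(v));            /* copy out the object representation */
    for (size_t i = 0; i < sizeof(v); i++)
        sprintf(out + 2 * i, "%02x", (unsigned)raw[i]);
}
```

For load8("hello wo") this should give 68656c6c6f20776f on both little- and big-endian hosts, because only the integer's value changes with endianness, not the bytes in memory.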

memcpy avoids possible undefined behaviour from alignment and strict-aliasing violations from *(uint64_t*)str = val;. i.e. memcpy(str, &val, sizeof(val)) is a safe way to express an unaligned strict-aliasing safe 8-byte load or store in C, like you could do easily with mov in x86-64 asm.
(GNU C also lets you typedef uint64_t aliasing_u64 __attribute__((aligned(1), may_alias)); - you can point that at anything and read/write through it safely, just like with an 8-byte memcpy.)
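As a rough sketch of that idea, the store from the question and its matching load could be wrapped like this. The names store_u64 and load_u64 are made up for illustration; the caller is assumed to provide at least sizeof(uint64_t) chars at the pointer.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store the object representation of val at dst.
   dst may be unaligned; it must have room for sizeof(val) chars. */
static void store_u64(char *dst, uint64_t val)
{
    memcpy(dst, &val, sizeof(val));   /* no alignment or strict-aliasing problem */
}

/* Hypothetical helper: the matching unaligned load. */
static uint64_t load_u64(const char *src)
{
    uint64_t val;
    memcpy(&val, src, sizeof(val));
    return val;
}
```

With these, store_u64(str, load_u64("hello\0\0")) would be a well-defined replacement for the `*(long *)str = ...` store in the question (the literal "hello\0\0" is exactly 8 chars including its terminator).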

char* and unsigned char* can alias any other type in ISO C, so it's safe to use memcpy and even strncpy to write the object-representation of other types, especially ones that have a guaranteed format / layout like uint64_t (fixed width, no padding, if it exists at all).


If you want shorter strings to zero-pad out to the full size of an integer, use strncpy. On little-endian machines it's like an integer of width CHAR_BIT * strlen() being zero-extended to 64-bit, since the extra zero bytes after the string go into the bytes that represent the most-significant bits of the integer.

On big-endian machines, the low bits of the value will be zeros, as if you left-shifted that "narrow integer" to the top of the wider integer. (And the non-zero bytes are in a different order wrt. each other).
On a mixed-endian machine (e.g. PDP-11), it's less simple to describe.
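A small sketch of what that means for the integer value of "hi", assuming CHAR_BIT == 8 and a host that is either plain little-endian or plain big-endian. The helper names (pack8, host_is_little_endian, expected_hi) are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Pack up to 8 chars into an integer, zero-padding short strings. */
static uint64_t pack8(const char *s)
{
    uint64_t v;
    strncpy((char *)&v, s, sizeof(v));   /* always writes all sizeof(v) chars */
    return v;
}

/* Runtime check: 1 on a little-endian host, 0 otherwise. */
static int host_is_little_endian(void)
{
    uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);           /* look at the byte at the lowest address */
    return first == 1;
}

/* The value described in the text for the 2-char string "hi". */
static uint64_t expected_hi(void)
{
    uint64_t narrow_le = ((uint64_t)'i' << 8) | 'h';   /* 'h' is the low byte  */
    uint64_t narrow_be = ((uint64_t)'h' << 8) | 'i';   /* 'h' is the high byte */
    return host_is_little_endian() ? narrow_le          /* zero-extended        */
                                   : narrow_be << 48;   /* shifted to the top   */
}
```

On either kind of host, pack8("hi") should compare equal to expected_hi(); a mixed-endian machine is outside what this sketch covers.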

strncpy is bad for actual strings but exactly what we want here. It's inefficient for normal string-copying because it always writes out to the specified length (wasting time and touching otherwise unused parts of a long buffer for short copies). And it's not very useful for safety with strings because it doesn't leave room for a terminating zero with large source strings.
But both of those things are exactly what we want/need here: it behaves like memcpy(&val, str, 8) for strings of length 8 or higher, but for shorter strings it doesn't leave garbage in the upper bytes of the integer.

Example: first 8 bytes of a string

#include <string.h>
#include <stdint.h>

uint64_t load8(const char* str)
{
    uint64_t value;
    memcpy(&value, str, sizeof(value));     // load exactly 8 bytes
    return value;
}

uint64_t test2(){
    return load8("hello world!");  // constant-propagation through it
}

This compiles very simply, to one x86-64 8-byte mov instruction using GCC or clang on the Godbolt compiler explorer.

load8:
        mov     rax, QWORD PTR [rdi]
        ret

test2:
        movabs  rax, 8031924123371070824  # 0x6F77206F6C6C6568 
          # little-endian "hello wo", note the 0x20 ' ' byte near the top of the value
        ret

On ISAs where unaligned loads just work with at worst a speed penalty, e.g. x86-64 and PowerPC64, memcpy reliably inlines. But on MIPS64 you'd get a function call.

# PowerPC64 clang(trunk) -O3
load8:
        ld 3, 0(3)            # r3 = *r3   first arg and return-value register
        blr

BTW, I used sizeof(value) instead of 8 for two reasons: first so you can change the type without having to manually change a hard-coded size.

Second, because a few obscure C implementations (like modern DSPs with word-addressable memory) don't have CHAR_BIT == 8. Often 16 or 24, with sizeof(int) == 1 i.e. the same as a char. I'm not sure exactly how the bytes would be arranged in a string literal, like whether you'd have one character per char word or if you'd just have an 8-letter string in fewer than 8 chars, but at least you wouldn't have undefined behaviour from writing outside a local variable.

Example: short strings with strncpy

// Take the first 8 bytes of the string, zero-padding if shorter
// (on a big-endian machine, that left-shifts the value, rather than zero-extending)
uint64_t stringbytes(const char* str)
{
    // if (!str)  return 0;   // optional NULL-pointer check
    uint64_t value;           // strncpy always writes the full size (with zero padding if needed)
    strncpy((char*)&value, str, sizeof(value)); // load up to 8 bytes, zero-extending for short strings
    return value;
}

uint64_t tests1(){
    return stringbytes("hello world!");
}
uint64_t tests2(){
    return stringbytes("hi");
}
tests1():
        movabs  rax, 8031924123371070824     # same as with memcpy
        ret
tests2():
        mov     eax, 26984        # 0x6968 = little-endian "hi"
        ret

The strncpy misfeatures (the ones that make it a poor fit for what people wish it were, a strcpy that truncates to a limit) are why compilers like GCC warn about these valid use-cases with -Wall. Our non-standard use-case, where we want truncation of a longer string literal just to demo how it would work, triggers a warning too. That one isn't strncpy's fault, but the warning about passing a length limit equal to the actual size of the destination is.

In function 'constexpr uint64_t stringbytes2(const char*)',
    inlined from 'constexpr uint64_t tests1()' at <source>:26:24:
<source>:20:12: warning: 'char* strncpy(char*, const char*, size_t)' output truncated copying 8 bytes from a string of length 12 [-Wstringop-truncation]
   20 |     strncpy(u.c, str, 8);
      |     ~~~~~~~^~~~~~~~~~~~~
<source>: In function 'uint64_t stringbytes(const char*)':
<source>:10:12: warning: 'char* strncpy(char*, const char*, size_t)' specified bound 8 equals destination size [-Wstringop-truncation]
   10 |     strncpy((char*)&value, str, sizeof(value)); // load up to 8 bytes, zero-extending for short strings
      |     ~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Big-endian examples: PowerPC64

Strangely, GCC for MIPS64 doesn't want to inline strnlen, and PowerPC can more efficiently construct constants larger than 32-bit anyway. (Fewer shift instructions, as oris can OR into bits [31:16], i.e. OR a shifted immediate.)

uint64_t foo = tests1();
uint64_t bar = tests2();

Compiling as C++ to allow function return values as initializers for global vars, clang (trunk) for PowerPC64 compiles the above with constant-propagation into initialized static storage in .data for these global vars, instead of calling a "constructor" at startup to store into the BSS like GCC unfortunately does. (It's weird because GCC's initializer function just constructs the value from immediates itself and stores.)

foo:
        .quad   7522537965568948079             # 0x68656c6c6f20776f
                                      # big-endian "h e l l o   w o"

bar:
        .quad   7523544652499124224             # 0x6869000000000000
                                      # big-endian "h i \0\0\0\0\0\0"

The asm for tests1() can only construct a constant from immediates 16 bits at a time (because an instruction is only 32 bits wide, and some of that space is needed for opcodes and register numbers). Godbolt

# GCC11 for PowerPC64 (big-endian mode, not power64le)  -O3 -mregnames 
tests2:
        lis %r3,0x6869    # Load-Immediate Shifted, i.e. big-endian "hi"<<16
        sldi %r3,%r3,32   # Shift Left Doubleword Immediate  r3<<=32 to put it all the way to the top of the 64-bit register
          # the return-value register holds 0x6869000000000000
        blr               # return

tests1():
        lis %r3,0x6865        # big-endian "he"<<16
        ori %r3,%r3,0x6c6c    # OR Immediate producing "hell"
        sldi %r3,%r3,32       # r3 <<= 32
        oris %r3,%r3,0x6f20   # r3 |=  "o " << 16
        ori %r3,%r3,0x776f    # r3 |=  "wo"
          # the return-value register holds 0x68656c6c6f20776f
        blr

I played around a bit with getting constant-propagation to work for an initializer for a uint64_t foo = tests1() at global scope in C++ (C doesn't allow non-const initializers in the first place) to see if I could get GCC to do what clang does. No success so far. And even with constexpr and C++20 std::bit_cast<uint64_t>(struct_of_char_array) I couldn't get g++ or clang++ to accept uint64_t foo[stringbytes2("h")] to use the integer value in a context where the language actually requires a constexpr, rather than it just being an optimization. Godbolt.

IIRC std::bit_cast should be able to manufacture a constexpr integer out of a string literal but there might have been some trick I'm forgetting; I didn't search for existing SO answers yet. I seem to recall seeing one where bit_cast was relevant for some kind of constexpr type-punning.


Credit to @selbie for the strncpy idea and the starting point for the code; for some reason they changed their answer to be more complex and avoid strncpy, so it's probably slower when constant-propagation doesn't happen, assuming a good library implementation of strncpy that uses hand-written asm. But either way still inlines and optimizes away with a string literal.

Their current answer with strnlen and memcpy into a zero-initialized value is exactly equivalent to this in terms of correctness, but compiles less efficiently for runtime-variable strings.
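The equivalence claim can be spot-checked with a sketch like the following. The function names are hypothetical, and strnlen is POSIX rather than ISO C, hence the feature-test macro.

```c
#define _POSIX_C_SOURCE 200809L   /* for strnlen (POSIX, not ISO C) */
#include <stdint.h>
#include <string.h>

/* strncpy version: zero-pads short strings, truncates long ones. */
static uint64_t pack_strncpy(const char *s)
{
    uint64_t v;
    strncpy((char *)&v, s, sizeof(v));
    return v;
}

/* strnlen + memcpy version: relies on the zero-initialization of v. */
static uint64_t pack_strnlen(const char *s)
{
    uint64_t v = 0;
    memcpy(&v, s, strnlen(s, sizeof(v)));
    return v;
}

/* Returns 1 if both versions agree for s (s must be a valid string). */
static int same_result(const char *s)
{
    return pack_strncpy(s) == pack_strnlen(s);
}
```

Checking the empty string, a short string, an exactly-8-char string, and a longer one covers the interesting boundaries; this only tests correctness and says nothing about which version compiles to better code.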

Peter Cordes
  • why is `strncpy` more efficient than `value = 0; memcpy(&value, str, sizeof(value));`? – Noah Apr 24 '22 at 18:45
  • I didn't know compilers optimized that far down. Thanks for the explanation! – ramkaz99 Apr 24 '22 at 21:03
  • Changed to accepted answer, this is more of what I was looking for in terms of solution and explanation. – ramkaz99 Apr 24 '22 at 21:23
  • @Noah: It's not more efficient than a fixed 8-byte memcpy. (Which would make `value=0` redundant.) It should be at least as efficient as `value=0; memcpy(&value, str, strnlen(str, 8))`, hopefully better if there's anything to be gained from having the alignment checking already done and data already loaded once. (Especially the way GCC compiles it, inlining the memcpy with branching on size, bloating this callsite vs. glibc already containing hand-written asm for strncpy. Although likely cold.) – Peter Cordes Apr 24 '22 at 22:11
  • @Noah: The really useful thing for this strncpy would be to inline taking advantage of the known 8-byte store width, and do a qword load and `bzhi` (if alignment checking shows that wouldn't cross a page or maybe cache line). Or some bithack with `pcmpeqb v,0` that generates a mask for `pandn` to zero the high bytes of an 8-byte load to avoid needing `pmovmskb` / `bsr` to get an actual count. – Peter Cordes Apr 24 '22 at 22:14
  • @Noah: Let me know if there's something I should reword that wrongly implied that `strncpy` would be more efficient than a fixed memcpy. – Peter Cordes Apr 24 '22 at 22:17
  • @PeterCordes err meant `value = 0; strcpy(&value, str)` or `value = 0; memcpy(&value, str, strlen(str))`. But yeah inline `bzhi` with just 8-byte load from `str` would be ideal. You could do `pcmpeqb v, 0; mov tmp, v; paddq v, -1; pxor v, tmp` to get your mask. Think there may be a faster way. There was a question a while ago about how to zero all bits in `xmm` after first zero byte. Had a pretty good solution for that that may be faster but can't find the question. – Noah Apr 24 '22 at 23:14
  • @Noah: We want this to work even if the input string is longer than 8 bytes, just taking the first 8. Otherwise yes we could `strcpy`. I looked at glibc strncpy, which ends up using https://code.woboq.org/userspace/glibc/sysdeps/x86_64/multiarch/strcpy-avx2.S.html with `USE_AS_STRNCPY` defined; I don't think it does any masking of the load containing the last byte, instead just storing it and separately zero-filling, unfortunately. So maybe I'm wrong about it being more efficient overall for runtime-variable input strings, but at least it doesn't bloat the callsite. – Peter Cordes Apr 25 '22 at 00:04
  • @PeterCordes Updating `st{r|p}{n}{cpy|cat}` is on my todo list (doing `memrchr` first. Personally I disdain the interface of `strncpy` so much that I've been avoiding it). That's an interesting optimization, although it will need an extra branch to ensure writing full vector width stays within `n`. Not 100% certain it would be worth it unless it's a perfect fit for the zero fill. If `remaining_n` is small it will be 2x overlapping stores on the tail anyway, so we are just eating a branch. If `remaining_n` is large it may save an iteration of a memsetzero loop ~ `(VEC_SIZE / 2) / (LOOP_UNROLL * VEC_SIZE)` times. – Noah Apr 25 '22 at 00:49
  • @PeterCordes But not sure it would turn out worth it. Will give it a shot though. I like the concept. (Similar to write combining `char a; ...; *(ptr + 0) = a; *(ptr + 1) = 0` into just a `mozwl a8, tmp16; movw tmp16, ptr`. I have an LLVM patch for that actually but haven't tested it robustly and am being lazy.) – Noah Apr 25 '22 at 00:54
  • @Noah: Probably want to get some data on real-world usage of `strncpy`, to see if it's commonly used as an inefficient way to avoid buffer overflows into largeish buffers. In that case the actual copy length is probably significantly shorter than the size limit, and optimizing this one store doesn't avoid much zero-fill work. Although if it means the zero-fill work ends up a multiple of the vector width even for odd-length strings, maybe that helps, especially if you don't have fast-short-rep `rep stosb`. – Peter Cordes Apr 25 '22 at 00:54
  • @PeterCordes btw, [sourceware](https://sourceware.org/git/?p=glibc.git;a=tree) has an up-to-date tree. woboq seems to be several releases behind. – Noah Apr 25 '22 at 01:13

Use #if __BYTE_ORDER__ to choose the byte order at compile time, like this:

#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    *(long *)str = (long)(((long)'h') |
                          ((long)'e' << 8) |
                          ((long)'l' << 16) |
                          ((long)'l' << 24) |
                          ((long)'o' << 32) |
                          ((long)0 << 40));
#else
    *(long *)str = (long)(0 |
                           ((long)'o' << 8) |
                           ((long)'l' << 16) |
                           ((long)'l' << 24) |
                           ((long)'e' << 32) |
                           ((long)'h' << 40));
#endif
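A variation on the same idea, sketched with uint64_t and memcpy so it doesn't depend on the width of long or on aliasing through a cast pointer. The macro names PACK_SHIFT, PACK_BYTE and PACK8 are hypothetical; __BYTE_ORDER__ is a GCC/Clang predefined macro rather than ISO C, and the shifts assume CHAR_BIT == 8.

```c
#include <stdint.h>
#include <string.h>

/* Pick the shift for the i-th byte in memory (i = 0 is the lowest address). */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#  define PACK_SHIFT(i) (8 * (i))          /* byte 0 is least significant */
#elif defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#  define PACK_SHIFT(i) (8 * (7 - (i)))    /* byte 0 is most significant  */
#else
#  error "unknown or mixed byte order"
#endif

#define PACK_BYTE(c, i) ((uint64_t)(unsigned char)(c) << PACK_SHIFT(i))

/* Hypothetical macro: the uint64_t whose in-memory bytes are c0..c7 in order. */
#define PACK8(c0, c1, c2, c3, c4, c5, c6, c7) \
    (PACK_BYTE(c0, 0) | PACK_BYTE(c1, 1) | PACK_BYTE(c2, 2) | PACK_BYTE(c3, 3) | \
     PACK_BYTE(c4, 4) | PACK_BYTE(c5, 5) | PACK_BYTE(c6, 6) | PACK_BYTE(c7, 7))

static char str[16];

static void set_hello(void)
{
    /* The initializer is an integer constant expression, computed at compile time. */
    const uint64_t v = PACK8('h', 'e', 'l', 'l', 'o', 0, 0, 0);
    memcpy(str, &v, sizeof(v));            /* alignment- and aliasing-safe store */
}
```

After set_hello(), str should hold "hello" on either byte order, with 'h' shifted by 56 (not 40) in the big-endian case.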
dulngix
  • This works for me (I'm using Apple clang), Thanks! On doing some more research, this might not work on all compilers, for anyone in the future referencing this answer. – ramkaz99 Apr 24 '22 at 01:16
  • `long` is certainly not 64 bits on all platforms. It could be as little as 32 bits. You should probably use `int64_t`, or even better `uint64_t`. Also, in a big-endian machine with 64-bit longs, the address of the `long` corresponds to a letter which has been left-shifted by 56, not 40. – rici Apr 24 '22 at 02:16
  • @ramkaz99: Indeed, `*(long*)str` is also a strict-aliasing violation, so it's undefined behaviour unless you compile with gcc/clang `-fno-strict-aliasing`. Just use `memcpy` to do the C equivalent of an 8-byte load in assembly language, alignment and strict-aliasing safe. – Peter Cordes Apr 24 '22 at 05:21