1

I'm looking for the fastest way to parse a hex string representing a ulong into a uint keeping as many leading digits as a uint can handle and discarding the rest. For example,

string hex = "0xab54a9a1df8a0edb"; // 12345678991234567899 Should output: uint result = 1234567899;

I can do this by simply parsing the hex into a ulong, getting the digits using ToString and then just taking as many of them as would fit into uint without overflowing but I need something much faster. Thanks. C# code preferred but any would do.

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
Vas
  • 747
  • 1
  • 12
  • 18
  • 1
    If your question is about C# only tag C#. And do you have a proven performance problem here? E.g. did you profile your code and made sure this conversion is actually holding you back? – Pepijn Kramer Jun 01 '23 at 04:28
  • 1
    16 is a power of 2, so luckily you can just take the last 8 hex digits if there are more than 8. [Is there an algorithm to convert massive hex string to bytes stream QUICKLY? asm/C/C++](//stackoverflow.com/a/67169220) has a C++ version using SIMD intrinsics. https://github.com/zbjornson/fast-hex also exists, but could use some improvements in how it horizontally packs nibbles. If you already have a decently fast string to `ulong`, simply truncate that `ulong` to 32-bit, like C `x & 0xFFFFFFFFuLL` or `(uint32_t)x`; narrowing integer conversions from in C modulo-reduce to fit the dst type – Peter Cordes Jun 01 '23 at 04:40
  • 1
    x86 can extract the low nibbles of 8 bytes into 4 bytes with BMI2 `pext`, one asm instruction, after some SIMD work to map each byte to one whose low 4 bits are a 0-15 integer. (`movq rax, xmm0` / `pext rax, rax, rdx` or something after some SIMD work on up to 16 bytes from XMM0.) It's slow on AMD before Zen 3, but otherwise single uop with one per clock throughput and e.g. 3 cycles of latency on Intel. https://uops.info/. – Peter Cordes Jun 01 '23 at 04:46
  • 1
    Or wait, do you want to truncate *decimal* digits, not hex digits?? Because `0xab54a9a1df8a0edb` truncated normally to uint32_t is `0xdf8a0edb` = `3750366939`, with the low hex digits matching, not the low decimal digits. – Peter Cordes Jun 01 '23 at 04:48
  • @PeterCordes Yes, the decimal ones. Thanks. – Vas Jun 01 '23 at 04:50
  • 1
    So like `u64 % 10000000000` (1e10) or something might be a starting point. If that's greater than 2^32-1, take modulo 1000000000 (1e9) instead to take the low 9 decimal digits? – Peter Cordes Jun 01 '23 at 04:51
  • Are you doing this for a single string at a time, or do you have an array of strings you can process in parallel with SIMD for the modulo? – Peter Cordes Jun 01 '23 at 04:55
  • @PeterCordes I'm processing a single string at a time and trying to avoid parsing it into a uint64 to begin with but may end up doing that if I don't find an alternative. – Vas Jun 01 '23 at 05:01
  • using the modulus `ulong.parse` with modulus is about twice as fast as parse/chop string/parse again. – Matthew Whited Jun 01 '23 at 05:02

1 Answers1

2

For decimal truncation, all the high bits of the hex digit affect the low 9 or 10 decimal digits, so you need to convert the whole thing. Is there an algorithm to convert massive hex string to bytes stream QUICKLY? asm/C/C++ has C++ with SSE intrinsics. I commented there with some possible improvements to that, and to https://github.com/zbjornson/fast-hex . This could be especially good if you're using SIMD to find numeric literals in larger buffers, so you might have the hex string in a SIMD register already. (Not sure if SIMDJSON does that.)

Hex-string to 64-bit integer is something SIMD certainly can speed up, e.g. do something to map each digit to a 0-15 integer, combine pairs of bytes to pack nibbles (e.g. with x86 pmaddubsw), then shuffle those 8-bit chunks to the bottom of a register. (e.g. packuswb or pshufb). x86 at least has efficient SIMD to GP-integer movq rax, xmm0, although the ARM equivalent is slow on some ARM CPUs.

(Getting a speedup from SIMD for ASCII hex -> uint is much easier if your strings are fixed-length, and probably if you don't need to check for invalid characters that aren't hex digits.)


Decimal truncation of u64 (C# ulong) to fit in u32 (C# uint)

Modulo by a power of 10 truncates to some number of decimal digits.

(uint)(x % 10000000000) works for some numbers, but 10000000000 (1e10 = one followed by 10 zeros) is larger than 2^32-1. Consider an input like 0x2540be3ff (9999999999). We'd get (uint)9999999999 producing 1410065407 = 0x540be3ff (keeping the low 32 bits of that 34-bit number.)

So perhaps try modulo 1e10, but if it's too big for u32 then modulo 1e9.

  ulong tendigit = x % 10000000000;  // 1e10
  uint truncated = tendigit <= (ulong)0xffffffff ? tendigit : (x % 1000000000);  // % 1e9 keeps 9 decimal digits

If this isn't correct C# syntax or the literals need some decoration to make them ulong (like C 10000000000uLL for good measure), please let me know.

It's probably at least as efficient to just modulo the original number two different ways than to try to get the leading decimal digit of x % 1e10 and subtract it or whatever. The asm is going to need two 64-bit multiplicative inverse constants, and starting from the original number again keeps critical-path latency shorter for out-of-order exec if branch prediction predicts that it needs to calculate the nine-digit truncation.


Binary truncation

@Matthew Whited deleted his answer (due to a bug in the decimal truncation part), but his binary truncation part based on substrings of the original hex input could perhaps be more efficient in some cases than doing the full conversion and then casting to a narrower type or masking with AND.

If you want the last 8 bytes of the hex string

uint.Parse(hex[^8..],NumberStyles.HexNumber)

If you want the first 8 bytes

uint.Parse(hex[2..10], NumberStyles.HexNumber);
Peter Cordes
  • 328,167
  • 45
  • 605
  • 847