
I'll introduce my question using a concrete example. The actual question is at the bottom of this text.

Introduction

I'd like to extract unaligned bit fields from byte arrays, given:

  • the start bit position
  • the number of bits
  • whether the data is in MSB-first (big-endian) or LSB-first (little-endian) format

For that, I created a Bitpacker class with a static method:

  ulong ReadRaw(byte[] src, int startBit, int bitLength, Endianness endianness = Endianness.LSB_FIRST)

This method has to work out the masks and shifts at run time, looping over the bits and bytes involved, which is slow. I need to evaluate the data several thousand times per second.
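For reference, a loop-based ReadRaw of the kind described might look roughly like the sketch below. This is not the author's implementation. The question does not spell out the bit numbering, so the sketch assumes: for LSB_FIRST, stream bit n is bit n % 8 of byte n / 8 and the first bit read becomes bit 0 of the result (consistent with the src[5]/src[6] example in the question); for MSB_FIRST, bits are counted from the most significant bit of each byte and the first bit read becomes the most significant bit of the result.

```csharp
using System;

public enum Endianness { LSB_FIRST, MSB_FIRST }

public static class Bitpacker
{
    // Illustrative generic implementation: walks the field one bit at a time.
    // Assumed bit numbering (not stated in the question):
    //   LSB_FIRST: bit n = bit (n % 8) of byte n / 8; first bit read -> result bit 0.
    //   MSB_FIRST: bit n = bit (7 - n % 8) of byte n / 8; first bit read -> result MSB.
    public static ulong ReadRaw(byte[] src, int startBit, int bitLength,
                                Endianness endianness = Endianness.LSB_FIRST)
    {
        if (bitLength < 0 || bitLength > 64)
            throw new ArgumentOutOfRangeException(nameof(bitLength));
        if (startBit < 0 || (long)startBit + bitLength > (long)src.Length * 8)
            throw new ArgumentOutOfRangeException(nameof(startBit));

        ulong result = 0;
        for (int i = 0; i < bitLength; i++)
        {
            int bit = startBit + i;
            int byteIndex = bit >> 3;   // which byte holds this bit
            int bitInByte = bit & 7;    // position inside that byte

            if (endianness == Endianness.LSB_FIRST)
            {
                // Bits are numbered from the LSB; append at increasing result positions.
                ulong b = (ulong)((src[byteIndex] >> bitInByte) & 1);
                result |= b << i;
            }
            else
            {
                // Bits are numbered from the MSB; shift the result left and append.
                ulong b = (ulong)((src[byteIndex] >> (7 - bitInByte)) & 1);
                result = (result << 1) | b;
            }
        }
        return result;
    }
}
```

Up to 64 iterations per read, each with a branch on the endianness, is the overhead that a hard-coded expression avoids.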

Because the arguments are compile-time constants, I could hard-code a fast variant by working out by hand which bits to extract and how to shift them.

For example, the following two assignments to raw are equivalent:

ulong raw;

// Manually extract bits
raw = ((ulong)(src[5] & 0xFC) >> 2) + ((ulong)(src[6] & 0x3) << 6);

// Use the slow generic implementation
raw = Bitpacker.ReadRaw(src,42,8, Endianness.LSB_FIRST);

Of course, hard-coding makes it hard to get the code right and significantly reduces maintainability. That's where I thought source generators could come into play.
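The core of such a generator would be doing at build time what was done by hand for `src[5]`/`src[6]`: for each byte the field touches, compute the byte index, the mask and the net shift. A minimal sketch of just that step for LSB_FIRST fields, assuming stream bit n is bit n % 8 of byte n / 8; `ReadRawEmitter`, `EmitLsbFirst` and the variable name `src` in the emitted text are hypothetical, and the Roslyn plumbing that finds the call sites is not shown.

```csharp
using System;
using System.Collections.Generic;

public static class ReadRawEmitter
{
    // Builds the C# expression text for an LSB_FIRST field, i.e. the kind of
    // string a generator could splice into a generated method body.
    public static string EmitLsbFirst(int startBit, int bitLength)
    {
        var terms = new List<string>();
        int bit = startBit;
        int remaining = bitLength;
        int outShift = 0; // where this chunk lands inside the result

        while (remaining > 0)
        {
            int byteIndex = bit >> 3;
            int offset = bit & 7;                        // first used bit inside the byte
            int take = Math.Min(8 - offset, remaining);  // bits taken from this byte
            int mask = ((1 << take) - 1) << offset;      // selects exactly those bits

            // The masked bits sit at 'offset' and must end up at 'outShift'.
            string term = $"(ulong)(src[{byteIndex}] & 0x{mask:X2})";
            int netShift = outShift - offset;
            if (netShift > 0) term = $"({term} << {netShift})";
            else if (netShift < 0) term = $"({term} >> {-netShift})";
            terms.Add(term);

            bit += take;
            remaining -= take;
            outShift += take;
        }
        // The terms occupy disjoint bit ranges, so '|' is equivalent to '+'.
        return string.Join(" | ", terms);
    }
}
```

For `EmitLsbFirst(42, 8)` this produces `((ulong)(src[5] & 0xFC) >> 2) | ((ulong)(src[6] & 0x03) << 6)`, which matches the hand-written line apart from `|` instead of `+` and two-digit hex masks.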

The actual question

Is it possible to use source generators to emit different code for each call to ReadRaw, based on its constant arguments, or to replace the calls to ReadRaw altogether?

sonntam
  • Source generators cannot currently *replace* anything; they can only *add*; it can't change the call-site; you *could* generate `ReadRaw` entirely on-the-fly (partial method, etc) with a `switch` that special-cases every scenario you detect at build (plus a fallback version for anything non-constant), but if you have lots of call paths, I wonder if that would make things worse rather than better – Marc Gravell Jan 05 '23 at 09:17
  • you *could* optimize for common byte-lengths via separate methods; for example, if 8 is common, you could have a `ReadByte(byte[] source, int startBit)` and have a big `switch` on `startBit`, with branches for the 8 possible inter-byte positions (`startBit % 8`)? and a `ReadUInt16LittleEndian`, etc – Marc Gravell Jan 05 '23 at 09:27
  • A good JIT compiler can theoretically specialize a function call for a given constant (as long as there are not too many of them). What you want looks like C++ templates. If the parameters tend not to change, then you can write a function that computes the offsets, the masks and the shifts. Doing variable-based shifts/offsets is not much slower than constant-based ones, and masking is equally fast in this case on most CPU architectures. This also removes the need to dynamically check for the endianness. Generating code should be the last option (it is generally really bad for maintenance). – Jérôme Richard Jan 06 '23 at 02:31
  • @MarcGravell That's what I suspected. Thinking about it, being able to change code using source generators would be one big security risk. I think I'll use e.g. ReadRaw_42_8_L function decorators that get code-generated by the source-generator instead. This is ugly as it codes the constant parameters into the function decorator, but it works and is reasonably maintainable in my case (there is much more trickery going on in C/C++ with some of the macros - so there is that). – sonntam Jan 09 '23 at 08:19
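Jérôme Richard's suggestion (compute the offset, mask and shift once per field, then apply them with variable shifts) could be sketched as below. `BitField` is a hypothetical name. The sketch assumes that LSB_FIRST means stream bit n is bit n % 8 of byte n / 8, that MSB_FIRST counts bits from the most significant bit of each byte with the first bit read becoming the result's most significant bit, and that a field plus its offset inside the first byte fits in 64 bits. It uses `System.Buffers.Binary.BinaryPrimitives` for the 64-bit loads.

```csharp
using System;
using System.Buffers.Binary;

public enum Endianness { LSB_FIRST, MSB_FIRST }

// Hypothetical helper: all per-field arithmetic happens once in the constructor;
// Read() is one 64-bit load, one shift and one mask, with no per-bit loop.
public readonly struct BitField
{
    private readonly int _byteIndex;  // first byte touched by the field
    private readonly int _shift;      // right shift that moves the field to bit 0
    private readonly ulong _mask;     // bitLength low bits set
    private readonly bool _msbFirst;

    public BitField(int startBit, int bitLength, Endianness endianness = Endianness.LSB_FIRST)
    {
        if (startBit < 0) throw new ArgumentOutOfRangeException(nameof(startBit));
        int offset = startBit & 7;
        // Assumption: the field must fit in one 64-bit window starting at its first byte.
        if (bitLength < 1 || offset + bitLength > 64)
            throw new ArgumentOutOfRangeException(nameof(bitLength));

        _byteIndex = startBit >> 3;
        _msbFirst = endianness == Endianness.MSB_FIRST;
        // Special case needed: in C#, 1UL << 64 wraps to 1UL << 0.
        _mask = bitLength == 64 ? ulong.MaxValue : (1UL << bitLength) - 1;
        // LSB_FIRST: little-endian load, field starts 'offset' bits above bit 0.
        // MSB_FIRST: big-endian load, field starts 'offset' bits below bit 63.
        _shift = _msbFirst ? 64 - offset - bitLength : offset;
    }

    public ulong Read(byte[] src)
    {
        ReadOnlySpan<byte> s = src.AsSpan(_byteIndex);
        ulong word;
        if (s.Length >= 8)
        {
            word = _msbFirst ? BinaryPrimitives.ReadUInt64BigEndian(s)
                             : BinaryPrimitives.ReadUInt64LittleEndian(s);
        }
        else
        {
            // Near the end of the buffer: zero-pad to 8 bytes before loading.
            Span<byte> tmp = stackalloc byte[8];
            s.CopyTo(tmp);
            word = _msbFirst ? BinaryPrimitives.ReadUInt64BigEndian(tmp)
                             : BinaryPrimitives.ReadUInt64LittleEndian(tmp);
        }
        return (word >> _shift) & _mask;
    }
}
```

A descriptor would typically be created once, e.g. `static readonly BitField Field42 = new BitField(42, 8);` (hypothetical name), and read with `Field42.Read(src)`. The only endianness-dependent step left is the byte order of the load, and that branch always goes the same way for a given field.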
