0

As a (beginner) exercise, I am trying my hand at implementing something like a class in assembly. For this, I would like to create a BitArray of size N and have the operations of init(size), set(idx), unset(idx), get(idx), and print().

I am starting with the init(), and here is what I have thus far:

bit_array_init:
    # all this has to do is clear N bits (%rdi) starting at address (%rsi)
    mov $-1, %rax   # mask of all 1's
    mov %dil, %cl
    shl %cl, %rax       # shift left by the number of bits, filling right-side with 0's
    andq %rax, (%rsi)   # because the mask is all 1's above the bit count, it doesn't matter
    ret                 # if we 'extend too far' since &1 will have no affect

And it can be called like:

movl $0b110110110110110110, -4(%rbp) # add bogus values to stack to confirm function's ok
    
# 1. Init the bitArray, b = BitArray(BIT_SIZE, address); allocated locally on the stack
mov $12,      %edi # size of the bitarray
lea -4(%rbp), %rsi # where the bitarray starts at
call bit_array_init

It will intentionally only allocate 12-bits instead of something like "2 bytes" and leave everything else as-is (again it's for practice/learning and not actually for real-world stuff).

I have a few questions about this:

  • Does the above look like an acceptable way to init a bit array? Or are there better ways to do this?
  • I'm passing in the memory address of something on the stack. is this the way that most init functions would work? Or should I be passing in (1) something else; or (2) the init will pass back the memory address it allocated to?
David542
  • 104,438
  • 178
  • 489
  • 842
  • What if someone wants 100 bits? – Erik Eidt Oct 03 '20 at 23:23
  • @ErikEidt it won't work. I suppose for this it has a limit of 8 bytes (in an answer perhaps mention how this could be improved to support more than 8 bytes -- I wasn't sure how to do that). – David542 Oct 03 '20 at 23:24
  • 1
    Ok, in that case this will generally work, but it may do an unaligned access of 64 bits to initialize 12 bits. Also, there is some possibility that the memory 7 bytes past the given address doesn't exist, e.g. is on another page not yet allocated to the process. – Erik Eidt Oct 03 '20 at 23:27
  • 2
    If it's supposed to mimic a class, then the caller should not be aware of the internal structure or the size requirements. What if you optimized it in some way so it needs more storage than just the bits? Maybe you extend it to maintain a parity value for quick return. Those are internal private members of your class. The `init` should allocate the whole thing. Also, you are still writing outside of the specified size and that may crash under unlucky circumstances (if you cross a page boundary into inaccessible region). – Jester Oct 03 '20 at 23:27
  • @ErikEidt for `"there is some possibility..."` comment, do you mean with the instruction `andq %rax, %(rsi)` ? – David542 Oct 03 '20 at 23:29
  • @Jester I see, that's a good point. How could an init allocate the whole thing though, what do you mean by that ( would you be interested in elaborating in an answer?) – David542 Oct 03 '20 at 23:32
  • 1
    @David542, yes. How much memory should be allocated for a 12 bit vector? Who is allocating the memory: the class via constructor or the class-consuming client? – Erik Eidt Oct 04 '20 at 00:14
  • @ErikEidt hm, I'm not sure to be honest (I'm a beginner to asm), for memory allocation I would think either 2 bytes or 12 bits if we're going to allow 'contiguous' bitarrays in memory, but come to think of it I don't think that'd really be possible (unless everything was offset from the first memory address), so I suppose 'rounding up' is find. What would you suggest here? – David542 Oct 04 '20 at 00:55
  • @ErikEidt regarding class, again I'm not sure. It seems like from Jester's comment it should be done by the constructor/class itself. Is that the 'correct' way to do it? I did it the above way as that seemed to be the simplest approach (and it seems C often uses that approach in some of those old library functions it offers). – David542 Oct 04 '20 at 00:56
  • *it doesn't matter if we 'extend too far' since &1 will have no affect* - incorrect; non-atomic RMW of data outside your object has been a real source of bugs in multi-threaded programs. You must not invent writes to memory you don't "own". [Crash with icc: can the compiler invent writes where none existed in the abstract machine?](https://stackoverflow.com/q/54524947) covers one example, and the concept in general. – Peter Cordes Oct 04 '20 at 02:22
  • @PeterCordes I see. How would I do a mask on a non-full byte then? Or is it not really possible to do things on partial bytes in asm? – David542 Oct 04 '20 at 02:26
  • If you're writing any bits in a byte, it's your caller's fault if they call this function when another thread might be writing a different set of bits in the same byte. I was talking about whole bytes that didn't need to be touched. Like if you want to init a 12-bit bitfield, you should only touch the 2 bytes that it contains. (Or use `lock andq` because *atomic* RMW is invisible as far as correctness, but that's quite slow for performance; slower than branching). Bytes are the unit / granularity for atomicity in byte-addressable ISAs like x86. (That's why all C objects are a whole # of bytes) – Peter Cordes Oct 04 '20 at 02:30
  • A safe but slower way to init the array to zero would be to zero EAX, then `rep stosb` (i.e. memset) with RCX=ceil(bits/8) to zero every byte that contains any bits. (e.g. `(bits+7) >> 3` with lea / shr). IceLake apparently has a "fast short-rep" feature that might make this perform ok for small objects. Or if you choose to always make fixed-size bitsets like C++ `std::bitset<64>` then it's just `movq $0, (%rdi)`. No need to read the old value, just store zero. But a `rep stosb` based initialization of course makes it easy to support arbitrary-length bitsets. – Peter Cordes Oct 04 '20 at 06:00
  • get/set/flip can of course be memory-destination `bt`/`bts`/`btc`, or emulate the memory destination by doing an address calculation to find the right dword or byte to load into a register, then use `bt* reg,reg` and store. (microcoded memory-dest `bt*` instructions are so slow on existing CPUs that it's actually more efficient to manually shift the bit-offset to find the containing byte or dword). – Peter Cordes Oct 04 '20 at 06:03

1 Answers1

2

You have an initializer but not a constructor.

The difference is that a constructor would allocate the memory, then (perhaps inlined) use the initializer.

If you use a constructor, it would allocate the memory, but how much?  Always 64 bits would make it safe from the problems discussed above (writing to memory you don't necessarily own) — but would also take 64 bits to store any vector even if only 12 bits.

Further, if the constructor allocates the 64 bits, there'd be no need to and to clear those newly allocated bits, one would simple store a (quad word) zero into it.


Various more complicated techniques can be use to create variable length bit vectors of sufficient size as desired by the consuming client, but that would merit a new question and answers.

Erik Eidt
  • 23,049
  • 2
  • 29
  • 53