I have to detect a null byte in a 16-bit word, so I need to check whether either the high 8 bits or the low 8 bits are all zero. The problem is that I can't spend many cycles, so I need a bit mask that checks both bytes in a single operation. Checking bl first and then bh doesn't work because it takes too long. The other null-byte posts never really address a real 16-bit word. (assembly)

I already tried 0xff and 0x00ff, but I can't combine them into one bit mask.

Peter Cordes
User342
  • 2
    Why are you sure it is possible in x86 with a single instruction? – dimich Jun 03 '23 at 11:18
  • 1
    You can create 65536-byte lookup table and read value indexed by `bx`, but it is very inefficient. – dimich Jun 03 '23 at 11:21
  • I already did it for doubleword and quadword with bit masks, but just can't figure out a bit mask for word. – User342 Jun 03 '23 at 12:08
  • 1
    The method with bit masks can probably be "narrowed" to make it suitable for words. Which method did you use? – harold Jun 03 '23 at 13:29
  • 1
    @harold With a single instruction? Please show us how. – dimich Jun 03 '23 at 13:47
  • @dimich after OP shows me how he did it for a dword/qword – harold Jun 03 '23 at 13:58
  • 1
    @harold Sorry, i mentioned you mistakenly. The question is for OP. – dimich Jun 03 '23 at 14:34
  • 2
    Please specify the *instruction set architecture* explicitly. – greybeard Jun 03 '23 at 16:30
  • 1
    The 4-operation bithack shown in https://graphics.stanford.edu/~seander/bithacks.html#ZeroInWord (`#define haszero(v) (((v) - 0x01010101UL) & ~(v) & 0x80808080UL)`) works just fine for 16-bit or 64-bit inputs (with the constants resized to match); it's just usually not worth it vs. two separate macro-fused `test/jz` uops. Comments on [How to check if a register contains a zero byte without SIMD instructions](https://stackoverflow.com/posts/comments/134703423) discuss the 64-bit case. – Peter Cordes Jun 03 '23 at 18:14
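
As a rough illustration of the comment above, the `haszero` bithack can be narrowed to a 16-bit word by shrinking its constants to `0x0101` and `0x8080`. The sketch below is in C rather than assembly, and the function names (`has_zero_byte_ref`, `has_zero_byte16`, `count_mismatches`) are hypothetical, chosen only for this example. It compares the narrowed bithack against a plain two-byte check over all 65536 inputs.

```c
#include <stdint.h>

/* Reference check: test the low byte and the high byte separately. */
static int has_zero_byte_ref(uint16_t v) {
    return (v & 0x00FFu) == 0 || (v & 0xFF00u) == 0;
}

/* 16-bit narrowing of haszero(v): (v - 0x0101) & ~v & 0x8080.
   The arithmetic is done in 32 bits; the final mask with 0x8080
   discards everything above bit 15, so the wrap-around is harmless. */
static int has_zero_byte16(uint16_t v) {
    uint32_t x = (uint32_t)v;
    return ((x - 0x0101u) & ~x & 0x8080u) != 0;
}

/* Count the 16-bit inputs on which the two checks disagree. */
static unsigned count_mismatches(void) {
    unsigned mismatches = 0;
    for (uint32_t v = 0; v <= 0xFFFFu; ++v) {
        if (has_zero_byte16((uint16_t)v) != has_zero_byte_ref((uint16_t)v)) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

In assembly this would still be roughly four operations (subtract, invert, two ANDs), which is why the comment considers two separate `test`/`jz` pairs usually the better choice.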

1 Answer

A mask isn't needed, but it takes two instructions:

        mul     ah      ;multiply al by ah
        test    ax,ax   ;set zero flag if ax = 0
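
The reason this works is that `mul ah` leaves the 16-bit product of `al` and `ah` in `ax`, and that product (at most 255 * 255 = 65025, so it cannot overflow 16 bits) is zero exactly when at least one of the two bytes is zero. A minimal C model of the same idea, with the hypothetical function name `has_zero_byte16_mul`, might look like this:

```c
#include <stdint.h>

/* Models "mul ah / test ax,ax": multiply the low byte by the high byte
   and report whether the 16-bit product is zero. */
static int has_zero_byte16_mul(uint16_t v) {
    uint8_t lo = (uint8_t)(v & 0xFFu);   /* plays the role of al */
    uint8_t hi = (uint8_t)(v >> 8);      /* plays the role of ah */
    uint16_t product = (uint16_t)((unsigned)lo * (unsigned)hi);
    return product == 0;                 /* zero flag after test ax,ax */
}
```

Note that, like the `mul` instruction itself, this overwrites the original value with the product, so the word has to be copied first if it is needed afterwards.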
rcgldr
  • 1
    8-bit `mul` is fast on a modern x86 (https://uops.info/), but on older CPUs it would be faster to run more instructions (for example on P5 Pentium, `mul ah` takes 11 cycles: https://www2.math.uni-wuppertal.de/~fpf/Uebungen/GdR-SS02/opcode_i.html). (OP hasn't said what they're optimizing for). – Peter Cordes Jun 03 '23 at 18:09
  • Use `test ax,ax` instead of `or`, especially if the next instruction is a `jcc`: `test` can macro-fuse, `or` can't, on current CPUs. [Test whether a register is zero with CMP reg,0 vs OR reg,reg?](https://stackoverflow.com/q/33721204) – Peter Cordes Jun 03 '23 at 18:09
  • @PeterCordes - I updated my answer. As noted, OP doesn't specify what CPU. – rcgldr Jun 04 '23 at 04:21
  • Right, I forgot to add that `test ax,ax` isn't worse anywhere (except in odd cases on P6-family, where rewriting a register with the same value might reduce register-read stalls if the value is read again later). That's what makes it a bad idea to ever use `or` in cases like this, where we don't know which CPUs they care about. – Peter Cordes Jun 04 '23 at 05:08