
Do the ARMv7 and/or ARMv8 instruction sets provide an instruction similar to `PEXT` on x86?

If not, what is the most efficient combination of instructions to achieve the same behavior?

Peter Cordes
Raoul722
  • There is no such instruction and unfortunately `pext` is rather difficult to emulate in the general case. If you have a specific use case in mind, I might be able to suggest an efficient workaround. – fuz Nov 18 '21 at 17:09
  • Unfortunately [Anderson's "Bit Twiddling Hacks"](https://graphics.stanford.edu/~seander/bithacks.html) doesn't include this operation :-( – Nate Eldredge Nov 18 '21 at 17:52
  • I suppose one could look at how a translator like Rosetta does it. – Nate Eldredge Nov 18 '21 at 18:10
  • @NateEldredge Rosetta does not support this instruction as far as I know. – fuz Nov 18 '21 at 18:42
  • `BFC`/`BFI` and `UBFX` come to mind, however not nearly as powerful as `PEXT`, you would need multiple of those and possibly some and-masking to emulate a `PEXT` with a *given known mask*. – Marco Bonelli Nov 18 '21 at 18:43
  • Rosetta does indeed crash on `pext` instructions with SIGILL. But if you have any C compiler that emits the instruction for a particular input code (without explicitly using intrinsics), you could try and see what it generates for ARM? – Siguza Nov 18 '21 at 19:00
  • There's a possible algorithm at https://stackoverflow.com/questions/21144237/standard-c11-code-equivalent-to-the-pext-haswell-instruction-and-likely-to-be – Nate Eldredge Nov 19 '21 at 00:48
  • @Siguza: From a quick grep of gcc and clang source code, it's not clear to me that either compiler is capable of emitting `pext` under any circumstances, except from calling an intrinsic. – Nate Eldredge Nov 19 '21 at 00:55

1 Answer

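This is my AArch64 implementation. It runs one loop iteration per set bit in the mask, working from the highest mask bit down: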

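    // w14 = pext(val: w13, mask: w15)
    // Uses registers w12-w15.
    // w13 and w15 are clobbered by the loop.
    // (Might be sped up by first computing w13 & w15.)
        mov     w14, wzr              // result = 0
    .loop:
        cbz     w15, .pext_end        // done when no mask bits remain
        clz     w12, w15              // distance of highest mask bit from bit 31
        lsl     w13, w13, w12         // move the matching value bit to bit 31
        lsl     w15, w15, w12         // move the highest mask bit to bit 31
        extr    w14, w14, w13, #31    // w14 = (w14 << 1) | (w13 >> 31)
        bfc     w15, #31, #1          // clear the mask bit just consumed
        b       .loop
    .pext_end: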
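
For comparison, here is a rough C sketch of the same loop, next to a plain bit-by-bit reference to check it against. The function names `pext_ref` and `pext_clz` are made up for illustration, and `__builtin_clz` is a GCC/Clang builtin, so this assumes one of those compilers. It is only meant to show the logic, not to be a tuned implementation.

```c
#include <stdint.h>

/* Reference version: walk the mask from bit 0 upward and pack the
 * selected bits of val into the low end of the result. */
static uint32_t pext_ref(uint32_t val, uint32_t mask)
{
    uint32_t out = 0;
    unsigned k = 0;                      /* next free bit position in out */
    for (unsigned i = 0; i < 32; i++) {
        if (mask & (UINT32_C(1) << i)) {
            out |= ((val >> i) & 1u) << k;
            k++;
        }
    }
    return out;
}

/* Mirrors the assembly loop: one iteration per set mask bit,
 * starting from the highest one. */
static uint32_t pext_clz(uint32_t val, uint32_t mask)
{
    uint32_t out = 0;
    while (mask != 0) {                  /* same role as cbz */
        /* __builtin_clz is undefined for 0; the loop condition rules that out. */
        unsigned n = (unsigned)__builtin_clz(mask);
        val  <<= n;                      /* selected value bit is now bit 31 */
        mask <<= n;                      /* highest mask bit is now bit 31 */
        out = (out << 1) | (val >> 31);  /* same role as extr ..., #31 */
        mask &= UINT32_C(0x7FFFFFFF);    /* same role as bfc ..., #31, #1 */
    }
    return out;
}
```

Both functions should agree for any input. For example, with `val = 0x12345678` and `mask = 0x0000FF00` each returns `0x56`, and the loop in `pext_clz` runs eight times, once per set mask bit.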
Peter Cordes
Chen