3

Is there any integer 2-operand x86-64 instruction that uses its first operand only as a destination, not as a source + destination [1] or as a source only [2], and which runs on p0156 on Intel Haswell and/or later CPUs?

Not interested in mov instructions, i.e., anything with mov in the name.

For example, BMI1 blsi eax, edx is 2-operand with a write-only destination but can only execute on port 1 or port 5 on Skylake.


1 Most instructions fall into this category, e.g., add eax, ebx represents eax = eax + ebx.

2 A handful of 2-operand integer instructions use their first operand only as a source, e.g., cmp eax, ebx.

Peter Cordes
BeeOnRope
  • 1
    `lds` and friends? Is something like a mov but doesn't have it in the name :D Obviously `lea` too. No idea about `p0156` though. – Jester Aug 19 '19 at 18:41
  • @Jester - the problem is `lea` only runs on 2 ports. Hoping for something that runs on `p0156`. – BeeOnRope Aug 19 '19 at 18:44
  • Would be easier if the instruction set reference said that but it doesn't. :/ – Jester Aug 19 '19 at 18:46
  • 1
    Also, which cpu? For haswell, Agner Fog's table shows `lea r16, m` as `p1 p0156`, whatever that means. – Jester Aug 19 '19 at 18:48
  • `in`, `lar` and `lsl` also qualify I guess, still no idea about ports. – Jester Aug 19 '19 at 18:55
  • 1
    `blsi` (etc) are also p15.. is there some deep reason for this pattern? – harold Aug 19 '19 at 19:33
  • @Jester: That means the actual LEA runs on port 1 (complex LEA, apparently can't use the simple-LEA unit on port 5). And that the merging uop (into the low 16 bits of the destination reg) runs on any port. But that uop is a RMW into the 64-bit destination, not what @ Bee asked for. (Intel Haswell and later don't rename partial registers separately from the whole reg, except AH/BH/...) Some other 16-bit instructions with an architecturally-write-only destination also have an extra merging uop, e.g. `imul r16, r/m16, imm` – Peter Cordes Aug 20 '19 at 03:32
  • Are you looking for instructions for which all uops can run on p0156, or for which at least one uop can run on p0156? – Andreas Abel Aug 21 '19 at 13:42
  • @AndreasAbel - all, in this case. Don't spend time on it unless it is to satisfy your personal curiosity, it's not particularly important and Peter's answer is great. – BeeOnRope Aug 21 '19 at 15:38

2 Answers

6

The following Python script searches for such instructions in the uops.info XML file (https://uops.info/xml.html):

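#!/usr/bin/env python3
import re
import xml.etree.ElementTree as ET

# Matches port strings such as "1*p0156" or "2*p0156", i.e. every uop can use any of p0156.
P0156_ONLY = re.compile(r"\A\d*\*p0156\Z")

def main():
    for instr in ET.parse('instructions.xml').iter('instruction'):
        reg_operands = instr.findall("./operand[@type='reg']")
        # Keep only instructions with exactly two register operands.
        if len(reg_operands) != 2:
            continue
        # At least one register operand must be write-only (w=1, r=0).
        if not any(op.attrib.get('w', '0') == '1' and op.attrib.get('r', '0') == '0' for op in reg_operands):
            continue
        # Print the instruction if any measurement reports only p0156 uops.
        if any(P0156_ONLY.search(m.attrib.get('ports', '')) for m in instr.findall("./architecture/measurement")):
            print(instr.attrib['string'])

if __name__ == "__main__":
    main()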

If we exclude from the results all instructions with MOV in the name, the only instructions that remain are CBW, CWDE, and CDQE. However, these instructions have only implicit operands, which is probably not what you are looking for.
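As a rough illustration of what the port filter in the script accepts, here is a minimal sketch that applies the same regular expression to a few port strings. The sample strings are made up for this example and only assume the `N*pXYZ` notation that the regex itself implies; they are not taken from the XML file.

```python
import re

# Same pattern as in the script: an optional count, "*", then exactly "p0156".
P0156_ONLY = re.compile(r"\A\d*\*p0156\Z")

# Hypothetical port strings, for illustration only.
samples = ["1*p0156", "2*p0156", "1*p15", "1*p0156+1*p23", ""]

# Only strings consisting of a single p0156 term pass the filter.
matches = [s for s in samples if P0156_ONLY.search(s)]
print(matches)
```

Because the pattern is anchored at both ends, anything with an additional term (such as a load uop on p23) is rejected, which matches the "all uops can run on p0156" reading of the question.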

Andreas Abel
  • Moreover, those sign-extend-within-RAX instructions all have the *same* register as both source and destination, unlike what the question asked for. (Presumably `cwde` decodes to the same uop as `movsx eax, ax`.) No Intel CPUs with a port-6 ALU (HSW and later) do partial-register renaming of low-8 or low-16 registers (only [SnB or IvB and earlier](https://stackoverflow.com/questions/45660139/how-exactly-do-partial-registers) did; IvB testing needed), so you can't even argue that a separately-renamed AL or AX counts as a separate register. – Peter Cordes Aug 21 '19 at 18:13
  • @PeterCordes Even though they use the same register, https://github.com/intelxed/xed/blob/master/datafiles/xed-isa.txt considers the instruction to have two operands (a write-only destination and a read-only source); the uops.info file is based on this specification. – Andreas Abel Aug 21 '19 at 20:09
  • Sure, it's not surprising that AL and AX count as different registers, I'm just pointing out that we have an even better reason for ruling them out. If `cdq` (edx = signbit of eax) ran on any port, it would definitely be worth mentioning even though both operands are implicit. (But it runs on the scalar shift ports, p06) – Peter Cordes Aug 21 '19 at 20:28
4

I tried searching for 0156 in Agner Fog's table. Some instructions aren't exactly what you asked for, but seem worth mentioning.


I know you wanted to exclude mov type instructions, but movsx r32, r16/r8 is definitely not eliminated, and definitely runs on any of the p0156 integer ALU ports. Similarly movsxd r64, r32. Only mov r32,r32, mov r64, r64, and movzx r32, r8 can be eliminated (0 latency, no unfused-domain uop).

If you were ruling out movzx/sx because of possible mov-elimination, look again at movsx. It may be the only such instruction.


bextr r,r,r is 2p0156. But it's probably actually p06 + p15 or something, implementing it with something like shift (p06) + BZHI (p15) uops. That hypothesis can be tested by mixing it with some shifts or p15 instructions.
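To see why mixing in shifts would distinguish the two possibilities, here is a minimal sketch of the back-end port-pressure bound under each hypothesis. It assumes a simplified model: uops can be split freely across their allowed ports, only ports 0, 1, 5 and 6 matter, and front-end and latency limits are ignored. The function name `port_bound` and the one-bextr-plus-one-shift loop body are hypothetical choices for illustration.

```python
from fractions import Fraction
from itertools import combinations

def port_bound(uops, ports="0156"):
    """Lower bound on cycles per iteration from port pressure alone.

    uops is a list of (count, allowed_ports) pairs. For every subset of
    ports, the uops that can run only inside that subset must share it.
    """
    best = Fraction(0)
    for k in range(1, len(ports) + 1):
        for subset in combinations(ports, k):
            allowed_here = set(subset)
            load = sum(n for n, allowed in uops if set(allowed) <= allowed_here)
            best = max(best, Fraction(load, k))
    return best

shift = (1, "06")  # one scalar shift, which runs on p06

# Hypothesis A: bextr is two uops that can each use any of p0156.
hyp_any = [(2, "0156"), shift]
# Hypothesis B: bextr is one p06 uop plus one p15 uop.
hyp_split = [(1, "06"), (1, "15"), shift]

print(port_bound(hyp_any), port_bound(hyp_split))
```

Under this model, one bextr plus one shift per iteration is bounded at 3/4 cycle if both bextr uops can go anywhere, but at 1 cycle if one of them competes with the shift for p06, so a measured throughput near 1 cycle would support the p06 + p15 split.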

xchg r64, r64 is 3 uops for p0156. According to my reverse-engineering, I think each uop is a reg-reg mov that's not subject to mov-elimination, and actually needs an ALU port. One of the registers involved is an internal microcode-use-only register that's not architecturally visible, but does participate in register renaming. (I think we have other evidence that there are a few extra logical registers that don't have an x86 name, e.g. using up PRF entries). But of course neither destination of the whole x86 instruction is write-only. leave also has 2p0156 (possibly not using the stack engine).

salc is 3p0156 (set AL from carry: undocumented, not 64-bit mode) but that's probably sbb same,same and a merging uop into RAX. So it's probably like lea r16, [m] or imul r16, r/m16, imm or movsx r16, m8 that also have a merging uop into an architecturally write-only destination.

movbe r64, m64 runs on 2p0156 p23 on SKL. But movbe r32, m32 runs on p15 p23 so there's probably just one extra p0156 uop in there, or a p06 uop. bswap r64 is p15 p06 so we can be pretty sure that's what movbe uses. I assume movbe r64, m64 is really p15 p06 p23, i.e. load + bswap, but Agner didn't manage to pick that apart.

So other than movsx and movzx dst, r16, mostly this answer is debunking / ruling out possible p0156 instructions from Agner Fog's table.

Peter Cordes
  • 1
    > That hypothesis can be tested by mixing it with some shifts or p15 instructions. Indeed: https://uops.info/html-ports/SKL/BEXTR_R64_R64_R64-Measurements.html – Andreas Abel Aug 21 '19 at 14:01
  • @AndreasAbel: oh right, should have just looked at uops.info instead of guessing about missing details in Agner's results >.< – Peter Cordes Aug 21 '19 at 14:39