
I'm trying to learn x86-64's new AVX-512 instructions, but neither of my computers supports them. I tried various disassemblers (from Visual Studio to online ones: 1, 2) to see the instructions for specific opcode encodings, but I'm getting somewhat conflicting results. Plus, it would be nice to run some instructions and see their actual output.

So I'm wondering if there is an online service that lets me assemble a small piece of x86-64 assembly code and run it, or step through it, on a specific processor (say, Intel's Sandy Bridge, Cannon Lake, etc.)?

Mark Rotteveel
MikeF

2 Answers


Use the Intel® Software Development Emulator (SDE) to run an executable on an emulated CPU that supports future instruction sets. It's freeware (a free download, but not open source) and is available for Linux, Windows, and I think also OS X.

https://software.intel.com/en-us/articles/debugging-applications-with-intel-sde has step-by-step instructions for debugging with it on Windows or Linux. SDE can work as a GDB remote, so you can run `sde -debug -- ./your-program`, then in another terminal run `gdb ./your-program` and use `target remote :portnumber` to connect to the SDE process so you can set breakpoints and single-step.


You might be able to do the same thing with QEMU, if they've added support for emulating AVX512. QEMU can also act as a GDB remote.

QEMU definitely has configurable instruction-set support, e.g. you could tell it to emulate an x86 with AVX but not AVX2 (like Sandy Bridge). SDE can probably do the same thing.

You could even tell it to emulate something you won't find on real hardware, like AVX2 but not BMI1/2, if you want to verify that your CPUID checks don't assume anything implies anything else that isn't guaranteed.
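As a rough illustration of CPUID checks that don't let one feature imply another, here is a minimal C sketch. It decodes an EBX value from CPUID leaf 7 (subleaf 0) plus an XCR0 value and tests every flag on its own. The names `cpu_features` and `decode_features` are hypothetical and made up for this example. The bit positions are the ones Intel documents for that leaf, and the inputs are passed in as plain integers so the logic can be tried on any machine.

```c
#include <stdbool.h>
#include <stdint.h>

/* Feature bits in CPUID.(EAX=7,ECX=0):EBX. */
#define EBX_BMI1        (UINT32_C(1) << 3)
#define EBX_AVX2        (UINT32_C(1) << 5)
#define EBX_BMI2        (UINT32_C(1) << 8)
#define EBX_AVX512F     (UINT32_C(1) << 16)
#define EBX_AVX512IFMA  (UINT32_C(1) << 21)
#define EBX_AVX512VL    (UINT32_C(1) << 31)

/* XCR0 bits the OS sets when it saves/restores the wider register state:
   SSE (bit 1) and AVX (bit 2) for YMM; additionally opmask (bit 5),
   ZMM_Hi256 (bit 6) and Hi16_ZMM (bit 7) for AVX-512. */
#define XCR0_YMM_STATE  UINT64_C(0x06)
#define XCR0_ZMM_STATE  UINT64_C(0xE6)

/* Hypothetical result type for this sketch. */
struct cpu_features {
    bool bmi1, bmi2, avx2, avx512f, avx512vl, avx512ifma;
};

/* Hypothetical helper: each flag is read from its own bit, so
   "AVX2 but no BMI1/2" or "AVX512F but no IFMA" decode correctly. */
static struct cpu_features decode_features(uint32_t leaf7_ebx, uint64_t xcr0)
{
    bool os_ymm = (xcr0 & XCR0_YMM_STATE) == XCR0_YMM_STATE;
    bool os_zmm = (xcr0 & XCR0_ZMM_STATE) == XCR0_ZMM_STATE;
    struct cpu_features f;

    /* BMI1/BMI2 work on general-purpose registers; no XCR0 check here. */
    f.bmi1 = (leaf7_ebx & EBX_BMI1) != 0;
    f.bmi2 = (leaf7_ebx & EBX_BMI2) != 0;

    /* Vector extensions also need the OS to enable the register state. */
    f.avx2       = os_ymm && (leaf7_ebx & EBX_AVX2) != 0;
    f.avx512f    = os_zmm && (leaf7_ebx & EBX_AVX512F) != 0;
    f.avx512vl   = os_zmm && (leaf7_ebx & EBX_AVX512VL) != 0;
    f.avx512ifma = os_zmm && (leaf7_ebx & EBX_AVX512IFMA) != 0;
    return f;
}
```

In a real program the inputs would come from the CPU, for example via `__get_cpuid_count` in GCC/Clang's `<cpuid.h>` for the EBX value, and XCR0 should only be read after confirming OSXSAVE. Running such a check under an emulator configured with an unusual feature mix is one way to see whether the dispatch code picks the right path.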


Remember that both of these are essentially useless for performance testing; they're only good for checking the correctness of your vectorization. IACA could be useful to get an idea of performance on SKX, but it's far from perfect and doesn't model memory bottlenecks at all. (It only models the actual pipeline, in some level of detail.)

Peter Cordes
  • Yeah, I thought about an emulator too. I may try it. Although it's quite limiting. Stepping through code with a debugger would be my optimal solution. As for other online disassemblers, as my experience shows, most run on processors that don't support AVX512. I need to see if Amazon or Microsoft's Azure has a plan that supports low cost CPU rental. (like Hans Musgrave suggested.) – MikeF Aug 12 '18 at 04:17
  • @MikeF: My answer shows how you can single-step through the emulated code with a debugger. (Or at least links to an Intel article about how to do that on Windows. I only quoted the Linux part, because it's a couple simple commands.) – Peter Cordes Aug 12 '18 at 04:56
  • @MikeF: If you literally just want a *disassembler*, use `objdump -drwC -Mintel` or [Agner Fog's `objconv`](http://www.agner.org/optimize/#objconv) to convert machine code into asm text. **Your CPU doesn't have to support AVX512 for a disassembler to work**, no emulation or anything needed. Or if you're compiling C or C++, use https://godbolt.org/ to get asm output from the compiler directly, without creating an executable and then disassembling it. e.g. https://godbolt.org/g/YsVuAX has some example functions with compiler output from gcc, clang, and MSVC. – Peter Cordes Aug 12 '18 at 05:09
  • Thanks, Peter. And no, I don't need just a disassembler. (I can get them from many sources.) What I wanted is to test run those AVX512 instructions on the actual hardware. I'm currently trying to install a Windows 10 VM in a 30-day free trial Azure account. If that doesn't have a CPU that supports AVX-512, I'll look more closely into your suggested emulator. I appreciate all your suggestions though! – MikeF Aug 12 '18 at 05:14
  • @MikeF: Are you doing that for performance testing? Your question doesn't say that, so a free emulator you can run on your desktop to single-step AVX512 code seems a lot better to me. – Peter Cordes Aug 12 '18 at 05:22
  • I just want to learn about those new AVX-512 instructions. They added a bunch of new encodings (with the EVEX prefix) that are hard to understand just by reading the Intel documentation. So idk, it's always been easier for me to first read the docs and then run some tests. So that's my main goal so far. – MikeF Aug 12 '18 at 05:24
  • @MikeF: That's exactly what you can do with an emulator, like my answer explains, without having to remote-desktop to a cloud VM to run a debugger there. That's how I learned AVX512. (Actually I spent more time just looking at compiler-generated asm for stuff I tried with intrinsics; I think I only actually ran things in SDE once or twice. Seeing what syntax was accepted by NASM was another way I learned how/when you could use masking and broadcast loads, and rounding-mode overrides.) – Peter Cordes Aug 12 '18 at 05:28
  • Yep, that's exactly what I'm trying to learn. Thanks. Although I'm on Windows. Can I use it with Visual Studio, do you know? – MikeF Aug 12 '18 at 05:31
  • @MikeF: IDK, read the Intel white papers I linked. They have a Windows section. I assume so, Intel typically cares about Windows at least as much as Linux. But I don't use Windows so I didn't read that part. – Peter Cordes Aug 12 '18 at 05:32

There are online tools which allow you to at least select different assembly dialects, but I'm not seeing anything that supports Xeon Phi or Skylake. However, the Intel C++ and Fortran compilers support cross-compiling for those additional architectures. It seems you're using Windows, and that is directly supported.

Another route is to rent an AWS EC2 C5 instance, which natively supports AVX-512. For learning purposes, this can be done for as little as $0.085/hr for a reserved instance or $0.0185/hr if you're fine with Spot pricing.

Hans Musgrave
  • Hey, thanks. Your AWS idea sounds very interesting, although I've never dealt with them before. Where do you take all these prices from? And also what is "spot pricing"? – MikeF Aug 12 '18 at 04:14
  • Pricing varies over time, but [this link](https://aws.amazon.com/ec2/pricing/) should stay up to date. The "spot" instances differ from the "on-demand" instances in that you don't get a machine instantly allocated necessarily. Amazon uses them to fill the gaps in the normal usage and is willing to offer a discount since something is better than nothing (as long as that something exceeds their operating overhead). Your testing likely doesn't require lots of resources or persistent storage between instances on their machines, so the cheapest option should work fine. – Hans Musgrave Aug 12 '18 at 04:18
  • Examining your comment on the other answer, AWS **is** Amazon, and Azure has a comparable product with AVX-512. Their [pricing](https://azure.microsoft.com/en-us/pricing/details/virtual-machines/windows/) is competitive -- not outdoing the spot instances but handily beating AWS on-demand products. – Hans Musgrave Aug 12 '18 at 04:19
  • Yep, thanks. I'll try to dig through it. So far it's all very confusing. Let me try to get it straight. I'd rent a VM that I can install, say, Windows on and then remote into it, right? If so, it would be a good idea, as I can run a remote debugger on it with Visual Studio. What confuses me is their naming in that list you linked. Say `t1.micro`, `t2.small`, and so on -- a million things on that list. Also how do I select which CPU it will run on? – MikeF Aug 12 '18 at 04:23
  • Those cloud services are IMO needlessly complex. You'd rent a VM and be able to choose what kind of VM it is (e.g. Windows). You don't have to install the OS. You'd need to dig into the docs to verify the CPU type, or you can take my word for it that Amazon is bragging about AVX512 in the C5 instances and that Microsoft is bragging about it in their Fv2 instances. Both providers use Skylake processors which have the newer version of the AVX512 instruction set. To select which kind of, for example, C5 instance you want, you'd need to compare their other properties like RAM. Cheapest should work – Hans Musgrave Aug 12 '18 at 04:31
  • They support so many services that documentation takes a little while to wade through till you get used to it. It's to the point where being knowledgeable in AWS is an actual employable skill. – Hans Musgrave Aug 12 '18 at 04:33
  • You bet! Hey, I just noticed that Azure offers a free account for 30 days. That may be all I need. Do you think it's worth trying to sign up for that? Or do they run those free accounts on some under-powered CPUs? – MikeF Aug 12 '18 at 04:35
  • Usually free accounts are limited to a form of "micro" instance, with the exact terminology varying between cloud providers. Those will typically have enough hours you could run it constantly all month for free. You only pay for your usage anyway (no monthly fees) though, so for debugging and playing with the AVX-512 instruction set you'll probably come in at under a dollar, especially if you're familiar with other SIMD instruction sets. – Hans Musgrave Aug 12 '18 at 04:38
  • Hans, I just finished setting up Azure 30-day free account. Here's what I found. Their standard run-of-the-mill VM with a client version of Win10 was installed on the `Intel(R) Xeon(R) CPU E5-2673 v3` which is a Haswell CPU that doesn't support AVX512. So I had to go with their `F2s_v2` Standard `Compute optimized` plan and Win10 Datacenter Server OS that was installed on `Intel(R) Xeon(R) Platinum 8168 CPU @ 2.70GHz` that [supported AVX512F (bit 16) and AVX512VL (bit 31) but did not support AVX512_IFMA (bit 21)](https://i.imgur.com/uq2tSdu.png). I could then remote into it with VS debugger. – MikeF Aug 12 '18 at 06:57
  • I'm not sure whether or not they'll let me use it for free for the next 30 days, but aside from having taken several hours to set up, this is a way to run my tests on actual (albeit VM'ed) hardware. Now I'll try reading the Intel emulator's white paper that Peter Cordes suggested in another post. Maybe it's an easier solution. – MikeF Aug 12 '18 at 06:59