5

The peak GFLOPS of the cores for the desktop i7-4770K @ 4 GHz is 4 GHz * 8 (AVX) * (4 FMA) * 4 cores = 512 GFLOPS. But the latest Intel IGP (Iris Pro 5100/5200) has a peak of over 800 GFLOPS, so some algorithms will run even faster on the IGP, and combining the cores with the IGP would be better still. Additionally, the IGP keeps eating up more silicon; the Iris Pro 5100 now takes up over 30% of the die. It seems clear which direction Intel desktop processors are headed.
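
For concreteness, here is that arithmetic spelled out in a minimal Python sketch (the numbers are just the ones from the question; the `4 FMA` factor is unpacked as 2 FLOPs per FMA times 2 FMA units per Haswell core, as explained in the comments below):

```python
# Re-stating the peak-GFLOPS arithmetic from the question
# (values for an i7-4770K @ 4 GHz).
clock_ghz     = 4.0   # core clock in GHz
avx_sp_lanes  = 8     # 256-bit AVX / 32-bit single-precision floats
flops_per_fma = 2     # one multiply + one add per FMA
fma_units     = 2     # Haswell has two FMA ports per core
cores         = 4

peak_gflops = clock_ghz * avx_sp_lanes * flops_per_fma * fma_units * cores
print(peak_gflops)    # 512.0
```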

As far as I have seen, however, the Intel IGP is mostly ignored by programmers, with the exception of OpenCL/OpenGL. I'm curious how one can program the Intel HD Graphics hardware for compute (e.g. SGEMM) without OpenCL.

Added comment: There is no Intel support for OpenCL on HD Graphics under Linux. I found Beignet, an open-source attempt to add support on Linux, at least for Ivy Bridge HD Graphics. I have not tried it. Presumably the people developing Beignet know how to program the HD Graphics hardware without OpenCL, then.

Z boson
  • Note: it's [GFLOPS](https://en.wikipedia.org/wiki/FLOPS), not [GFLOPs/s](https://en.wikipedia.org/wiki/FLOPS). Also why are you multiplying `8 (AVX) * (4 FMA)` ? – Paul R Aug 20 '13 at 13:01
  • I changed it to GFLOPS. FMA does a multiplication and an addition simultaneously, which gives one factor of 2; Haswell can do two FMA instructions simultaneously, which gives another factor of two. Each FMA can operate on one AVX register, which gives another factor of 8 (single-precision floating point). – Z boson Aug 20 '13 at 13:37
  • GLSL programming? DirectCompute? PTX? – huseyin tugrul buyukisik Aug 20 '13 at 15:14

3 Answers

4

Keep in mind that there is a performance hit to copy the data to the video card and back, so this must be taken into account. AMD is close to releasing APU chips that have unified memory for the CPU and GPU on the same die, which will go a long way towards alleviating this problem.
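
To make that copy cost concrete, here is a minimal pyopencl sketch (assuming pyopencl and numpy are installed and an OpenCL device is present); the two explicit copies are exactly the transfers referred to above, and on a discrete card they cross the PCIe bus:

```python
# Minimal sketch of the host <-> device transfers (assumes pyopencl and numpy).
import numpy as np
import pyopencl as cl

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)

host_data = np.random.rand(1000000).astype(np.float32)

# Host -> device copy (the "to the video card" cost)
mf = cl.mem_flags
dev_buf = cl.Buffer(ctx, mf.READ_WRITE | mf.COPY_HOST_PTR, hostbuf=host_data)

# ... kernels would operate on dev_buf here ...

# Device -> host copy (the "and back" cost)
result = np.empty_like(host_data)
cl.enqueue_copy(queue, result, dev_buf)
queue.finish()
```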

The way the GPU used to be utilized before CUDA and OpenCL was to represent the data to be operated on as a texture via DirectX or OpenGL. Thank goodness we don't have to do that anymore!

AMD is really pushing the APU / OpenCL model, so more programs should take advantage of the GPU via OpenCL - if the performance trade-off is there. Currently, GPU computing is a bit of a niche market relegated to high-performance computing or number crunching that just isn't needed for web browsing and word processing.

Austin
  • Ages ago I programmed the Amiga hardware, blitter and so forth, with 680x0 assembly (before I switched from CS to Physics). The hardware beyond the CPU was programmed through memory mapped registers. Shouldn't the Intel IGP have something similar? OpenCL goes through the video driver. I feel like I should be able to write directly to the hardware and skip the middleman. – Z boson Aug 20 '13 at 17:21
  • 2
    Why re-invent the wheel? How is OpenCL not meeting your needs? If you are looking for an embedded solution with high performance, perhaps go with AMD mobos with Radeon GPUs built into the board? – Austin Aug 20 '13 at 18:45
  • OpenCL relies on a vendor's device driver. Imagine if you had to wait for Intel to put out a device driver (for each OS) for you to program the x86. Nobody would put up with that! The OpenCL driver for HD 4000 did not have support for Linux last time I checked. Maybe it does now. The vendor could stop supporting a device driver as well (Nvidia has put a minimal amount of effort into their OpenCL support for a long time now). Since the IGP seems to be the future for desktop processors, it should be possible to program it in C++, just as one uses something like intrinsics to do SIMD on the x86. – Z boson Aug 21 '13 at 09:43
  • I guess OpenCL is the only realistic option for now. – Z boson Aug 22 '13 at 07:59
  • Yeah, by the time you figure out all the registers and optimize it, Intel will either release OpenCL support or new hardware. If you really want to optimize your OpenCL code, look into using vectors like float2, float3, float4, etc. These are supposed to use SIMD. HOWEVER, vector support differs from device to device (NVIDIA stinks right now) and you can over-optimize your kernel for a particular device if not careful. (A small float4 sketch follows these comments.) – Austin Aug 23 '13 at 15:46
  • I already use OpenMP and intrinsics in my x86 code so I get everything OpenCL has (almost) without needing OpenCL. Intel's OpenCL drivers give you SVML (SIMD math library) for free so that's the only advantage OpenCL offers for x86 (you have to buy MKL or the ICC compiler otherwise). – Z boson Aug 23 '13 at 17:53
  • If you are sticking with x86 Intel, I would highly recommend testing out their Integrated Performance Primitives (IPP). The license for a single seat is cheap and the libraries are easily redistributed. The performance is top tier and it works well with OpenMP. I think it was free for academic use, but that was a few years ago. – Austin Aug 23 '13 at 18:11
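
For illustration, here is a sketch of the float4 idea from that comment (assuming pyopencl and numpy; the kernel name `scale4` and the array size are made up for this example):

```python
# Sketch of using float4 vectors in an OpenCL kernel via pyopencl.
import numpy as np
import pyopencl as cl

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)

kernel_src = """
// Each work-item handles one float4, i.e. four floats at a time.
__kernel void scale4(__global float4 *data, const float factor) {
    int gid = get_global_id(0);
    data[gid] *= factor;   // component-wise multiply of the float4
}
"""
prg = cl.Program(ctx, kernel_src).build()

a = np.arange(16, dtype=np.float32)          # 16 floats = 4 float4 elements
mf = cl.mem_flags
buf = cl.Buffer(ctx, mf.READ_WRITE | mf.COPY_HOST_PTR, hostbuf=a)

prg.scale4(queue, (a.size // 4,), None, buf, np.float32(2.0))
cl.enqueue_copy(queue, a, buf)
print(a)                                     # every element doubled
```
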
4

It doesn't make sense anymore for vendors to let you program using a low-level ISA.

  1. It's very hard and most programmers won't use it.
  2. It keeps them from adjusting the ISA in future revisions.

So programmers use a language (like OpenCL's C99-based kernel language) and the runtime does ISA-specific optimizations right on the user's machine.
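
As a minimal sketch of that model (assuming pyopencl is installed), the same OpenCL C source string is handed to whatever driver is present and compiled for that device's ISA at run time:

```python
# The OpenCL C source below is JIT-compiled by the runtime on the user's
# machine for whatever device happens to be present.
import pyopencl as cl

src = """
__kernel void add_one(__global float *x) {
    x[get_global_id(0)] += 1.0f;
}
"""

ctx = cl.create_some_context()
prg = cl.Program(ctx, src).build()   # compiled here, per device ISA

for dev in ctx.devices:
    print(dev.name)                  # e.g. an Intel IGP, an AMD GPU, a CPU driver
```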

An example of what this enables: AMD switched from VLIW vector machines to scalar machines and existing kernels still ran (most ran faster). You couldn't do this if you wrote ISA directly.

Dithermaster
  • Yeah, I thought about that. But that's not the case with x86 cores. This means most people are wasting a large fraction of their silicon (and computation potential), especially if they are using a discrete GPU. But Intel has forced that on desktop users. I would rather have had more x86 cores. The only option is OpenCL and its drivers. I don't think Intel's OpenCL HD drivers work on Linux. I have not tried [beignet](http://cgit.freedesktop.org/beignet/) yet. Additionally, Intel's latest OpenCL SDK only runs on [Windows](http://software.intel.com/en-us/vcsource/tools/opencl-sdk). – Z boson Aug 23 '13 at 06:46
  • Again, I'd recommend some cheap AMD hardware with a decent integrated GPU if you are looking for a small, Linux based solution. AMD seems to be the best at supporting Linux at this point. – Austin Aug 23 '13 at 15:50
  • Is your statement that AMD switched to scalar machines correct? I think you mean AMD switched from VLIW to superscalar machines. – Z boson May 16 '16 at 08:08
  • News from the distant future: Intel's OpenCL SDK is available on Linux: https://software.intel.com/en-us/opencl-sdk/choose-download – Tomislav Nakic-Alfirevic Sep 15 '19 at 19:23
1

Programming a coprocessor like Iris without OpenCL is rather like driving a car without a steering wheel.

OpenCL is designed to expose the requisite parallelism that Iris needs to achieve its theoretical performance. You can't just spawn hundreds of threads or processes on it and expect performance. Having blocks of threads doing the same thing, at the same time, on similar memory addresses, is the whole crux of the matter.

Maybe you can think of a better paradigm than OpenCL for achieving that goal; but until you do, I suggest you try learning some OpenCL. If you are into Python, pyopencl is a great place to start.
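
As a small pyopencl starting-point sketch (assuming pyopencl and numpy; the `axpy` kernel is only an illustrative example), every work-item executes the same kernel body on a neighbouring element, which is the lock-step, data-parallel pattern described above:

```python
# Minimal pyopencl example: many work-items running the same kernel body
# on adjacent memory addresses.
import numpy as np
import pyopencl as cl

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)

src = """
__kernel void axpy(const float a,
                   __global const float *x,
                   __global float *y) {
    int i = get_global_id(0);
    y[i] = a * x[i] + y[i];
}
"""
prg = cl.Program(ctx, src).build()

n = 1 << 20
x = np.random.rand(n).astype(np.float32)
y = np.random.rand(n).astype(np.float32)
mf = cl.mem_flags
x_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=x)
y_buf = cl.Buffer(ctx, mf.READ_WRITE | mf.COPY_HOST_PTR, hostbuf=y)

# Global size n; the work-group (block) size is left to the runtime (None).
prg.axpy(queue, (n,), None, np.float32(2.0), x_buf, y_buf)

out = np.empty_like(y)
cl.enqueue_copy(queue, out, y_buf)
print(np.allclose(out, 2.0 * x + y))  # True
```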

Eelco Hoogendoorn