
As far as I can tell, the VEXTRACTF128 and VEXTRACTI128 instructions do the same thing: they have the same latency, the same throughput, and use the same ports. The only difference I can find between them is that VEXTRACTF128 requires only AVX, whereas VEXTRACTI128 requires AVX2. If that's the only effective difference, why use VEXTRACTI128?

I saw the following in Agner Fog's vectorclass, from which I infer that there is some important difference between the instructions. Maybe they belong to different domains (floating point or integer)?

```
#if defined (_MSC_VER) && _MSC_VER <= 1700 && ! defined(__INTEL_COMPILER)
    __m128i sum5  = _mm256_extractf128_si256(sum4,1);                // bug in MS VS 11
#else
    __m128i sum5  = _mm256_extracti128_si256(sum4,1);                // get high sum
#endif
```
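To make the "same result" part of the question concrete, here is a minimal sketch (not taken from vectorclass) that extracts the high 128-bit lane with both intrinsics and compares the bytes. It assumes GCC or Clang on x86, since it relies on `__attribute__((target("avx2")))` and `__builtin_cpu_supports`; the helper names `high_lanes_match_avx2` and `high_lanes_match` are made up for illustration. It only checks functional equivalence and says nothing about any latency difference between the two instructions.

```c
#include <immintrin.h>
#include <string.h>

/* Hypothetical helper: extracts the high 128-bit lane of 8 ints with both
   intrinsics and returns 1 if both results equal src[4..7], else 0.
   The target attribute lets this compile without -mavx2 (GCC/Clang only). */
__attribute__((target("avx2")))
static int high_lanes_match_avx2(const int *src)
{
    __m256i v    = _mm256_loadu_si256((const __m256i *)src);
    __m128i hi_f = _mm256_extractf128_si256(v, 1);   /* VEXTRACTF128, AVX  */
    __m128i hi_i = _mm256_extracti128_si256(v, 1);   /* VEXTRACTI128, AVX2 */

    int a[4], b[4];
    _mm_storeu_si128((__m128i *)a, hi_f);
    _mm_storeu_si128((__m128i *)b, hi_i);

    return memcmp(a, src + 4, sizeof a) == 0 &&
           memcmp(b, src + 4, sizeof b) == 0;
}

/* Hypothetical wrapper: returns -1 if the CPU lacks AVX2, so the AVX2 code
   path is never executed on hardware that cannot run it. */
int high_lanes_match(const int *src)
{
    if (!__builtin_cpu_supports("avx2"))
        return -1;
    return high_lanes_match_avx2(src);
}
```

Calling `high_lanes_match` on an array of eight ints should return 1 on an AVX2 machine, which confirms only that the two instructions move the same bits.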
Z boson
  • @PaulR, this is a duplicate. Maybe I should delete the question? But I'm not sure it was answered best. You say to use VEXTRACTI128 with integers but don't explain why it's any better. – Z boson Sep 05 '14 at 11:10
  • True - I don't have any more info on this - you could maybe make this question more specific, so that it's no longer a duplicate, or maybe delete this one and add something to the older question to make it active again. – Paul R Sep 05 '14 at 11:12
  • @PaulR, can you suggest something to add to my question? I tried to be more specific. All I got is that another developer chose to switch between the two which I infer means there is some important difference. In practice I would do as you say and use VEXTRACTI128 with integers but on paper I can't explain why it matters. – Z boson Sep 05 '14 at 11:55
  • All I can think of is to maybe implement two simple benchmark loops, one for each instruction, see if there is any performance difference, then update the question to include the benchmark code and results etc. – Paul R Sep 05 '14 at 14:35
  • @PaulR, that's a good suggestion. But it's not important enough for me to spend the time to do it. – Z boson Sep 05 '14 at 14:38
  • OK - no problem - I guess it will have to remain a mystery for a while longer then. – Paul R Sep 05 '14 at 14:44
  • I've added a bounty to the original question now, to see if we can stir up some activity and maybe get a definitive answer. – Paul R Sep 05 '14 at 14:50
  • @PaulR, cool! But I think I might know the answer now. It's related to my comment "Maybe they share different domains (floating point or integer)?" It's explained in the sections "Data bypass delays on Core2" and "Data bypass delays on Nehalem" in Agner Fog's microarchitecture manual. My question is analogous to asking "What's the difference between `MOVDQA` and `MOVAPS`?" I'll wait for others to answer. If I find time to write some code to measure the domain cost of using VEXTRACTF128 instead of VEXTRACTI128 for `__m256i`, I'll post an answer. – Z boson Sep 05 '14 at 19:14
  • Great - I thought it might be something like that but the Intel docs, as usual, are sadly lacking in these details. – Paul R Sep 05 '14 at 21:18
  • @PaulR, looks like you already have a good answer to your bounty. Should I delete my question or close it (can I vote to close my own question)? Actually, I just voted to close my own question. Cool. – Z boson Sep 06 '14 at 21:58
  • I think you can just vote to close as duplicate - there are already three such votes, so it will only need one more after yours. – Paul R Sep 06 '14 at 22:01
