9

I tried to run the following program on my computer (Fedora 17, 32-bit). How can I enable my system to support the popcnt instruction for fast population count?

#include <stdio.h>
#include <nmmintrin.h>

int main(void)
{
    int pop = _mm_popcnt_u32(0xf0f0f0f0ULL);
    printf("pop = %d\n", pop);
    return 0;
}

I compiled the program and ran it, but got the following error:

[xiliu@xiliu tmp]$ gcc -Wall -march=corei7 -m32 -msse4.2 popcnt.c -o popcnt
[xiliu@xiliu tmp]$ ./popcnt 
Illegal instruction (core dumped)

The following is the information of my processor:

[xiliu@xiliu tmp]$ cat /proc/cpuinfo 
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model       : 15
model name  : Intel(R) Pentium(R) Dual  CPU  T2370  @ 1.73GHz
stepping    : 13
microcode   : 0xa4
cpu MHz     : 800.000
cache size  : 1024 KB
physical id : 0
siblings    : 2
core id     : 0
cpu cores   : 2
apicid      : 0
initial apicid  : 0
fdiv_bug    : no
hlt_bug     : no
f00f_bug    : no
coma_bug    : no
fpu     : yes
fpu_exception   : yes
cpuid level : 10
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts aperfmperf pni dtes64 monitor ds_cpl est tm2 ssse3 cx16 xtpr pdcm lahf_lm dtherm
bogomips    : 3458.20
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

[... repeated for 2nd core ...]
Peter Cordes
afancy
  • This is a pretty terrible example; if you compile with gcc with optimization enabled there won't be a `popcnt` instruction in the binary, because constant-propagation will turn it into `mov esi, 16` (https://godbolt.org/z/h5ObTj). MSVC fails, though, and still emits a popcnt instruction. – Peter Cordes Oct 15 '18 at 03:08

3 Answers

15

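Use __builtin_popcount() instead. It isn't tied to a particular CPU: GCC emits a popcnt instruction only when the compilation target supports it, and otherwise falls back to a software implementation.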
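As a minimal sketch of that suggestion (the wrapper name count_bits is made up for illustration; only __builtin_popcount itself is the GCC/Clang builtin), the call from the question could be replaced like this:

```c
/* Hypothetical wrapper around the GCC/Clang builtin.
   When compiled without -mpopcnt (or a -march value that implies it),
   the compiler uses a software fallback, so the binary should also run
   on CPUs that lack the POPCNT instruction. */
int count_bits(unsigned int x)
{
    return __builtin_popcount(x);
}
```

Calling count_bits(0xf0f0f0f0U) gives 16. Building with plain `gcc -Wall -m32 popcnt.c -o popcnt`, without -march=corei7 or -msse4.2, should avoid the illegal instruction on the T2370.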

Hasturkun
12

The first Intel CPU to support the POPCNT instruction was Nehalem (AMD added it slightly earlier, with Barcelona). Yours is from the Core line, which is older and lacks the instruction. Hasturkun's suggestion will work on your system, but it will be implemented with multiple instructions instead of a single one.

If you want a portable solution rather than a GCC-specific one, check out Sean Eron Anderson's excellent Bit Twiddling Hacks page, which has highly optimized code for this.
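For reference, here is a sketch of the parallel bit-count technique described on that page, written for 32-bit values. The function name popcount32_portable is chosen here for illustration and does not come from the page.

```c
#include <stdint.h>

/* Portable 32-bit population count (parallel / SWAR method).
   Uses only shifts, masks, adds and one multiply, so it needs no
   special CPU support. */
uint32_t popcount32_portable(uint32_t v)
{
    v = v - ((v >> 1) & 0x55555555u);                 /* 2-bit partial sums */
    v = (v & 0x33333333u) + ((v >> 2) & 0x33333333u); /* 4-bit partial sums */
    v = (v + (v >> 4)) & 0x0F0F0F0Fu;                 /* 8-bit partial sums */
    return (v * 0x01010101u) >> 24;                   /* add the four bytes */
}
```

For the value in the question, popcount32_portable(0xf0f0f0f0u) returns 16.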

Cory Nelson
  • 3
    Indeed. Phrased differently, `popcnt` was added as part of SSE4 (and the OP's T2370 only supports supplemental SSE3). – Pascal Cuoq Nov 11 '12 at 15:34
  • 3
    `POPCNT` was introduced at the same time as SSE4.2, but is not a part of it. It has its own `CPUID` bit. – Cory Nelson Nov 13 '12 at 13:47
  • 2
    In spite of the danger of appearing old school, IBMs POWER5 already featured `POPCNT`. http://www-01.ibm.com/support/knowledgecenter/ssw_aix_71/com.ibm.aix.alangref/idalangref_popcntbd.htm?lang=it – jupp0r Oct 13 '14 at 14:45
  • 1
    @jupp0r : In spite of the danger of appearing ANCIENT school, the Control Data mainframes like the CDC 7300 that I used to program in the 1970s had a popcount instruction. It worked on 60-bit words and took several times as long as simple instructions like ADD. – Brendan McKay Jan 12 '16 at 05:11
  • 1
    @BrendanMcKay you got me there :) The Cray-1 could also do popcnt (1975) – jupp0r Jan 12 '16 at 15:03
2

Your CPU does not support POPCNT (see https://en.wikipedia.org/wiki/SSE4). You can use this free and open-source tool to detect whether it is supported: https://github.com/mgorny/cpuid2cpuflags

For example, on an Intel Core i7-3770 it returns:

CPU_FLAGS_X86: aes avx f16c mmx mmxext pclmul popcnt sse sse2 sse3 sse4_1 sse4_2 ssse3
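The same check can also be done from inside a program. The sketch below is not how cpuid2cpuflags works internally; it assumes GCC or Clang, whose <cpuid.h> header provides __get_cpuid, and reads the dedicated POPCNT bit (CPUID leaf 1, ECX bit 23). The function name cpu_has_popcnt is made up for illustration.

```c
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>   /* GCC/Clang helper header for the CPUID instruction */
#endif

/* Returns 1 if the CPU reports POPCNT support, 0 otherwise.
   On non-x86 targets this sketch simply reports 0. */
int cpu_has_popcnt(void)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid returns 0 if the requested leaf is not available. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;

    return (int)((ecx >> 23) & 1u);   /* CPUID.01H:ECX bit 23 = POPCNT */
#else
    return 0;
#endif
}
```

A program could call cpu_has_popcnt() once at startup and choose between a POPCNT-based routine and a software fallback. On the T2370 from the question it would be expected to return 0, since popcnt is absent from the flags line in /proc/cpuinfo.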
Jonas Stein