Highest Voted 'swar' Questions

80

votes

8 answers

Subtracting packed 8-bit integers in an 64-bit integer by 1 in parallel, SWAR without hardware SIMD

If I have a 64-bit integer that I'm interpreting as an array of packed 8-bit integers with 8 elements. I need to subtract the constant 1 from each packed integer while handling overflow without the result of one element affecting the result of…

asked Jan 07 '20 at 23:56

cam-white

710
5
9

23

votes

4 answers

How does this algorithm to count the number of set bits in a 32-bit integer work?

int SWAR(unsigned int i) { i = i - ((i >> 1) & 0x55555555); i = (i & 0x33333333) + ((i >> 2) & 0x33333333); return (((i + (i >> 4)) & 0x0F0F0F0F) * 0x01010101) >> 24; } I have seen this code that counts the number of bits equals to 1 in…

c++ c algorithm hammingweight swar

asked Feb 27 '14 at 22:16

Gaith

798
8
19

10

votes

1 answer

SIMD-within-a-register version of min/max

Suppose I have two uint16_t[4] arrays, a and b. Each integer in these arrays is in the range [0, 16383], so bits 14 and 15 aren't set. Then I have some code to find the minimum and maximum among a[i] and b[i] for each i: uint16_t min[4], max[4]; for…

c assembly bit-manipulation arm64 swar

asked Jan 18 '23 at 00:42

swineone

2,296
1
18
32

8

votes

1 answer

How does this color blending trick that works on color components in parallel work?

I saw this Java code that does a perfect 50% blend between two RGB888 colors extremely efficiently: public static int blendRGB(int a, int b) { return (a + b - ((a ^ b) & 0x00010101)) >> 1; } That's apparently equivalent to extracting and…

optimization language-agnostic bit-manipulation color-blending swar

asked Dec 19 '13 at 12:17

Boann

48,794
16
117
146

6

votes

4 answers

How to implement SWAR unsigned less-than?

I'm trying to use uint64_t as if it was 8 lanes of uint8_ts; my goal is to implement a lane-by-lane less-than. This operation, given x and y, should produce a result with 0xFF in a lane if the value for the corresponding lane in x is less than the…

c bit-manipulation swar

asked Aug 08 '21 at 02:55

Koz Ross

3,040
2
24
44

5

votes

2 answers

Compare 64-bit integers by segments

I have two 64-bit integers x and y. Each of them represents 5 short unsigned integers: the first 10 bits represent the first integer, the next 13 bits represent the 2nd integer, the next 16 bits represent the 3rd integer, the next 14 bits represent…

c++ bit-manipulation swar

asked Jun 03 '18 at 04:29

user2961927

1,290
1
14
22

4

votes

4 answers

Multiplication of two packed signed integers in one

The Stockfish chess engine needs to store, for its evaluation, both an endgame score and a middlegame score. Instead of storing them separately, it packs them into one int. The middlegame score is stored in the lower 16 bits. The endgame score is…

c++ bit-manipulation stockfish swar

asked Nov 22 '22 at 02:27

Chayim Friedman

47,971
5
48
77

3

votes

1 answer

Performantly reverse the order of 16-bit quantities within a 64-bit word

I need to do a lexicographic comparison of a small number of small unsigned integers. If there are (for example) 8 8-bit integers, the obvious approach is to byteswap them and do an ordinary integer compare in a GPR. If there are 2 32-bit…

performance assembly optimization x86-64 swar

asked May 15 '22 at 06:48

Moonchild

483
1
4
15

3

votes

2 answers

Can a register hold multiple values at a time?

In the case of a 64-bit x86 register, is it possible to hold more than one value at a time in the same register, if the size of an value is small enough such that multiple instructions could fit into a register? For example fitting two 32 bit ints…

assembly x86-64 simd cpu-registers swar

asked Apr 11 '20 at 23:46

Luke Davis

143
7

1

vote

2 answers

SWAR byte counting methods from 'Bit Twiddling Hacks' - why do they work?

Bit Twiddling Hacks contains the following macros, which count the number of bytes in a word x that are less than, or greater than, n: #define countless(x,n) \ (((~0UL/255*(127+(n))-((x)&~0UL/255*127))&~(x)&~0UL/255*128)/128%255) #define…

c bit-manipulation swar

asked Jul 08 '21 at 03:37

Koz Ross

3,040
2
24
44

1

vote

2 answers

Is there an efficient way of doing 32-bit bitwise rotation separately on the high and low 32-bit parts of a 64-bit number?

I'm currently working in C/C++, and I have a uint64_t. I need to do a bitwise rotation on the top 32 bits and the bottom 32 bits separately. So for example, if my input is | | | …

c bit-manipulation bit-shift swar

asked Feb 17 '15 at 03:53

MNagy

423
7
20

0

votes

1 answer

Speed up strlen using SWAR in x86-64 assembly

The asm function strlen receives the link to a string as a char - Array. To to so, the function may use SWAR on general purpose register, but without using xmm register or SSE instructions. The function checks with the bit manipulation: (v -…

assembly x86-64 micro-optimization strlen swar

asked Jun 04 '23 at 16:53

HeapUnderStop

378
1
9

0

votes

1 answer

How to check if a register contains a zero byte without SIMD instructions

Given a 64 Bit general purpose register (Not a xmm register) in x64 architecture, filled with one byte unsigned values. How can I check it for a zero value simultaneously without using SSE instructions? Is there a way to do so in a parallel way,…

assembly bit-manipulation x86-64 simd swar

asked Jun 01 '23 at 12:56

HeapUnderStop

378
1
9

0

votes

4 answers

Add two vectors (uint64_t type) with saturation for each int8_t element

I was recently faced with a given problem: There are 8 elements in the vector, each is represented by int8_t. Implement an algorithm in x86_64 that will add two vectors (uint64_t type). Adding elements should be done with saturation arithmetic…

assembly bit-manipulation x86-64 saturation-arithmetic swar

asked Apr 21 '23 at 22:14

Umbra

123
1
5

0

votes

2 answers

Fastest way to find 16bit match in a 4 element short array?

I may confirm by using nanobench. Today I don't feel clever and can't think of an easy way I have a array, short arr[]={0x1234, 0x5432, 0x9090, 0xFEED};. I know I can use SIMD to compare all elements at once, using movemask+tzcnt to find the index…

c x86-64 micro-optimization swar

asked Dec 14 '22 at 19:11

user20746246

Questions tagged [swar]