7

Can't seem to find the answer to this anywhere: how do I memset an array to the maximum value of the array's type? I would have thought memset(ZBUFFER, 0xFFFF, size) would work, where ZBUFFER is a 16-bit integer array. Instead I get -1s throughout.

Also, the idea is to have this work as fast as possible (it's a z-buffer that needs to be initialized every frame), so if there is a better way (and still as fast or faster), let me know.

edit: as clarification, I do need a signed int array.

DanielST
  • 13,783
  • 7
  • 42
  • 65
  • 10
    Do you know that `0xFFFF` is -1, if you interpret it as a 16-bit signed integer? – Jesper Apr 11 '13 at 11:48
  • 1
    Do you need `short` or `unsigned short` array? The answer will differ then. If you can switch to `unsigned short`, then your solution will work. However, `memset` is bytewise and `0xFFFF` is redundant, `0xFF` is enough. – unkulunkulu Apr 11 '13 at 11:50
  • 4
    @Jesper The nitpicker in me can't resist pointing out that it would be -0 (or a trap representation) on ones' complement machines and -32767 on a sign-and-magnitude machine ;) – Daniel Fischer Apr 11 '13 at 11:52
  • 3
    Wow, initializing a zbuffer is a bottleneck when rendering your frames! I am impressed. – R. Martinho Fernandes Apr 11 '13 at 11:53
  • Also see [memset not filling array](http://stackoverflow.com/questions/1418857/memset-not-filling-array?rq=1) – Bo Persson Apr 11 '13 at 11:54
  • If you want to use 16 bit integers, use `uint16_t`/`int16_t` not `short`. C does not specify the size of `short`, just the minimum size and does not guarantee that `short` does not have padding bits. – 12431234123412341234123 Nov 15 '18 at 15:28

9 Answers

9

In C++, you would use std::fill and std::numeric_limits.

#include <algorithm>
#include <cstddef>   // size_t
#include <iterator>
#include <limits>

template <typename IT>
void FillWithMax( IT first, IT last )
{
    typedef typename std::iterator_traits<IT>::value_type T;
    T const maxval = std::numeric_limits<T>::max();
    std::fill( first, last, maxval );
}

size_t const size=32;
short ZBUFFER[size];
FillWithMax( ZBUFFER, &ZBUFFER[0]+size );

This will work with any type.

In C, you'd better steer clear of memset, which sets the value of individual bytes. To initialize an array of any type other than char (possibly unsigned), you have to resort to a manual for loop, as sketched below.
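
For illustration, a minimal C sketch of such a loop (assuming ZBUFFER is a global short array; the dimensions here are made up):

#include <limits.h>   /* SHRT_MAX */
#include <stddef.h>   /* size_t */

short ZBUFFER[640 * 480];   /* hypothetical z-buffer size */

void fill_zbuffer(void)
{
    /* Assign the maximum short value element by element. */
    for (size_t i = 0; i < sizeof ZBUFFER / sizeof ZBUFFER[0]; ++i) {
        ZBUFFER[i] = SHRT_MAX;
    }
}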

Didier Trosset
  • 36,376
  • 13
  • 83
  • 122
  • From what a quick search tells me, the speed of std::fill is platform dependent and usually slower than memset. Is this true? Speed (for this particular operation) is far more important than portability, readability, stability, etc. Oh and I'm currently using VS2012 in c++ (though my code is closer to c). – DanielST Apr 11 '13 at 12:08
  • @slicedtoad is speed more important than actually working? Because memset doesn't work. – R. Martinho Fernandes Apr 11 '13 at 12:11
  • If it makes you feel better, that is bullshit. In VS2012 (and any other implementation that wants to be considered decent), `std::fill` will simply use `memset` if `memset` works. – R. Martinho Fernandes Apr 11 '13 at 12:17
  • So initializing an unsigned int array is always faster since you can use memset? (or fill uses it). – DanielST Apr 11 '13 at 12:18
  • 2
    @R.MartinhoFernandes Well, if it is really just an array of `unsigned shorts` and he wants to initialize them to all `1`s, then `memset` *will* work defined and platform-independently, just that it won't work *in general* (for initialization values that are not all `1`s, or other trivial things like `0`), which is why `std::fill` of course cannot use it. (Still I'm not saying one is likely to be faster than the other and I would still prefer `std::fill`, but just saying that in his case `memset` would perfectly work, even if it of course won't in general). – Christian Rau Apr 11 '13 at 13:14
  • @ChristianRau no, it will not work in his case! The maximum value of a signed short is not all 1s (just read the question: it mentions that memsetting to all 1s doesn't work). – R. Martinho Fernandes Apr 11 '13 at 13:16
  • 2
    @R.MartinhoFernandes Of course, that's why I spoke of `unsigned short`s. He may be talking just about *"16bit integers"*, but I have a strong feeling he's really interested in `unsigned shorts`. But Ok, you're right in that this isn't completely clear from the question and when really using `short`s it won't work. Wait, sorry, I missed the latest edit explicitly stating signedness. Just forget my comment. – Christian Rau Apr 11 '13 at 13:18
  • @slicedtoad so you claim that the `std::fill` solution may be slower than the `memset` solution? What `memset` solution? How do you even know there even exists one? The code you posted does not fill the array with SHRT_MAX, so it's not one. – Adrian Panasiuk Apr 11 '13 at 15:55
  • @AdrianPanasiuk there is a memset solution for an unsigned short array though, right? What I meant in my comment 5 up from here is that initializing an unsigned short array is faster than a signed one since you can memset. And it was meant as a question. – DanielST Apr 11 '13 at 16:37
  • @slicedtoad oh okay, (although it's non-portable.) – Adrian Panasiuk Apr 11 '13 at 17:13
  • Sounds like premature optimization to me, if it turns out to be an issue then perhaps you don't need to init the z-buffer every frame, or then look at using something else. – paulm Apr 11 '13 at 18:20
7

-1 and 0xFFFF are the same thing in a 16-bit integer using a two's complement representation. You are only getting -1 either because you have declared your array as short instead of unsigned short, or because you are converting the values to signed when you output them.

BTW your assumption that you can set anything other than bytes using memset is wrong. memset(ZBUFFER, 0xFF, size) would have done the same thing.
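
As a quick illustration of that bytewise behaviour (a hypothetical sketch, assuming a two's complement platform with 8-bit bytes):

#include <stdio.h>
#include <string.h>

int main(void)
{
    short zbuf[4];

    /* memset writes the single byte 0xFF into every byte of the buffer. */
    memset(zbuf, 0xFF, sizeof zbuf);

    /* Each 16-bit element now holds the bit pattern 0xFFFF, which reads
       back as -1 when interpreted as a signed short. */
    printf("%d\n", zbuf[0]);   /* prints -1 */
    return 0;
}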

Prof. Falken
  • 24,226
  • 19
  • 100
  • 173
john
  • 85,011
  • 4
  • 57
  • 81
4

In C++ you can fill an array with a value using the std::fill algorithm.

std::fill(ZBUFFER, ZBUFFER+size, std::numeric_limits<short>::max());

This is neither faster nor slower than your current approach. It does have the benefit of working, though.

R. Martinho Fernandes
  • 228,013
  • 71
  • 433
  • 510
3

Don't attribute speed to a language; that's a property of implementations. There are C compilers that produce fast, optimal machine code and C compilers that produce slow, suboptimal machine code, and likewise for C++. A "fast, optimal" implementation might be able to optimise code that seems slow. Hence, it doesn't make sense to call one solution faster than another. I'll talk about correctness first, and then about performance, however insignificant it is. It'd be a better idea to profile your code, to be sure that this is in fact the bottleneck, but let's continue.

Let us consider the most sensible option first: a loop that assigns int values. It is clear just by reading the code that the loop will correctly assign SHRT_MAX to each int item. You can see a test case of this loop below, which will attempt to use the largest possible array allocatable by malloc at the time.

#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void) {
    size_t size = SIZE_MAX;
    volatile int *array = malloc(size);

    /* Allocate largest array */
    while (array == NULL && size > 0) {
        size >>= 1;
        array = malloc(size);
    }

    printf("Copying into %zu bytes\n", size);

    for (size_t n = 0; n < size / sizeof *array; n++) {
        array[n] = SHRT_MAX;
    }

    puts("Done!");
    return 0;
}

I ran this on my system, compiled with various optimisations enabled (-O3 -march=core2 -funroll-loops). Here's the output:

Copying into 1073741823 bytes
Done!

Process returned 0 (0x0)   execution time : 1.094 s
Press any key to continue.

Note the "execution time"... That's pretty fast! If anything, the bottleneck here is the cache locality of such a large array, which is why a good programmer will try to design systems that don't use so much memory... Well, then let us consider the memset option. Here's a quote from the memset manual:

The memset() function copies c (converted to an unsigned char) into each of the first n bytes of the object pointed to by s.

Hence, it'll convert 0xFFFF to an unsigned char (and potentially truncate that value), then assign the converted value to the first size bytes. This results in incorrect behaviour. I don't like relying upon the value SHRT_MAX to be represented as a sequence of bytes storing the value (unsigned char) 0xFFFF, because that's relying upon coincidence. In other words, the main problem here is that memset isn't suitable for your task. Don't use it. Having said that, here's a test, derived from the test above, which will be used to test the speed of memset:

#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h> /* for memset */
#include <time.h>

int main(void) {
    size_t size = SIZE_MAX;
    volatile int *array = malloc(size);

    /* Allocate largest array */
    while (array == NULL && size > 0) {
        size >>= 1;
        array = malloc(size);
    }

    printf("Copying into %zu bytes\n", size);

    memset(array, 0xFFFF, size);

    puts("Done!");
    return 0;
}

A trivial byte-copying memset loop will iterate sizeof (int) times more than the loop in my first example. Considering that my implementation uses a fairly optimal memset, here's the output:

Copying into 1073741823 bytes
Done!

Process returned 0 (0x0)   execution time : 1.060 s
Press any key to continue.

These tests are likely to vary, perhaps significantly. I only ran them once each to get a rough idea. Hopefully you've come to the same conclusion that I have: Common compilers are pretty good at optimising simple loops, and it's not worth postulating about micro-optimisations here.

In summary:

  1. Don't use memset to fill ints with values (with an exception for the value 0), because it's not suitable.
  2. Don't postulate about optimisations prior to running tests. Don't run tests until you have a working solution. By working solution I mean "A program that solves an actual problem". Once you have that, use your profiler to identify more significant opportunities to optimise!
autistic
  • 1
  • 3
  • 35
  • 80
  • First, of your 0xFFFF, only 0xFF will be used. Second, 0xFF is -1, not the maximum value of a signed int. – Prof. Falken Apr 11 '13 at 15:52
  • @AmigableClarkKant "*First, of your 0xFFFF, only 0xFF will be used.*" Did you read my entire answer? Apparently not. Please note where I stated this numerous times: *Hence, it'll convert 0xFFFF to an unsigned char (and potentially truncate that value), then assign the converted value to the first size bytes. This results in incorrect behaviour. I don't like relying upon the value SHRT_MAX to be represented as a sequence of bytes storing the value (unsigned char) 0xFFFF, because that's relying upon coincidence. In other words, the main problem here is that memset isn't suitable for your task.* – autistic Apr 11 '13 at 16:10
  • @AmigableClarkKant Consider the validity of your first statement when `CHAR_BIT == 16` and `UCHAR_MAX == 0xFFFF`. Second, `0xFF` is certainly *not* -1. It's `0xFF`. Please identify the statement in which I implied that `0xFF` is the maximum value of a signed int. – autistic Apr 11 '13 at 16:11
  • Wow, very thorough answer. Much appreciated. – DanielST Apr 11 '13 at 16:13
  • @AmigableClarkKant I was aiming to test a *correct* loop versus the *incorrect* memset, in order to demonstrate that optimising the correct loop is silly. You identified that memset is incorrect for this. Hurrah, for you! You didn't, however, identify the point of this answer, despite it sticking out like a sore thumb in **bold**. – autistic Apr 11 '13 at 16:16
  • I was in a hurry sorry. Edit something so I can give you an upvote instead of a downvote, my vote is locked until you edit your post. – Prof. Falken Apr 11 '13 at 16:17
  • Very well. I'll accept that my answer could have been improved by bolding the first point in my summary, too. Done... – autistic Apr 11 '13 at 16:20
2

This is because of two's complement. You have to change your array type to unsigned short to get the max value that way, or use 0x7FFF instead.
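
A rough sketch of both options (hypothetical buffer names; assumes the usual 16-bit, two's complement shorts):

#include <limits.h>
#include <stddef.h>
#include <string.h>

unsigned short zbuf_u[640 * 480];   /* option 1: switch to unsigned */
short          zbuf_s[640 * 480];   /* option 2: keep signed */

void init_buffers(void)
{
    /* Unsigned: every byte set to 0xFF gives 0xFFFF (USHRT_MAX) per element. */
    memset(zbuf_u, 0xFF, sizeof zbuf_u);

    /* Signed: 0x7FFF (SHRT_MAX) is not a repeated byte, so loop instead. */
    for (size_t i = 0; i < sizeof zbuf_s / sizeof zbuf_s[0]; ++i) {
        zbuf_s[i] = 0x7FFF;
    }
}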

Carsten
  • 11,287
  • 7
  • 39
  • 62
2
/* SIZE is the buffer size in bytes; SHRT_MAX comes from <limits.h>. */
for (size_t i = 0; i < SIZE / sizeof(short); ++i) {
    ZBUFFER[i] = SHRT_MAX;
}

Note this does not initialize the last couple of bytes if (SIZE % sizeof(short)) is non-zero.

Adrian Panasiuk
  • 7,249
  • 5
  • 33
  • 54
  • 2
    He's said that `ZBUFFER` is an array of `short`. (If you want to play it safe, `i < sizeof(ZBUFFER) / sizeof(ZBUFFER[0])`.) – James Kanze Apr 11 '13 at 12:02
  • So what exactly is happening here? I don't understand why this won't just set the first `(size/sizeof(short))` positions in `ZBUFFER` to `SHRT_MAX`. Or is `SHRT_MAX` not just the max short amount? – DanielST Apr 11 '13 at 16:08
  • I decided to use `SIZE` the way you did in the question, where it is passed to `memset` as the number of bytes to be filled. Thus, I assumed that `SIZE` is the amount of bytes encompassed by `ZBUFFER` (as opposed to number of shorts stored.) – Adrian Panasiuk Apr 11 '13 at 16:24
  • 1
    I didn't realize what SHRT_MAX was. Makes sense now, thanks. – DanielST Apr 11 '13 at 16:28
2

In C, you can do it like Adrian Panasiuk said, and you can also unroll the copy loop. Unrolling means handling larger chunks at a time. The extreme end of loop unrolling is copying the whole frame over from a pre-filled template frame, like this:

void init()
{
    for (int i = 0; i < sizeof(ZBUFFER) / sizeof(ZBUFFER[0]); ++i) {
        empty_ZBUFFER[i] = SHRT_MAX;
    }
}

actual clearing:

memcpy(ZBUFFER, empty_ZBUFFER, SIZE);

(You can experiment with different sizes of the empty ZBUFFER, from four bytes and up, and then have a loop around the memcpy.)
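
For instance, the loop-around-memcpy variant might look roughly like this (hypothetical names and chunk size):

#include <limits.h>
#include <stddef.h>
#include <string.h>

#define ZBUF_ELEMS  (640 * 480)
#define CHUNK_ELEMS 1024            /* size of the pre-filled template */

short ZBUFFER[ZBUF_ELEMS];
short template_chunk[CHUNK_ELEMS];

void init_template(void)
{
    for (size_t i = 0; i < CHUNK_ELEMS; ++i)
        template_chunk[i] = SHRT_MAX;
}

void clear_zbuffer(void)
{
    /* Copy the template repeatedly instead of assigning element by element. */
    size_t done = 0;
    while (done < ZBUF_ELEMS) {
        size_t n = ZBUF_ELEMS - done;
        if (n > CHUNK_ELEMS)
            n = CHUNK_ELEMS;
        memcpy(ZBUFFER + done, template_chunk, n * sizeof ZBUFFER[0]);
        done += n;
    }
}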

As always, test your findings, to see a) whether it's worth optimizing this part of the program at all and b) what difference the different initializing techniques make. It will depend on a lot of factors. For the last few percent of performance, you may have to resort to assembler code.

Community
  • 1
  • 1
Prof. Falken
  • 24,226
  • 19
  • 100
  • 173
  • oh, that's cool, I did not know about that. Still not sure how Adrian Panasiuk's works though. – DanielST Apr 11 '13 at 16:05
  • @slicedtoad, `#define SHRT_MAX 0x7FFF` <- which is the largest integer value that fits in a 16-bit signed int. From `#include <limits.h>` – Prof. Falken Apr 11 '13 at 16:25
0
#include <algorithm>
#include <limits>

std::fill_n(ZBUFFER, size, std::numeric_limits<FOO>::max())

where FOO is the type of ZBUFFER's elements.

fizzer
  • 13,551
  • 9
  • 39
  • 61
0

When you say "memset" do you actually have to use that function? It only does a byte-by-byte assignment, so it can't fill a signed array with its maximum value.

If you want to set each value to the maximum you would use something like:

std::fill( ZBUFFER, ZBUFFER+len, std::numeric_limits<short>::max() )

where len is the number of elements (not the size in bytes of your array).

CashCow
  • 30,981
  • 5
  • 61
  • 92