Is there a way to zero out an array with time complexity O(1)? It's obvious that this can be done with a for-loop or memset, but their time complexity is not O(1).
-
What gives you the idea it's possible to do in O(1) time? I think there are some corner cases where platform-specific hacks allow it, but I'd like to hear your reasoning. – May 29 '12 at 10:31
-
Just to make sure: Do you understand the difference between O(1) and 'a single code line'? – Asaf May 29 '12 at 10:33
-
@Asaf I guess he does when excluding `memset`. – Christian Rau May 29 '12 at 10:35
-
@ChristianRau - when I read the question memset was not in it... – Asaf May 29 '12 at 10:37
-
An electro-magnet might truly clear all memory in constant time. – edA-qa mort-ora-y May 29 '12 at 11:46
-
How many more times will this question be asked? – leppie May 29 '12 at 14:55
-
@delnan I would like to zero out a sparse matrix in O(1) time. – Peiyun Jun 02 '12 at 08:57
9 Answers
Yes
However, not just any array; it takes an array that has been crafted for this to work.
#include <cstddef>
#include <stdexcept>
#include <utility>

template <typename T, std::size_t N>
class Array {
public:
    Array(): generation(0) {}

    void clear() {
        // FIXME: deal with overflow
        ++generation;
    }

    T get(std::size_t i) const {
        if (i >= N) { throw std::runtime_error("out of range"); }

        TimedT const& t = data[i];
        return t.second == generation ? t.first : T{};
    }

    void set(std::size_t i, T t) {
        if (i >= N) { throw std::runtime_error("out of range"); }

        data[i] = std::make_pair(t, generation);
    }

private:
    typedef std::pair<T, unsigned> TimedT;

    TimedT data[N];       // each element carries the epoch in which it was set
    unsigned generation;  // the current epoch
};
The principle is simple:

- we define an epoch using the `generation` attribute
- when an item is set, the epoch in which it has been set is recorded
- only items of the current epoch can be seen
- clearing is thus equivalent to incrementing the epoch counter
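For illustration, a minimal usage sketch (it assumes the `Array` template above, with `int` chosen purely as an example element type):

#include <cassert>

int main() {
    Array<int, 1024> a;
    a.set(3, 42);
    assert(a.get(3) == 42);

    a.clear();               // O(1): only the generation counter is incremented
    assert(a.get(3) == 0);   // the old entry belongs to a past epoch, so int{} is returned
}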
The method has two issues:

- storage increase: for each item we store an epoch
- generation counter overflow: there is such a thing as a maximum number of epochs

The latter can be thwarted by using a really big integer (`uint64_t`), at the cost of more storage.

The former is a natural consequence; one possible solution is to use buckets to downplay the issue, for example by having up to 64 items associated with a single counter and a bitmask identifying which of them are valid within this counter.
EDIT: just wanted to get back to the buckets idea.

The original solution has an overhead of 8 bytes (64 bits) per element (if it is already 8-byte aligned). Depending on the elements stored, that might or might not be a big deal.

If it is a big deal, the idea is to use buckets; of course, like all trade-offs, it slows down access even more.
#include <cassert>
#include <cstddef>
#include <cstdint>

template <typename T>
class BucketArray {
public:
    BucketArray(): generation(0), mask(0) {}

    T get(std::size_t index, std::uint64_t gen) const {
        assert(index < 64);

        return gen == generation and (mask & (std::uint64_t(1) << index)) ?
            data[index] : T{};
    }

    void set(std::size_t index, T t, std::uint64_t gen) {
        assert(index < 64);

        // A newer epoch invalidates the whole bucket at once.
        if (generation < gen) { mask = 0; generation = gen; }

        mask |= std::uint64_t(1) << index;   // mark this slot as valid in the current epoch
        data[index] = t;
    }

private:
    std::uint64_t generation;
    std::uint64_t mask;
    T data[64];
};
Note that this small array with a fixed number of elements (we could actually template this and statically check that it's less than or equal to 64) only has 16 bytes of overhead. This means we have an overhead of 2 bits per element.
template <typename T, std::size_t N>
class Array {
    typedef BucketArray<T> Bucket;
public:
    Array(): generation(0) {}

    void clear() { ++generation; }

    T get(std::size_t i) const {
        if (i >= N) { throw std::runtime_error("out of range"); }

        Bucket const& bucket = data[i / 64];
        return bucket.get(i % 64, generation);
    }

    void set(std::size_t i, T t) {
        if (i >= N) { throw std::runtime_error("out of range"); }

        Bucket& bucket = data[i / 64];
        bucket.set(i % 64, t, generation);
    }

private:
    std::uint64_t generation;
    Bucket data[N / 64 + 1];
};
We got the space overhead down by a factor of... 32. Now the array can even be used to store `char`, for example, whereas before it would have been prohibitive. The cost is that access got slower, since we pay for a division and a modulo (when will we get a standardized operation that returns both results in one shot?).
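Again, a small usage sketch, assuming the bucketed `Array` above (the `char` element type just illustrates the low per-element overhead):

#include <cassert>

int main() {
    Array<char, 4096> flags;          // bookkeeping overhead of roughly 2 bits per char
    flags.set(100, 'x');
    assert(flags.get(100) == 'x');

    flags.clear();                    // still O(1): a single counter increment
    assert(flags.get(100) == char{}); // reads as the default value after clearing
}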

-
Very clever! You reduce the cost of clearing the array to O(1) by transferring it to accessing the array instead. If the array experiences fewer than N accesses for every time it is cleared, your solution is faster than paying N to clear it. I am impressed. – Sparky May 29 '12 at 11:13
-
Aside from just using a big enough type, I think you can technically solve the wraparound issue using the fact that there cannot be more elements in the array than possible generations (`SIZE_MAX`). So each time you increment the generation, set the generation of just one element to `new_generation-1`, going round in a circle. For each element, it's guaranteed that by the time the generation wraps back round, its generation has been moved forward to a more recent one. Or as you say, no computer can in fact perform `2^64` operations anyway. – Steve Jessop May 29 '12 at 11:20
-
@Sparky: and even if the array is accessed more often, so that the overall cost is higher, this technique could mean `clear` satisfies some realtime deadline that it could not satisfy if it had to clear the actual memory. You'd sacrifice overall performance for a better worst case of any single operation, which is basically the life of the realtime programmer :-) – Steve Jessop May 29 '12 at 11:31
-
@SteveJessop: I don't understand the trick with `SIZE_MAX` to prevent the wrap-around. Suppose that between two calls to `clear` I always set 3 elements, how do you handle it ? I think we could do it backward. That is, increasing the generation of older elements (without going over the current limit); if we only use half the range available, then we can circle around it without issue. – Matthieu M. May 29 '12 at 12:40
-
Suppose the array is size 10, and `SIZE_MAX` is 15. The first time we clear, set the current generation to 1, and set the generation of element 0 to 0. The second time we clear, set the current generation to 2, and set the generation of element 1 to 1, and so on. So at all times, the generation of any element is *at most* the current generation minus 10 (because that's the longest ago it can possibly have been touched), but might be higher (if it has been accessed with `set` more recently). – Steve Jessop May 29 '12 at 14:31
-
This means that when the current generation finally wraps from 15 to 0, every element in the array has generation at least 5, and hence there's no risk of falsely thinking it's set when it isn't. The extra space occupied by the generation counts ensures that the number of elements in the array must always be comfortably less than `SIZE_MAX`. In fact less than half of `SIZE_MAX` in your non-bucket code, but even with your optimization of sharing a count between multiple elements, there's plenty of space to account for any out-by-one error in my logic. – Steve Jessop May 29 '12 at 14:32
-
"when we will get a standardized operation that returns both results in one shot" - I don't think we need one, that's the optimizer's problem. If there's an efficient opcode to generate both at once (as often there is), then a compiler that doesn't emit it when you write `int q = a / b, r = a % b;` just needs fixing, is all. Ideally it should use it in all cases where you compute both without possibility of changing the values of `a` and `b` in between, but I don't know how close real compilers are to the ideal. – Steve Jessop May 29 '12 at 14:38
-
Ooh, and looking at my own explanation, I think the tweak to your code to achieve this is ridiculously simple. Replace `void clear() { ++generation; }` with `void clear() { data[generation % N].second = generation; ++generation; }` and we're done. No wraparound problems, and only one line to disable as a speed boost in cases where we know that wrapping around will take too long to be worth coding against. – Steve Jessop May 29 '12 at 14:50
-
@SteveJessop: I finally get how this works. I had assumed that `generation` would always have to be greater than `.second`, but you are right that strict equality is sufficient :) Nice touch! – Matthieu M. May 29 '12 at 15:52
-
@MatthieuM. It doesn't work. The uninitialized array at some index might hold a garbage value (memory garbage, not yet written to). When this uninitialized cell is read, it will return that bad value instead of `T{}`. You still need to init the generations to 0, or risk having these problems sometimes (sorry for the 8y difference :/). – Tom Dec 07 '20 at 21:25
-
@Tom: Only clearing is required to be O(1), not initializing... `std::pair` [value initializes](https://en.cppreference.com/w/cpp/language/value_initialization) its elements, so there should be no garbage. – Matthieu M. Dec 08 '20 at 09:39
-
@MatthieuM. Ok got you, I thought initialization in O(1) is required too. There is a [Great Solution](https://medium.com/p/2ab4c0335e8b) to the initialization - you can initialize, fill, read, write to an array in O(1) worst time, and with only 1 bit of extra memory. Full disclosure, I wrote that article, but it seems like a good place to share it. – Tom Dec 08 '20 at 10:59
You can't modify `n` locations in memory in less than `O(n)` time (even if your hardware, for sufficiently small `n`, maybe allows a constant-time operation to zero certain nicely-aligned blocks of memory, as for example flash memory does).

However, if the object of the exercise is a bit of lateral thinking, then you can write a class representing a "sparse" array. The general idea of a sparse array is that you keep a collection (perhaps a `map`, although depending on usage that might not be all there is to it), and when you look up an index, if it's not in the underlying collection then you return `0`.
If you can clear the underlying collection in O(1), then you can zero out your sparse array in O(1). Clearing a `std::map` isn't usually constant-time in the size of the map, because all those nodes need to be freed. But you could design a collection that can be cleared in O(1) by moving the whole tree over from "the contents of my map" to "a tree of nodes that I have reserved for future use". The disadvantage would just be that this "reserved" space is still allocated, a bit like what happens when a `vector` gets smaller.
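A minimal sketch of that idea, with illustrative names (`SparseArray`, `retired`) that are not part of any standard API: `clear()` moves the whole container aside in O(1), and the node cleanup is merely deferred to whenever the retired maps are destroyed or recycled, rather than eliminated.

#include <cstddef>
#include <map>
#include <utility>
#include <vector>

template <typename T>
class SparseArray {
public:
    T get(std::size_t i) const {
        auto it = live.find(i);
        return it != live.end() ? it->second : T{};   // an absent index reads as "zero"
    }

    void set(std::size_t i, T value) { live[i] = std::move(value); }

    void clear() {
        // Moving a std::map is constant-time: the tree of nodes is set aside
        // and can be freed (or reused) later, outside any time-critical path.
        retired.push_back(std::move(live));
        live = std::map<std::size_t, T>{};
    }

private:
    std::map<std::size_t, T> live;
    std::vector<std::map<std::size_t, T>> retired;
};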

-
Nice idea, putting the root aside is indeed O(1). For a sparse array it certainly looks like the best bet. It also allows easy iteration, which my solution does not offer (each change having to be tracked in mine...) – Matthieu M. May 29 '12 at 12:43
It's certainly possible to zero out an array in O(1) as long as you accept a very large constant factor:
#include <cstddef>
#include <limits>

void zero_out_array_in_constant_time(void* a, std::size_t n)
{
    char* p = (char*) a;

    // Always perform SIZE_MAX writes, wrapping around the array as needed.
    for (std::size_t i = 0; i < std::numeric_limits<std::size_t>::max(); ++i)
    {
        p[i % n] = 0;
    }
}
This will always take the same number of steps, regardless of the size of the array, hence it's O(1).

-
@FredOverflow: Given that any actual computer is finite, you can extend your argument to argue that any plausible computation can be performed in O(1). This perhaps misses the point, however... – Managu May 29 '12 at 10:44
-
I hope it's obvious that my answer wasn't meant to be dead serious :) – fredoverflow May 29 '12 at 10:46
No.
You can't visit every member of an N-element collection in anything less than O(N) time.
You might, as Mike Kwan has observed, shift the cost from run- to compile-time, but that doesn't alter the computational complexity of the operation.

It's clearly not possible to initialize an arbitrarily sized array in a fixed length of time. However, it is entirely possible to create an array-like ADT which amortizes the cost of initializing the array across its use. The usual construction for this takes upwards of 3x the storage, however. To wit:
#include <cstddef>
#include <vector>

template <typename T, std::size_t arr_size>
class NoInitArray
{
    std::vector<T> storage;

    // Note that 'lookup' and 'check' are not initialized, and may contain
    // arbitrary garbage.
    std::size_t lookup[arr_size];
    std::size_t check[arr_size];

public:
    T& operator[](std::size_t pos)
    {
        // find out where we actually stored the entry for (*this)[pos].
        // This could be garbage.
        std::size_t storage_loc = lookup[pos];

        // Check to see that the storage_loc we found is valid
        if (storage_loc < storage.size() && check[storage_loc] == pos)
        {
            // everything checks, return the reference.
            return storage[storage_loc];
        }
        else
        {
            // storage hasn't yet been allocated/initialized for (*this)[pos].
            // allocate storage:
            storage_loc = storage.size();
            storage.push_back(T());

            // put entries in lookup and check so we can find
            // the proper spot later:
            lookup[pos] = storage_loc;
            check[storage_loc] = pos;

            // everything's set up, return appropriate reference:
            return storage.back();
        }
    }
};
One could add a `clear()` member to empty the contents of such an array fairly easily, if `T` is some type that doesn't require destruction, at least in concept.
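As a sketch of what such a member might look like (an assumption on my part, relying on `T` being trivially destructible): shrinking `storage` to zero makes the `storage_loc < storage.size()` test fail for every slot, so all previously assigned entries become invalid at once.

// Hypothetical addition to NoInitArray:
void clear()
{
    // For a trivially destructible T this is effectively just resetting the
    // size; every lookup/check pair is invalidated in one go.
    storage.clear();
}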

I like Eli Bendersky's webpage http://eli.thegreenplace.net/2008/08/23/initializing-an-array-in-constant-time, with a solution that he attributes to the famous book The Design and Analysis of Computer Algorithms by Aho, Hopcroft and Ullman. This is genuinely O(1) time complexity for initialization, rather than O(N). The space demands are O(N) additional storage, but allocating this space is also O(1), since the space is full of garbage. I enjoyed this for theoretical reasons, but I think it might also be of practical value for implementing some algorithms, if you need to repeatedly initialize a very large array, and each time you access only a relatively small number of positions in the array. Bendersky provides a C++ implementation of the algorithm.
A very pure theorist might start worrying about the fact that N needs O(log(N)) digits, but I have ignored that detail, which would presumably require looking carefully at the mathematical model of the computer. Volume 1 of The Art of Computer Programming probably gives Knuth's view of this issue.

-
The space requirement can actually be improved to [1 extra bit of memory](https://medium.com/p/2ab4c0335e8b), which is a superb theoretical bound. – Tom Dec 08 '20 at 13:45
It is impossible at runtime to zero out an array in O(1). This is intuitive given that there is no language mechanism which allows the value setting of arbitrarily sized blocks of memory in fixed time. The closest you can do is:

int blah[100] = {0};

That will allow the initialisation to happen at compile time. At runtime, `memset` is generally the fastest, but would be O(N). However, there are problems associated with using `memset` on particular array types.
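For contrast, a small sketch of the two situations: an array with static storage duration is zero-initialized without per-element code at that point in the program (the zeros typically come from the zero-initialized data segment), whereas resetting it later still costs O(N):

#include <cstring>

static int blah[100];                // zero-initialized; no runtime loop here

void reset(int (&a)[100])
{
    std::memset(a, 0, sizeof a);     // O(N) work at runtime
}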
-
In the case the integer array is automatic, how will this initialization be O(1)? – phoxis May 29 '12 at 10:33
-
Even if it's not an automatic, the data section it lives in still has to be somehow copied into memory at runtime. The only way you could arguably initialize an array "at compile time" without there still being an `O(n)` cost at runtime is if (a) it's const, and (b) the dynamic linker loads const objects by memory-mapping the relevant part of the executable file in an O(1) op. – Steve Jessop May 29 '12 at 10:37
-
memory-mapping still involves copying from the disk into actual memory at some point -- though not necessarily at the moment you map the file. – edA-qa mort-ora-y May 29 '12 at 11:12
-
@edA-qa mort-ora-y: indeed. Depending on usage pattern, you might defer/amortize the cost of loading the disk data across the whole process of accessing all its elements, which is `O(n)` already. Or you might avoid the cost entirely, if enough elements are never accessed. Either way, the initialization is `O(1)`. If that's sufficiently important then paying for it later is acceptable, by definition of "sufficient" ;-) – Steve Jessop May 29 '12 at 11:34
-
Oh, I forgot to consider another case - if at compile time you're constructing an image for some embedded device, then you can have the same effect as memory-mapping on a "real" OS, without actually mapping anything. In effect the cost to initialize the array is paid at the time the image is written to the device. – Steve Jessop May 29 '12 at 12:27
-
While still O(N), implementations that map to hardware-assisted operations like clearing whole cache lines or memory pages can run at <1 cycle per word.
Actually, riffing on Steve Jessop's idea...
You can do it if you have hardware support to clear an arbitrarily large amount of memory all at the same time. If you posit an arbitrarily large array, then you can also posit an arbitrarily large memory with hardware parallelism so that a single reset pin will simultaneously clear every register at once. That line will have to be driven by an arbitrarily large logic gate (that dissipates arbitrarily large power), and the circuit traces will have to be arbitrarily short (to overcome R/C delay) (or superconducting), but these things are quite common in extradimensional spaces.

-
The idea is totally viable & good (no idea why no one votes for it). You can zero memory in constant time; it just needs special hardware and properly aligned blocks, and the effects should not be adverse. I.e. when allocating memory it's guaranteed to be unused (no free necessary), so it can be zeroed. SPARC used to (and probably still does) support optimized memory zeroing to make Java faster. – bestsss Jun 01 '12 at 11:32
It is possible to do it in O(1) time, and even with O(1) extra space.

I explained the O(1)-time solution David mentioned in another answer, but it uses 2n extra memory. A better algorithm exists which requires only 1 bit of extra memory.

See the article I just wrote about that subject. It also explains the algorithm David mentioned in more detail, as well as the state-of-the-art algorithm today, and it features an implementation of the latter.

In short (as I would just be repeating the article): it cleverly takes the algorithm presented in David's answer and does it all in place, while using only a (very) little extra memory.
