0

I have the following class that tries to work on huge arrays where initialising, reading, writing and deleting are all O(1):

#include <cstdlib>
#include <iostream>

using namespace std;

template<class T> class Init_array {
  public:
    explicit Init_array(size_t n, const T &val = T()) :
      data(static_cast<T *>(malloc(sizeof(T) * n))), default_value(val),
      valid(static_cast<size_t *>(malloc(sizeof(size_t) * n))),
      x_check(static_cast<size_t *>(malloc(sizeof(size_t) * n))),
      count() {
    }
    ~Init_array() {
        free(data);
        free(valid);
        free(x_check);
    }
    bool is_valid(size_t index) const {
        // assume daemons do not fly out of the nose accessing uninitialised size_t
        return valid[index] < count && x_check[valid[index]] == index;
    }
    T &operator[](size_t index) {
        if (!is_valid(index)) {
            valid[index] = count;
            x_check[count] = index;
            ++count;
            new(data + index) T(default_value);
        }
        return data[index];
    }
    const T &operator[](size_t index) const {
        if (!is_valid[index]) {
            return default_value;
        } else {
            return data[index];
        }
    }
    void erase(size_t index) {
        x_check[valid[index]] = x_check[count - 1];
        valid[x_check[count - 1]] = valid[index];
        --count;
    }
  private:
    T *data;
    const T default_value;
    size_t *valid;
    size_t *x_check;
    size_t count;
};

int main() {
    Init_array<int> x(1ll << 30); // 20 GiB RAM needed
    cout << x[3827749] << '\n';
    const size_t test_index = 23829073;
    x[test_index] = 123456;
    const size_t another_test_index = 212748399;
    x[another_test_index] = 888888;
    cout << x[test_index] << '\n';
    cout << x[another_test_index] << '\n';
    x.erase(test_index);
    cout << x[test_index] << endl;
}

However, the line after the comment is technically undefined behaviour because it will try to read uninitialised values. I am afraid that it will crash on some machines. Is it possible to make the class free of undefined behaviour, while retaining everything as O(1)?

poy
  • 10,063
  • 9
  • 49
  • 74
Michael Tsang
  • 678
  • 5
  • 16
  • You might find my question and its answer helpful: http://stackoverflow.com/a/11965368 – user541686 Nov 19 '13 at 17:24
  • Just initialize the `valid` array...? – Kerrek SB Nov 19 '13 at 17:24
  • 3
    O(1) is an illusion. Dynamically allocating a block of N items is O(N) – n. m. could be an AI Nov 19 '13 at 17:25
  • @n.m.: Where did you get that idea from? – user541686 Nov 19 '13 at 17:25
  • n.m's facts may be slightly wrong, but he has identified the problem. The cost of a memset is minor compared to the cost of a malloc. You're optimizing the wrong thing. [and by the way "technically undefined" is just another way of saying "this is a bug waiting to happen." – Dale Wilson Nov 19 '13 at 17:28
  • Initializing an array of memory is O(n). O(1) is impossible here, though `memset` is usually very fast. There are also os-dependent solutions like Windows' `ZeroMemory` which might be even faster. – Marius Nov 19 '13 at 17:28
  • @Mehrdad That answer on UB is for C; for C++, the lvalue-to-rvalue conversion for an uninitialized object is *always* UB, no matter if the address is taken or not [conv.lval]/1. – dyp Nov 19 '13 at 17:29
  • @DyP: So you're saying it's wrong? – user541686 Nov 19 '13 at 17:31
  • @Mehrdad Hm? What's wrong? The answer your linked is correct for C, but not for C++ (AFAIK). – dyp Nov 19 '13 at 17:34
  • @DyP: Oh never mind, I misread what you wrote. That's... really surprising if it's true, to say the least... – user541686 Nov 19 '13 at 17:36
  • Maybe I'm having another one of those dumb moments, but what does it matter whether it's _undefined_ behavior when it is _meaningless_ behavior? Reading the uninitialized value from `valid` will almost certainly not cause a crash or cause demons to appear, or any such thing -- but the value is meaningless. What knowledge do you gain from comparing an uninitialized, unknown value to a known one? – Damon Nov 19 '13 at 18:14
  • @Damon I think the OP assumes that the cross-checking is highly unlikely to yield `true` if the value for that index hasn't been "initialized" via `operator[]`. – dyp Nov 19 '13 at 18:26
  • @Mehrdad It's not an idea, it's the the cold harsh reality, as opposed to the illusory world of theoretical CS. Your OS maps pages into your process one by one, and zeros them out before mapping, so that you process doesn't steal private keys and credit card numbers from your fellow user whose process has just died. Sure there are machines where this doesn't happen, but we're talking *worst case*, aren't we? – n. m. could be an AI Nov 19 '13 at 22:29
  • @n.m.: "Allocation" doesn't involve touching the pages at all though. It just involves bookkeeping. – user541686 Nov 19 '13 at 22:50
  • @n.m. In fact that's why malloc can succeed when there is no more memory left... – user541686 Nov 20 '13 at 01:31
  • @Mehrdad you are right, i forgot aboit this... – n. m. could be an AI Nov 20 '13 at 04:09
  • malloc is nearly always O(1) as it only marks a region of addresses available. This program runs instantly on my machine even I have only 8 GiB of physical memory. – Michael Tsang Nov 20 '13 at 14:14

0 Answers0