
My dataset has 500,000 points in 960 dimensions; the file is 1.9 GB (1,922,000,000 bytes).

The code works for smaller datasets, but with this one it crashes at the same point every time. Here is a minimal example:

#include <cstddef>   // size_t
#include <cstdio>    // FILE, fopen, fread, fseek, ftell, printf
#include <iostream>
#include <vector>

template<typename T>
class Division_Euclidean_space {
 public:
  /**
   * The data type.
   */
  typedef T FT;

  /**
   * Constructor, which
   * sets 'N' and 'D' to zero.
   */
  Division_Euclidean_space()
      : N(0),
        D(0) {

  }

  /**
   * @param n - size of data
   */
  void setSize(size_t& n) {
    N = n;
  }

  /**
   * @param n - size of data
   */
  void setSize(int n) {
    N = n;
  }

  /**
   * Get the number of points
   *
   * @return - the number of points
   */
  const size_t& size() const {
    return N;
  }

  /**
   * Get the dimension of points
   *
   * @return - the dimension
   */
  int dim() const {
    return D;
  }

  /**
   * @param d - dimension of data
   */
  void setDim(int& d) {
    D = d;
  }

  /**
   * \brief Inserts a new value to the collection of
   * points, held in the private vector.
   *
   * @param v - value to be inserted
   */
  void insert(FT v) {
    p.push_back(v);
  }

 private:
  /**
   * number of points
   */
  size_t N;
  /**
   * dimension of points
   */
  int D;
  /**
   * vector of points
   * Note that indexing is of the form: [i * D + j]
   */
  std::vector<FT> p;
};

typedef Division_Euclidean_space<float> Division_space;  // the .fvecs file stores 32-bit floats
typedef Division_space::FT FT;

template<typename T>
void readDivisionSpacefvecs(Division_Euclidean_space<T>& ds, int& N, int& D,
                            const char* filename) {
  FILE* fid;
  fid = fopen(filename, "rb");
  if (!fid) {
    printf("I/O error : Unable to open the file %s\n", filename);
    return;  // nothing to read from a null FILE*
  }

  // we assign the return value of fread() to 'sz' just to suppress a warning
  size_t sz = fread(&D, sizeof(D), 1, fid);
  fseek(fid, 0L, SEEK_END);
  sz = ftell(fid);
  N = sz / (1 * 4 + D * 4);
  //printf("N = %d, D = %d, |%s|\n", N, D, filename);

  fseek(fid, 0L, SEEK_SET);
  ds.setSize(N);
  ds.setDim(D);
  std::cout << ds.dim() << " " << ds.size() << "\n";
  int c = 0;
  float v;
  int i, j;
  for (i = 0; i < N; ++i) {
    sz = fread(&D, sizeof(D), 1, fid);
    //printf("%d\n", D);
    for (j = 0; j < D; ++j) {
      sz = fread(&v, sizeof(v), 1, fid);
      if (c >= 279619)
        printf("j = %d, v = %f, read up to point %d\n", j, v, c);
      ds.insert(v);
    }
    ++c;
    printf("read up to %d\n", c);
  }
  if (c != N)
    printf("WARNING! Read less points than expected.\n");
}

int main() {
  Division_space test;
  int N, D;
  readDivisionSpacefvecs<FT>(test, N, D, "../../parallel/rkd_forest/Datasets/gist/gist_learn.fvecs");

  return 0;
}

Output:

...
j = 255, v = 0.052300, read up to point 279620
j = 256, v = 0.052300, read up to point 279620
terminate called after throwing an instance of 'std::bad_alloc'
  what(): std::bad_alloc
Aborted

Am I out of memory? How can I tell?

Here is how much memory I have:

samaras@samaras-A15:~$ free -mt
             total       used       free     shared    buffers     cached
Mem:          3934       2638       1295          0        179       1000
-/+ buffers/cache:       1458       2475
Swap:         3987          0       3987
Total:        7922       2638       5283
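
For reference, a rough back-of-envelope estimate of the footprint (just a sketch, assuming the values are stored as 4-byte elements, which holds for both float and int):

#include <cstdio>

int main() {
  // 500,000 points, 960 values per point, 4 bytes per value
  const unsigned long long bytes = 500000ULL * 960ULL * 4ULL;
  std::printf("raw data: %llu bytes (~%.2f GB)\n", bytes, bytes / 1e9);
  return 0;
}

That is about 1.92 GB for the values alone, which already looks tight against the 2,475 MB that free reports as available once buffers/cache are discounted.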
gsamaras
  • Your 32-bit system has no free contiguous memory of the size requested. – Dieter Lücking Mar 27 '15 at 20:35
  • Did you try debugging your code and finding where it throws `bad_alloc`? You did not expose the type of `Division_Euclidean_space`, and it appears to be where you are storing stuff: can you write a toy `Division_Euclidean_space` equivalent that generates the same problem and expose it? – Yakk - Adam Nevraumont Mar 27 '15 at 20:44
  • We'd need to see the implementation of `Division_Euclidean_space`. It may be a naive container that implements insertions with an allocate/copy/free, requiring twice as much memory as the thing occupies. Or it might require lots of contiguous memory even as it forces fragmentation. Or something similar. A 64-bit platform might solve the problem too. – David Schwartz Mar 27 '15 at 20:45
  • Is it possible that `ds` uses a vector and that its total capacity is not set from the start? This would result in many reallocations, causing a fragmented free store... – Christophe Mar 27 '15 at 20:46
  • Funny. You ask why `Division_Euclidean_space` exceeds allocation capacity, but don't show it to us or tell us what it does. – Lightness Races in Orbit Mar 27 '15 at 20:47
  • That's why comments are useful @LightnessRacesinOrbit. I would be delighted to show it to you, but the post is closed. What should I do? – gsamaras Mar 28 '15 at 13:49
  • @DieterLücking can you please explain your comment? Also I edited and voted for a reopen! I did not post the class in order to keep the post simple, damn it and now I got a penalty! – gsamaras Mar 28 '15 at 13:55
  • You mean that I run out of memory, right? How did you know that it was 32-bit? – gsamaras Mar 28 '15 at 14:47
  • This is still not a complete, minimal testcase. Come on you really should know better. – Lightness Races in Orbit Mar 28 '15 at 16:37
  • I did not say I did such a thing @LightnessRacesinOrbit! I edited the post with a minimal example now. Maybe this can wipe out at least the -1 :/ – gsamaras Mar 28 '15 at 17:51
  • Nope, it's still not right. You rely on external data that is not provided. Come on. – Lightness Races in Orbit Mar 28 '15 at 18:52
  • That's why I did not post a minimal example in the first place! The problem occurs only with THAT external data, which I would say would be foolish to provide here for somebody to download (too much work). @LightnessRacesinOrbit – gsamaras Mar 28 '15 at 18:57
  • @G.Samaras: Part of the debugging process is abstracting it away. We cannot reproduce it without that. – Lightness Races in Orbit Mar 28 '15 at 19:02
  • No, it's too much work for someone to download my data and then run the code. However, I cannot delete the post, since it has an answer @LightnessRacesinOrbit. I did not have this in mind when I posted. – gsamaras Mar 28 '15 at 19:04
  • @G.Samaras "too much work" Sigh. Too much work for you to do your debugging = too much work for me to solve the problem for you for free. – Lightness Races in Orbit Mar 28 '15 at 20:40
  • Too much work for the other people!!!!!! @LightnessRacesinOrbit sorry if my English are not clear, but please. :) – gsamaras Mar 29 '15 at 00:11

1 Answer


std::bad_alloc means a problem allocating memory, so yes, you are most likely out of memory. Unfortunately, there is no reliable way to "handle" this kind of exception; you can catch it and gracefully exit the application.
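
The numbers in the question are consistent with that, if we assume that std::vector doubles its capacity when it grows (a common, though not mandated, policy). A quick check of the crash point reported in the output:

#include <cstdio>

int main() {
  // Element count at the crash, taken from the question's output: 279,620 complete
  // points of 960 values each, plus 256 values of the next point already inserted.
  const unsigned long long elems = 279620ULL * 960ULL + 256ULL;  // 268,435,456 = 2^28
  const unsigned long long old_buf = elems * 4ULL;               // buffer currently held
  const unsigned long long new_buf = 2ULL * old_buf;             // doubled buffer requested
  std::printf("elements: %llu\n", elems);
  std::printf("old: %.2f GiB, new: %.2f GiB, both live during the copy: %.2f GiB\n",
              old_buf / 1073741824.0, new_buf / 1073741824.0,
              (old_buf + new_buf) / 1073741824.0);
  return 0;
}

The failing push_back lands exactly on 2^28 elements, so it would have to allocate a new 2 GiB buffer while the old 1 GiB buffer is still alive, roughly 3 GiB at once. If this is a 32-bit build, as one of the comments suggests, that transient alone is at or beyond the address space a single process can use, regardless of how much RAM and swap are free.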

szulak
  • Yeah, that's what I thought, but I wanted a second opinion, unfortunately, my post is closed! – gsamaras Mar 28 '15 at 13:50
  • @G.Samaras yes, that's what I meant. Basically, it can be either a failure to allocate memory by the 'new' or 'new[]' operator, or an exception thrown by a user-defined 'new' or 'new[]' operator. – szulak Mar 28 '15 at 19:44
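
Tying together the answer's suggestion to catch the exception and the comments about vector reallocation, here is a minimal sketch (illustrative only, not the original Division_Euclidean_space class) that reserves the full capacity up front, so the vector never reallocates while reading, and catches std::bad_alloc to exit gracefully instead of aborting:

#include <cstddef>
#include <cstdio>
#include <new>       // std::bad_alloc
#include <vector>

int main() {
  const std::size_t N = 500000, D = 960;  // dataset size from the question
  std::vector<float> p;
  try {
    p.reserve(N * D);                     // one ~1.92 GB allocation, no doubling later
    // ... read the file here and p.push_back() each value ...
  } catch (const std::bad_alloc&) {
    std::fprintf(stderr, "Not enough memory for %zu values\n", N * D);
    return 1;                             // exit gracefully instead of aborting
  }
  return 0;
}

Note that even the single up-front allocation of ~1.9 GB of contiguous memory can fail in a 32-bit process; a 64-bit build, as suggested in the comments, sidesteps that limit.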