
I'm using a

vector<vector<size_t>> Ar;

structure. The contents of the structure change over time; in particular, the length of each of the nested vectors is random and changes over time. Order is important, and I cannot skip a nested vector just because it is empty. I know the maximum capacity of the nested vectors (say m) and of the outer vector (say n).

I'm having some difficulty getting the initialization right. If I use

Ar(n);

there is no problem, but I end up with memory fragmentation because the allocator does not know the sizes of the nested vectors in advance. I would like to avoid this if possible, because I don't know what impact it will have as the size of the data I'm trying to handle increases. I try to get around the fragmentation by fixing the length of the nested vectors in advance to get a compact representation, but I'm having trouble doing this. I use

Ar(n,vector<size_t>(m));

but this is super slow and a massive waste of memory, because most of the entries will not be used.

I have successfully implemented this with a

vector<list<size_t> > Ar(n);

without suffering fragmentation, but it runs much slower than the nested vectors. A fixed representation such as a boost::multi_array would take up too much space, for the same reason as the second initialization above, and it would be harder to implement because I would need to keep track of where the useful entries stop.

Any suggestions? Thanks in advance.

Plamen
    Why is the "memory fragmentation" of your original design a problem? Have you measured it and determined that it's really a bottleneck? – Kerrek SB Jul 07 '13 at 11:58
  • I have measured the virtual memory usage. Is it wrong to do this? Is there another tool I ought to use? – Plamen Jul 07 '13 at 12:16
  • I just don't see anything particularly wrong with your initial approach. If you have a collection of differently-sized collections, then a vector of vectors seems sensible. If you're shrinking containers and are worried about the memory, you can use the swap trick (sketched below), but otherwise you already get good memory locality within each inner container... – Kerrek SB Jul 07 '13 at 14:08
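For reference, the "swap trick" mentioned in the comment above is the pre-C++11 idiom for releasing a vector's excess capacity; a minimal sketch (the helper name is illustrative):

#include <cstddef>
#include <vector>

// Copy v into a right-sized temporary, then swap buffers; the temporary
// takes the old oversized buffer with it when it dies. Since C++11,
// v.shrink_to_fit() requests the same thing directly.
void shrink(std::vector<std::size_t>& v) {
    std::vector<std::size_t>(v).swap(v);
}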

2 Answers


You don't know whether memory fragmentation is a problem until you've profiled your code with a typical use case. Unless m is very small compared to n, I think it won't be a bottleneck at all, since you still get mostly sequential memory accesses.

If you want to avoid it anyway, you could use reserve instead of resize or instead of initializing with m objects. It only allocates memory, without the overhead of constructing objects that will never be used, which also speeds up initialization.

Moreover, reserving capacity for the vectors will likely only consume virtual memory, not "real" memory, until you actually use it.

And if you know the distribution of the inner vectors' sizes, using the mean as the default reserved length may help you reduce the waste of memory.
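A minimal sketch of that approach, assuming C++11; the values of n and mean_m are illustrative stand-ins for the known outer bound and the expected inner size:

#include <cstddef>
#include <vector>

int main() {
    const std::size_t n = 1000;    // outer bound, known in advance
    const std::size_t mean_m = 16; // expected typical inner size (assumed)

    std::vector<std::vector<std::size_t>> Ar(n);
    for (auto& inner : Ar)
        inner.reserve(mean_m); // allocate capacity, construct no elements

    Ar[42].push_back(7); // no reallocation until capacity is exceeded
}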

In any case, std::list is a bigger waste of space, and a lot worse where fragmentation is concerned.

rems4e
  • Thanks for your comprehensive answer. Typically, m is at least five or ten times larger than n. The reason m is the maximum capacity is that I am using the nested vector to sort m objects across n containers, literally a pigeon-hole principle, which means some of the nested vectors may be empty. I need to clear and refill Ar over 60 times per run of the algorithm (see the reuse sketch after these comments). I had some issues with nested lists in the past, so maybe I am paranoid because of that experience. I will carry on with what I am doing and bring in resize if there is trouble. – Plamen Jul 08 '13 at 12:20
  • Strangely, I don't see an increase in real memory with the nested list... Anyway, the implementation with nested vectors is much faster, so I will use one of the techniques you suggest above if the memory issue ever becomes very constricting. – Plamen Jul 08 '13 at 12:30
  • What is your reference when you say you don't see a memory increment? Also, I'm pretty sure fragmentation won't be a problem: either your vectors are relatively small, and then they won't need resizing and will keep their first allocated space, or they are big, and then memory accesses are somewhat contiguous. The problem would be if you needed to resize them often, which is prevented by the `reserve` method without wasting much memory :) – rems4e Jul 08 '13 at 12:36
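On the clear-and-refill pattern from the comments above: clear() empties a vector but does not release its capacity, so storage reserved on the first pass is reused on every later one. A minimal sketch, assuming the same Ar structure as above:

#include <cstddef>
#include <vector>

// Empty every inner vector between passes; sizes drop to 0 but each
// inner vector keeps its allocated capacity, so refilling does not
// allocate again.
void reset(std::vector<std::vector<std::size_t>>& Ar) {
    for (auto& inner : Ar)
        inner.clear();
}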

Perhaps the resize member function will help you. See the std::vector::resize documentation for details.

levengli
  • In the example, the number of entries in each row is known a priori. In my case, I don't know how many entries each nested vector will contain or when they will be added. Think of it as a pigeon-hole problem. Are you suggesting I resize every time I add an entry to a nested vector? – Plamen Jul 07 '13 at 12:24
  • It's not a good solution, but the tradeoff depends on parameters that I don't have (such as the number and frequency of changes). You could probably always resize with an increment of more than 1 (hysteresis of sorts); see the sketch below. – levengli Jul 07 '13 at 16:25
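A minimal sketch of that hysteresis idea, with an illustrative growth step; note that push_back already grows capacity geometrically on its own, so a hypothetical helper like this only matters if you are managing sizes by hand:

#include <cstddef>
#include <vector>

// Hypothetical helper: append one value, growing capacity in coarse
// steps (doubling) instead of relying on per-element reallocation.
void append(std::vector<std::size_t>& inner, std::size_t value) {
    if (inner.size() == inner.capacity())
        inner.reserve(inner.capacity() == 0 ? 8 : 2 * inner.capacity());
    inner.push_back(value);
}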