10

I'm coding a complex tree data structure, which stores lots of pointers. The pointers themselves occupy lots of space, and this is what I'm expecting to save.

So I'm here to ask whether there are examples on this. E.g.: For 64-bit data type, can I use a 32-bit or less pointer if the data it's pointing to is sure to be continuous?

I found a paper called Transparent Pointer Compression for Linked Data Structures, but I thought there could be a much simpler solution.

Update:

It is an octree. A paper here about this on GPU is GigaVoxels: A Voxel-Based Rendering Pipeline For Efficient Exploration Of Large And Detailed Scenes, they use 15-bit pointers on GPU

phuclv
  • 37,963
  • 15
  • 156
  • 475
tomriddle_1234
  • 3,145
  • 6
  • 41
  • 71
  • 1
    Is it a tree or a graph ? *Complex tree* does not make much sense to me as *by definition* a node of a tree has a single parent and a list of children it *owns*; ruling out the possibility of cycles or shared children. It would help if you showed us an example of a *Node* in your structure; to help jumpstart the discussion. – Matthieu M. Jan 23 '13 at 08:02
  • @Matthieu M. , this is an octree. A paper here about this on GPU is GigaVoxels : A Voxel-Based Rendering Pipeline For Efficient Exploration Of Large And Detailed Scenes, http://maverick.inria.fr/Publications/2011/Cra11/, they use 15bit for pointers on GPU – tomriddle_1234 Jan 24 '13 at 23:20

5 Answers5

5

Instead of using pointers, use an index into an array. The index can be a short if the array is less than 65536 in length, or int32_t if it's less than 2147483648.

An arbitrary pointer can really be anywhere in memory, so there's no way to shorten it by more than a couple of bits.

Mark Ransom
  • 299,747
  • 42
  • 398
  • 622
  • 1
    theoretically if I can allocate a block of memory first and knowing the first pointer address, it's possible to calculate pointer in realtime. However I'm just expecting a systematic solution to save memory that wasted on pointers. Using array is good, but sometime I still need the pointer – tomriddle_1234 Jan 23 '13 at 04:10
  • You could use signed indices: positive values are indices inside the array of your structs while negative ones are negated indices into pointers array. This involves an extra computational expenses, but you always should decide what you are optimizing - speed or memory. – Serge Jan 23 '13 at 04:18
  • @Serge: if you'd need such flexibility, you're probably better off with indexes into a `std::deque`. – MSalters Jan 23 '13 at 10:41
  • @MSalters yes, my comment was related to the OP's requirement to have the possibility to store pointers some times. There is a number of ways to implement this, it was just a hint. – Serge Jan 23 '13 at 16:06
1

In some cases, you could simply use an array to hold the nodes. A binary tree node at arr[i] would have children from arr[(i*2)+1] to arr[(i+1)*2]. Its parent would be at arr[(i-1)/2], if i != 0. And to figure the real pointer address, of course, you could say &arr[i]. It's actually a rather common implementation for trees that are full by specification, like the tree used for a heap.

In order for a node to know for itself how to find its children, though, you'd likely need either an index or a pointer to the container. (And even then, with only one of the two pieces, you're having to do a bit of hoop-jumping; you really need both pieces in order to do things easily. But having to calculate stuff rather than remembering it, is kinda the price you pay when you're trying not to remember much.) In order to keep stuff reasonably space-efficient, you'd have to dumb down the nodes; make them basically structs, or even just the values, and let the tree class do all the node-finding stuff. It'd just hand out pointers to nodes, and that pointer would be all the container needs to figure out the node's index (and, thus, where its children will be). You'd also have to pass both a tree pointer and a node pointer to any function that wants to traverse the tree.

Note, though, this won't save much space unless your trees are consistently near full (that is, unless most/all of your leaf nodes are at the end). For every leaf node that's not at the bottom of the tree (where the top is the root), you waste something like ((node size) * (tree size / i)) bytes.

If you can't count on the tree being full, or the nodes being in a certain restricted space, then there's not a whole lot to optimize here. The whole point of trees is nodes have pointers to their children; you can fake it with an array, but it must be easy to somehow find a node's children in order to make a tree worthwhile.

cHao
  • 84,970
  • 20
  • 145
  • 172
1

One option is to write a custom allocator to allocate big blocks of contiguous memory, and then store your nodes contiguously in there. Each of your nodes can then be referenced by a simple index that can be mapped back to memory using simple pointer arithmetic (e.g.: node_ptr = mem_block_ptr + node_index).

Soon you realise that having multiple of these memory blocks means that you no longer knows in which of them a specific node resides. This is where partitioning comes into scene. You can opt for horizontal and/or vertical partitioning. Both considerably increase the level of complexity, and both have pros and cons (see [1] and [2]).

The key thing here is to ensure that the data is split up in a predictable manner.

References:

  1. Building Scalable Databases: Pros and Cons of Various Database Sharding Schemes
  2. 37signals - Mr. Moore gets to punt on sharding
jweyrich
  • 31,198
  • 5
  • 66
  • 97
1

If the utilization of pointers takes a lot of space:

use an array of pointers, and replaces pointers with indexes in that array. That adds just another indirection. With less than 64k pointers, you need a [short] array (Linux)

Simple implementation

#define   MAX_PTR  60000

void *aptr[MAX_PTR];
short nb = 0;

short ptr2index(void *ptr) {
  aptr[nb] = ptr;
  return (short)nb++;
}

void *index2ptr(short index) {
  return aptr[index];
}

... utilization ...

... short next; // in Class

Class *c = new Class();
mystruct->next = ptr2index((void *)c);

...

Class *x = (Class *)index2ptr(otherstruct->next);
Déjà vu
  • 28,223
  • 6
  • 72
  • 100
0

A very simple way of dealing with your issue is simply to use less pointers (seems silly right) ?

Compare the two following approaches:

template <typename T>
struct OctreeNaiveNode {
    T value;
    Point center;
    OctreeNaiveNode* parent;
    std::unique_ptr<OctreeNaiveNode> children[8];
}; // struct OctreeNaiveNode

// sizeof(OctreeNaiveNode) >= sizeof(T) + sizeof(Point) + 9 * sizeof(void*)

template <typename T>
struct OctreeNode {
    T value;
    Point center;
    std::unique_ptr<OctreeNode[]> children; // allocate for 8 only when necessary
}; // struct OctreeNode

// sizeof(OctreeNode) >= sizeof(T) + sizeof(Point) + sizeof(void*)

How does it work:

  • A parent pointer is only necessary for simple iterators, if you have far more less iterators than nodes, then it is more economic to have deep iterators: ie iterators that conserve a stack of parents up to the root. Note that in RB-tree it does not work so well (balancing), but in octree it should be better because the partitioning is fixed.
  • A single children pointer: instead of having an array of pointers to children, build a pointer to an array of children. Not only it means 1 dynamic allocation instead of 8 (less heap fragmentation/overhead), but it also means 1 pointer instead of 8 within your node.

Overhead:

  • Point = std::tuple<float,float,float> => sizeof(T) + sizeof(Point) >= 64 => +100%
  • Point = std::tuple<double,double,double> => sizeof(T) + sizeof(Point) >= 256 => +25%

So, rather than delving into compression pointers strategies, I advise you just rework your datastructures to have less pointers/indirection in the first place.

Matthieu M.
  • 287,565
  • 48
  • 449
  • 722