
I have a data set with about 700 000 entries, and each entry is a set of 3D coordinates with attributes such as name, timestamp, ID, and so on.

Right now I'm just reading the coordinates and rendering them as points in OpenGL. However, I want to associate each point with its corresponding attributes, and I want to be able to sort and pick points at runtime based on those attributes. How would I go about achieving this in an efficient manner?

I know I can put the data in a struct and use std::sort for sorting, but is that a good design choice, or is there a more efficient/elegant way of handling the problem?
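Roughly what I have in mind (the field names are just placeholders):

    #include <algorithm>
    #include <cstdint>
    #include <string>
    #include <vector>

    // One record per data-set entry.
    struct Entry {
        float x, y, z;            // 3D coordinates
        std::string name;
        std::uint64_t timestamp;
        int id;
    };

    int main() {
        std::vector<Entry> entries;   // ~700 000 of these, loaded from the data set
        entries.push_back({1.0f, 2.0f, 3.0f, "junk", 1365617999u, 42});

        // Sort by one attribute, e.g. the name, with a comparator lambda.
        std::sort(entries.begin(), entries.end(),
                  [](const Entry& a, const Entry& b) { return a.name < b.name; });
        return 0;
    }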

ael
  • You can sort those entries based on the attributes. You have to sort anyway. – taocp Apr 10 '13 at 17:59
  • Sorry, not sure what you mean. Can you elaborate? – ael Apr 10 '13 at 18:13
  • Look into octrees, R-trees, BSP trees, and k-d trees, which are good containers for implementing region-based queries. – didierc Apr 10 '13 at 18:14
  • @ael since you want to be able to sort those entries, you cannot avoid sorting anyway; you can use certain index structures such as R-trees. – taocp Apr 10 '13 at 18:17
  • @didierc It's not region based queries that I'm asking about. I want to be able to sort and pick a point or a set of points based on associated attributes, such as a string. So for example: pick all points with the name "junk". Or am I missing something? – ael Apr 10 '13 at 18:23

3 Answers


The way I tend to look at these design choices is to first use one of the standard library containers (by the way, if you just need to do lookups you don't necessarily have to sort, but you do need a container that allows lookup), then check whether this is an "efficient enough" solution for the problem.
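For example, if "pick all points named 'junk'" is the common query, a hash index from that attribute to positions in the main vector answers it without sorting anything. A minimal sketch (the Point layout here is only an assumption):

    #include <iostream>
    #include <string>
    #include <unordered_map>
    #include <vector>

    struct Point {
        float x, y, z;
        std::string name;
    };

    int main() {
        std::vector<Point> points = {
            {0.f, 0.f, 0.f, "junk"},
            {1.f, 1.f, 1.f, "keep"},
            {2.f, 2.f, 2.f, "junk"},
        };

        // Build the lookup index once (O(n)); each query is then O(1) on average.
        std::unordered_multimap<std::string, std::size_t> by_name;
        for (std::size_t i = 0; i < points.size(); ++i)
            by_name.emplace(points[i].name, i);

        // "Pick all points with the name 'junk'" -- no sorting involved.
        auto range = by_name.equal_range("junk");
        for (auto it = range.first; it != range.second; ++it)
            std::cout << "point #" << it->second << '\n';
        return 0;
    }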

You can usually come up with a custom solution that is more efficient and maybe more elegant, but you tend to run into two issues with that:

1) You end up having to implement some type of container, which will cost you time in both implementation and debugging compared to a well-understood and tested container that is already out there. Most of the time you're better off trying to solve the problem at hand rather than making it bigger by adding more code.

2) If someone else has to maintain your code at some point, chances are they are familiar with standard library components from both a design and an implementation perspective, but they won't be familiar with your custom container, which increases the learning curve.

Timo Geusch

If you consider each attribute of your point class as a component of a vector, then your selection process is a region query. Your example of a string attribute being equal to some value means that the region is actually a line in your data space. However, no sorting will be done on the other attributes within that selection; you will have to implement that yourself, but it should be relatively straightforward with octrees, which partition data into ordered regions.

As advocated in another answer, try existing standard solutions first. If you can find an off-the-shelf implementation of one of these data structures:

  • R-tree
  • KD tree
  • BSP
  • Octree or, more likely, an n-dimensional version of the quadtree/octree principle (I will use the term octree herein to denote the general data structure)

then go for it; these are the data structures I recommend for spatial data management (see the R-tree sketch below for one off-the-shelf option).
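For example, Boost.Geometry ships an R-tree. A minimal sketch (assuming Boost is acceptable) that stores each 3D point together with the index of its attribute record and runs a box query:

    #include <boost/geometry.hpp>
    #include <boost/geometry/index/rtree.hpp>
    #include <iostream>
    #include <iterator>
    #include <utility>
    #include <vector>

    namespace bg  = boost::geometry;
    namespace bgi = boost::geometry::index;

    // 3D cartesian point plus an axis-aligned box for region queries.
    using Point3 = bg::model::point<float, 3, bg::cs::cartesian>;
    using Box3   = bg::model::box<Point3>;
    // Each stored value pairs a point with the index of its attribute record.
    using Value  = std::pair<Point3, std::size_t>;

    int main() {
        bgi::rtree<Value, bgi::quadratic<16>> rtree;
        rtree.insert(std::make_pair(Point3(0.f, 0.f, 0.f), std::size_t(0)));
        rtree.insert(std::make_pair(Point3(5.f, 5.f, 5.f), std::size_t(1)));

        // Region query: everything inside the box (-1,-1,-1)..(1,1,1).
        std::vector<Value> hits;
        rtree.query(bgi::intersects(Box3(Point3(-1.f, -1.f, -1.f),
                                         Point3(1.f, 1.f, 1.f))),
                    std::back_inserter(hits));
        for (const auto& v : hits)
            std::cout << "record #" << v.second << '\n';
        return 0;
    }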

You could also use an embedded RDBMS capable of working with spatial data (they usually implement an R-tree for spatial indexing), but it may not be worthwhile if your dataset isn't dynamic.

If your dataset falls within the 10,000-entry range, then by today's standards it isn't that large, so simpler structures should suffice. In that case, I would first go for a simple std::vector, and use std::find to filter the data into a smaller set and std::sort to order it afterward.
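A minimal sketch of that first attempt (the record layout is an assumption, and std::copy_if stands in as the range-filtering counterpart of std::find):

    #include <algorithm>
    #include <cstdint>
    #include <iterator>
    #include <string>
    #include <vector>

    struct Entry {
        float x, y, z;
        std::string name;
        std::uint64_t timestamp;
    };

    int main() {
        std::vector<Entry> all;   // the full data set
        all.push_back({0.f, 0.f, 0.f, "junk", 30u});
        all.push_back({1.f, 1.f, 1.f, "keep", 10u});
        all.push_back({2.f, 2.f, 2.f, "junk", 20u});

        // Filter into a smaller set (every entry named "junk")...
        std::vector<Entry> junk;
        std::copy_if(all.begin(), all.end(), std::back_inserter(junk),
                     [](const Entry& e) { return e.name == "junk"; });

        // ...then sort only that subset, here by timestamp.
        std::sort(junk.begin(), junk.end(),
                  [](const Entry& a, const Entry& b) { return a.timestamp < b.timestamp; });
        return 0;
    }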

In a second attempt, I would probably try an ordered set or map keyed on the most frequently queried attribute, then do some benchmarks to pick the better-performing solution.
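A sketch of that second attempt, assuming purely for illustration that the timestamp is the most queried attribute; an ordered index also gives range queries for free:

    #include <cstdint>
    #include <iostream>
    #include <map>

    int main() {
        // Ordered index: timestamp -> position of the entry in the main vector.
        std::multimap<std::uint64_t, std::size_t> by_time;
        by_time.emplace(10u, 0);
        by_time.emplace(20u, 1);
        by_time.emplace(30u, 2);

        // All entries with a timestamp in [15, 25], visited in order.
        auto first = by_time.lower_bound(15u);
        auto last  = by_time.upper_bound(25u);
        for (auto it = first; it != last; ++it)
            std::cout << "entry #" << it->second << " at t=" << it->first << '\n';
        return 0;
    }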

For a more efficient one-dimensional indexing structure (in essence, that's what sets and maps are), you might want to try B-trees: there's a C++ implementation available from Google.

My third attempt would go toward an OpenCL solution (although if you are doing heavy OpenGL rendering, you might prefer to do this work on the CPU instead; that depends on your framerate needs).

If your dataset is much larger, as it seems to be, then consider one of the more complex solutions I listed initially.

At any rate, without more details about your dataset and how you plan to use it, it is difficult to suggest a definitive solution, so the only real advice we can give is: try everything you can and benchmark.

didierc
  • also, consider asking your question on programmers.stackexchange.com or perhaps gamedev.stackexchange.com. – didierc Apr 10 '13 at 18:59
  • Thanks for the elaboration. Also, it isn't 7000 points, it's 700 000 points. My mistake, and I've edited it. – ael Apr 10 '13 at 20:38

If you're dealing with point clouds, take a look at PCL (the Point Cloud Library); it could save you a lot of time and effort compared with digging into the intricacies of spatial indexing yourself. It also includes visualisation.
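As a rough sketch of what that looks like (assuming PCL is installed), the geometry goes into a point cloud and a k-d tree answers spatial queries; the returned indices can map back to a parallel array holding your attributes:

    #include <iostream>
    #include <vector>
    #include <pcl/point_cloud.h>
    #include <pcl/point_types.h>
    #include <pcl/kdtree/kdtree_flann.h>

    int main() {
        pcl::PointCloud<pcl::PointXYZ>::Ptr cloud(new pcl::PointCloud<pcl::PointXYZ>);
        cloud->push_back(pcl::PointXYZ(0.f, 0.f, 0.f));
        cloud->push_back(pcl::PointXYZ(0.1f, 0.1f, 0.1f));
        cloud->push_back(pcl::PointXYZ(5.f, 5.f, 5.f));

        pcl::KdTreeFLANN<pcl::PointXYZ> kdtree;
        kdtree.setInputCloud(cloud);

        // All points within 0.5 units of the query point.
        pcl::PointXYZ query(0.f, 0.f, 0.f);
        std::vector<int> indices;
        std::vector<float> sqr_distances;
        kdtree.radiusSearch(query, 0.5, indices, sqr_distances);

        for (int i : indices)
            std::cout << "cloud index " << i << '\n';   // also indexes the attribute array
        return 0;
    }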

SmacL