4

I have an application which is used for displaying and modifying huge volumes of point cloud data from lidar files (up to few gigabytes each, sometimes loaded in simultaneously). In the app the user is able to view a 2D image of loaded points (from the top) and select a profile to view in another window (from the side). Again this involves millions of points and they are displayed using OpenGL.

To handle the data there is also a quadtree library, which works, but is extremely slow. It has been used for some time, but recently the lidar point format changed and the LidarPoint object needed a number of attributes (class members) added, which cause it to grow in size in turn affecting the performance to almost unusable level (think 5 minutes to load a single 2GB file).

The quadtree currently consist of pointers to PointBucket objects which are simply arrays of LidarPoint objects with specified capacity and defined boundaries (for spatial queries). If the bucket capacity is exceeded it splits into four buckets. There is also kind of a caching system in place which causes point buckets to get dumped to disk when the point data is taking too much memory. These are then loaded back into memory if needed. Finally every PointBucket contains subbuckets/resolution levels which hold every n-th point of the original data and are used when displaying the data depending on the zoom level. That is because displaying few million points at once, while that level of detail is not necessary, is just extremely slow.

I hope you can get a picture from this. If not please ask and I can provide some more details or upload more code. For example here is the current (and slow) insert method:

// Insert in QuadTree
bool QuadtreeNode::insert(LidarPoint newPoint)
{
   // if the point dosen't belong in this subset of the tree return false
   if (newPoint.getX() < minX_ || newPoint.getX() > maxX_ || 
       newPoint.getY() < minY_ || newPoint.getY() > maxY_)
   {
      return false;
   }
   else
   {
      // if the node has overflowed and is a leaf
      if ((numberOfPoints_ + 1) > capacity_ && leaf_ == true)
      {
         splitNode();

         // insert the new point that caused the overflow
         if (a_->insert(newPoint))
         {
            return true;
         }
         if (b_->insert(newPoint))
         {
            return true;
         }
         if (c_->insert(newPoint))
         {
            return true;
         }
         if (d_->insert(newPoint))
         {
            return true;
         }
         throw OutOfBoundsException("failed to insert new point into any \
                                     of the four child nodes, big problem");
      }

      // if the node falls within the boundary but this node not a leaf
      if (leaf_ == false)
      {
         return false;
      }
      // if the node falls within the boundary and will not cause an overflow
      else
      {
         // insert new point
         if (bucket_ == NULL)
         {
            bucket_ = new PointBucket(capacity_, minX_, minY_, maxX_, maxY_, 
                                      MCP_, instanceDirectory_, resolutionBase_, 
                                      numberOfResolutionLevels_);
         }
         bucket_->setPoint(newPoint);         
         numberOfPoints_++;
         return true;
      }
   }
}

// Insert in PointBucket (quadtree holds pointers to PointBuckets which hold the points)
void PointBucket::setPoint(LidarPoint& newPoint)
{    
   //for each sub bucket
   for (int k = 0; k < numberOfResolutionLevels_; ++k)
   {
      // check if the point falls into this subbucket (always falls into the big one)
      if (((numberOfPoints_[0] + 1) % int(pow(resolutionBase_, k)) == 0))
      {
         if (!incache_[k])
            cache(true, k);

         // Update max/min intensity/Z values for the bucket.
         if (newPoint.getIntensity() > maxIntensity_)
            maxIntensity_ = newPoint.getIntensity();
         else if (newPoint.getIntensity() < minIntensity_)
            minIntensity_ = newPoint.getIntensity();

         if (newPoint.getZ() > maxZ_)
            maxZ_ = newPoint.getZ();
         else if (newPoint.getZ() < minZ_)
            minZ_ = newPoint.getZ();

         points_[k][numberOfPoints_[k]] = newPoint;
         numberOfPoints_[k]++;
      }
   }
}

Now my question is if you can think of a way to improve this design? What are some general strategies when dealing with huge amounts of data that doesn't fit into memory? How can I make the quadtree more efficient? Is there a way to speed up rendering of points?

genpfault
  • 51,148
  • 11
  • 85
  • 139
jaho
  • 4,852
  • 6
  • 40
  • 66
  • 3
    http://codereview.stackexchange.com – Lightness Races in Orbit Mar 20 '12 at 12:36
  • do you really need quad tree? don't you think about another data structure like spatial hashes? – innochenti Mar 20 '12 at 12:46
  • Would that be faster for inserting, retrieving and modifying the data? I need to be able to randomly select a bouding box and get all the points that fall within this box. I thought quadtree would be best for it. – jaho Mar 20 '12 at 13:07
  • 1
    Best case for a spatial hash is when all objects are more or less evenly spread across the system. Worst case is when they're grouped together somehow in or around one location. So it depends on your use case really. I asked a question about this some time ago: http://stackoverflow.com/questions/7107231/best-algorithm-for-efficient-collision-detection-between-objects – Robinson Mar 20 '12 at 13:35

1 Answers1

3

Now my question is if you can think of a way to improve this design?

Yes: Don't store the objects itself in the quadtree. Put them into a flat structure (array, linked list, etc.) and have the Quadtree just keep a pointer to the actual objects. If the quadtree has a certain depth (on all nodes), you could flatten it as well.

datenwolf
  • 159,371
  • 13
  • 185
  • 298
  • The points are held in PointBuckets. Quadtree Nodes only hold pointers to these. They can't (I think) be held in a single collection, because they don't fit in memory. – jaho Mar 20 '12 at 12:47
  • 2
    @Marian: Well, if you don't hold all points im memory, then your program is IO bound. There's really very little to speed that up using in memory data structures. You need to work on your storage structure instead. Of course you can use quadtree like structures for that as well (surprisingly this works very well, if you use the filesystem, for this, i.e. directories as branches, files as leafs). – datenwolf Mar 20 '12 at 13:35