I have some 2D data which contains edges which were rasterized into pixels. I want to implement an efficient data structure which returns all edge pixels which lie in a non-axis-aligned 2D triangle.
The image shows a visualization of the problem where white denotes the rasterized edges, and red visualizes the query triangle. The result would be all white pixels which lie on the boundary or inside the red triangle.
When further looking at the image, one notices that we have sparse boolean data, meaning that if we denote black pixels with a 0 and white pixels with a 1, that the number of 1s in the data is much lower than the number of 0s. Therefore, rasterizing the red triangle and checking for each point on it's inside whether it is white or black is not the most efficient approach.
Besides the sparseness of the data; since the white pixels origin from edges, it is in their nature to be connected together. However, at junctions with other lines, they have more than two neighbors. Pixels which are at a junction should only be returned once.
The data must be processed in realtime, but with no GPU assistance. There will be multiple queries for different triangle contents, and after each one, points may be removed from the data structure. However, new points won't be inserted anymore after the initial filling of the data structure.
The query triangles are already known when the rasterized edges arrive.
There are more query triangles than data edges.
There are many spatial data structures available. However I'm wondering, which one is the best one for my problem. I'm willing to implement a highly optimized data structure to solve this problem, as it will be a core element of the project. Therefore, also mixes or abbreviations of data structures are welcome!
R-trees seem to be the best data structure which I found for this problem until now as they provide support for rectangle-based queries. I would check for all white pixels within an AABB of the query triangle, then would check for each returned pixel if it lies within the query rectangle.
However, I'm not sure how well R-trees will behave since edge-based data will not be easily groupable into rectangles, as the points are clumped together on narrow lines and not pread out.
I'm alo not sure if it would make sense to pre-build the structure of the R-tree using information about the query triangles which will be made as soon as the structure is filled (as mentioned before, the query triangles are already known when the data arrives).
Reversing the problem seems also to be a valid solution, where I use a 2-dimensional interval tree to get for each white pixel a list of all triangles which contain it. Then, it can already be stored within all those result sets and be returned instantly when the query arrives. However, I'm not sure how this performs a the number of triangles is higher than the number of edges, but still lower than the number of white pixels (as an edge is mostly split up into ~20-50 pixels).
A data structure which would exploit that white pixels have most often white pixels as neighbors would seem to be most efficient. However, I could not find anything about such a thing until now.