2


I have to search a list of structs for every item whose XZPos is closer to Vector2 (or PointF) P. The list is ordered by XZPos' x and y. It'll look something like this:

Item 1 (XZPos: 0,0)
Item 2 (XZPos: 0,1)
Item 3 (XZPos: 0,2)
...
Item 12 (XZPos: 1,0)
Item 13 (XZPos: 1,1)
Item 14 (XZPos: 1,2)
...
2.249.984 elements later
...

Now I have a point P (4,4) and I want a list of structs in the above list of every item closer to P than 5,66f. My algorithm searches every item in the list like this:

        List<Node> res = new List<Node>();
        for (int i = 0; i < Map.Length; i++)
        {
            Vector2 xzpos = new Vector2(Map[i].X, Map[i].Z);//Map is the list containing 2.250.000 elements
            //neighbourlength = 5,66f in our example
            if ((xzpos - pos).Length() <= neighbourlength && xzpos != pos)//looking for every item except the item which is P itself, therefore checking whether "xzpos != pos"
            {
                res.Add(new Node() { XZPos = xzpos, /*Other fields*/ }); 
            }
        }
        return res.ToArray();

My problem is that it takes way too long to complete, and now I'm looking for a way to find the fields I'm looking for without searching the entire list. 22 seconds for a search is TOO LONG. If someone could help me get it to 1 or 2 seconds that would be very nice.

Thanks for your help, Alex

alex
  • 1,228
  • 1
  • 16
  • 38

4 Answers4

2

Your list is sorted, and you can use that to shrink the problem space. Instead of searching the whole list, search the subset of the list that spans x values within 5,66f, since anything that is farther than the maximum on one coordinate will be farther than you want no matter what the other coordinate is. Then if you store the start positions of each value in the list (i.e. in your example, "0" elements start at 1, and "1" elements start at 12), you can quickly get to the part of the list you care about. So instead of iterating through items 0 to 2 million, you could instead be iterating through items 175839 to 226835.

Example:

The List
1: (0,1)
2: (0,2)
3: (1,0)
4: (1,1)
5: (2,1)
6: (2,3)
7: (3,1)
8: (3,5)
9: (4,2)
10: (4,5)
11: (5,1)
12: (5,2)
13: (6,1)
14: (6,2)
15: (6,3)
16: (7,1) 
17: (7,2)   

Start Locations
(0,1)
(1,3)
(2,5)
(3,7)
(4,9)
(5,11)
(6,13)
(7,16)

If I have a point (3,5) and I want to search the list for points within 2 of it, I only need to iterate through the points where x is between 1 and 5. So I look at my start locations, and see that 1 starts at position 3 in the list, and 5 ends at position (13 - 1). So instead of iterating from 1 to 17, I only need to iterate from 3 to 12. If the range of values in your data is large but the distance to check is short, this will reduce the number of entries you need to iterate across greatly.

Brian Schroth
  • 2,447
  • 1
  • 15
  • 26
1

To the below solution there's a few assumptions. These assumptions are not stated in your question an the solution might need adjustment. If the assumptions hold this solution is extremely fast but requires you to keep the data in a different structure. However if you can change the structure then this will have a look of for each valid point in (almost) constant time. That is the time required to find M valid points in a set containing M point in total depends only on N.

The assumptions are:

  • Only positive integers are used for values of X and Y
  • No duplicate points in the list

`private IEnumerable> GetPointsByDistance(int xOrigin, int yOrigin, double maxDistance){

                    var maxDist = (int) Math.Floor(maxDistance);
                    //Find the lowest possible value for X
                    var minX = Math.Min(0, xOrigin - maxDist);
                    //Find the highest possible value for X
                    var maxX = Math.Min(MaxX, xOrigin + maxDist);
                    for (var x = minX; x <= maxX; ++x){
                        //Get the possible values for Y with the current X
                        var ys = points[x];
                        if (ys.Length > 0){
                            //Calculate the max delta for Y
                            var maxYDist =(int)Math.Floor(Math.Sqrt(maxDistance*maxDistance - x*x));
                            //Find the lowest possible Y for the current X
                            var minY = Math.Min(0, yOrigin - maxYDist);
                            //Find the highest possible Y for the current X
                            var maxY = Math.Min(ys.Length, yOrigin + maxYDist);
                            for (var y = minY; y <= maxY; ++y){
                                //The value in the array will be true if a point with the combination (x,y,) exists
                                if (ys[y]){
                                    yield return new KeyValuePair<int, int>(x, y);
                                }
                            }
                        }
                    }
                }`

The internal representation of the point in this code is bool[][]. The value will be true if the indices of the two arrays is a point included in the set. E.g. points[0][1] will be true for if the set was the six points you've included in the post and points[0][3] would be false.

If the assumptions are not met the same idea can still be used but will need tweeking. If you post what the assumptions should be I'll update the answer to the question.

An assumption that does not have to be met is that the points are close to sequential. If they are not this will be rather greedy when it comes to memory. Changes can be made if the points are not (close) to sequential and again post the invariants if my assumptions are wrong and I'll update.

Rune FS
  • 21,497
  • 7
  • 62
  • 96
0

I have no idea what your original code is doing. I see you've defined x and y and never use them and you refer to pos but it's not defined. So, instead I'm going to take the verbal description of the problem your trying to solve

Now I have a point P (4,4) and I want a list of structs in the above list of every item closer to P than 5,66f.

and solve it. This is easy:

var nearbyNeighbors = Map.Where(
    point => (point.X - 4) * (point.X - 4) +
             (point.Z - 4) * (point.Z - 4) <
             5.66f * 5.66f
    );

foreach(var nearbyNeighbor in nearbyNeighbors) {
    // do something with nearbyNeighbor
}

Here, I am assuming that you are using the standard Euclidean norm.

To utilize the fact that the collection is lexicographically sorted:

Binary search until |point.X - 4| >= 5.66. This splits the list in two contiguous sublists. Throw out the sublist with |point.X - 4| >= 5.66. From the remaining sublist, binary search until |point.Y - 4| >= 5.66 This splits the remaining sublist into two contiguous sublists. Throw out the sublist with |point.Y - 4| >= 5.66. Then search linearly through the remaining sublist.

jason
  • 236,483
  • 35
  • 423
  • 525
  • but here again, I'm searching the entire list containing 2.250.000 items, which takes to long! – alex May 05 '11 at 19:36
  • What do you expect? You have to inspect every item at least once. This problem is linear in the length of the list. Period. – jason May 05 '11 at 19:39
  • the list is sorted. is there no way to start the search in the middle of the list? – alex May 05 '11 at 19:40
  • Sorry, I missed that it was lexicographically sorted. Just binary search until `|point.X - 4| >= 5.66`. This splits the list in two contiguous sublists. Throw out the sublist with `|point.X - 4| >= 5.66`. From the remaining sublist, binary search until `|point.Y - 4| >= 5.66` This splits the remaining sublist into two contiguous sublists. Throw out the sublist with `|point.Y - 4| >= 5.66`. Then search linearly through the remaining sublist. – jason May 05 '11 at 19:48
  • binary search will be as slow as iterating I think. I will try brian schroths solution approach and yours and then I'll post a camparison of performances. – alex May 05 '11 at 20:01
  • @alex the worst case of two binary search a list of the size you state is 2*22 comparisons so iterating the entire list Will be a lot slower in almost all cases. That Said I think the second search should be on x-4 >= 5.66 since the list is not sorted on Y – Rune FS May 05 '11 at 20:30
0

This problem is related to this:

http://en.wikipedia.org/wiki/Nearest_neighbor_search

Take a look for sub-linear solutions.

Also take a look at this question

which data structure is appropriate to query "all points within distance d from point p"

Community
  • 1
  • 1
dfb
  • 13,133
  • 2
  • 31
  • 52
  • No, this is a completely different problem. – jason May 05 '11 at 19:37
  • @Jason - Completely different? If you have the nearest neighbors you can test the distance. – dfb May 05 '11 at 19:39
  • do you have any approach to the problem which would help me? – alex May 05 '11 at 19:41
  • @alex - the space partitioning approach is the one I was originally thinking of. I'll add some ideas inline – dfb May 05 '11 at 19:42
  • @spinning_plate okay I'll test the performace difference. – alex May 05 '11 at 19:43
  • The given problem: You have one point. You are looking for all the points in a collection of points in a given neighborhood of that point. The nearest-neighbor problem: You have a collection of points. You are looking for the point in that collection nearest to a given point. – jason May 05 '11 at 19:46
  • @Jason - Yes, fine. But the techniques described in the Wiki page are applicable to the problem. It's not at all clear to me at all that it's a linear search 'period' if preprocessing of the collection is allowed. – dfb May 05 '11 at 19:48
  • @Jason thats exactly what I want! – alex May 05 '11 at 19:49