0

i am trying to make a $lookup over 2 collections where the 1st collection has single points, in the 2nd collection are polygons... now i want to query all points within the polygon and build an avg value from the 1st collection.

my actual aggregation:

    'aggregation': {
        'pipeline': [
            # must match the field location from my first collection
            { "$match": {
                 "location" : "$loc" 
            } },

            # lookup to join location from 1st collection and geometry from 2nd collection
            { "$lookup": { 
                 "from": "polygons", 
                 "localField": "location", 
                 "foreignField": "geometry", 
                 "as": "join" } },

            # output grouped
            { "$group": { 
                "_id": "Average", 
                "Geometry": "$join.geometry" ,
                "AvgVal": { "$avg": "$myVal" }  , 
            "count": {"$sum": 1} } },
        ]
    }

i hope anybody is able to understand what i mean ;-)

edit: document 1:

{ id: "ABC", location: {type: "point", coordinates: [0,1]}, myVal: 1 },
{ id: "DEF", location: {type: "point", coordinates: [2,3]}, myVal: 2 }
{ id: "GHI", location: {type: "point", coordinates: [9,8]}, myVal: 3 }
{ id: "JKL", location: {type: "point", coordinates: [7,6]}, myVal: 4 }
{ id: "MNO", location: {type: "point", coordinates: [5,4]}, myVal: 5 }

document 2:

{ id: "Vienna Part1", geometry: {type: "polygon", coordinates: [[[0,1],[1,2],[2,3],[0,1]]] } },
{ id: "Vienna Part2", geometry: {type: "polygon", coordinates: [[[9,8],[7,6],[5,4],[9,8]]] } },

now i send a query with a polygon(map on my screen): $geoIntersect polygon.. [0,1]...[9,8]

expected:

{ id: "Vienna Part1", AvgVal: 1.5, geometry: {type: "polygon", coordinates: [[[0,1],[1,2],[2,3],[0,1]]] }
{ id: "Vienna Part2", AvgVal: 4, geometry: {type: "polygon", coordinates: [[[9,8],[7,6],[5,4],[9,8]]] }

lg Harald

Blakes Seven
  • 49,422
  • 14
  • 129
  • 135
Harald Wiesinger
  • 651
  • 3
  • 11
  • 23
  • Could you post both documents? And expected result? – profesor79 Mar 22 '16 at 22:44
  • Possible duplicate of [$lookup on ObjectId's in an array](http://stackoverflow.com/questions/34967482/lookup-on-objectids-in-an-array) – Blakes Seven Mar 22 '16 at 23:15
  • sure that it works if the joined keyfield is the geospatial index field? i dont query for the array, i query with $geoIntersects – Harald Wiesinger Mar 22 '16 at 23:21
  • It means you need to `$unwind` the array "twice". See the wrapping `[[ ]]`. Then `[0,1]` etc will match the value in the corresponding document. Nothing about this has anything to do with "geospatial" anyway. They are just "arrays", regardless of what the data contained is used for. Only if you wanted a `$geoWithin` ( and you cannot use `$near` or similar other than in the "first" pipline stage ) would there be anything "geospatial" about this. And that part is not related to the `$lookup` portion anyway. – Blakes Seven Mar 23 '16 at 00:11

1 Answers1

2

You don't seem to be understanding that the concept of $lookup is that for a field to "match" on lookup it must have the same data. Since the data contained in one document is "double nested" in an array in "Polygon" notation, it is necessary to $unwind the array content "twice" in order to get the "match".

Setting this up as a whole example, let's create one collection with your "point" documents:

db.geo1.insertMany([
  { _id: "ABC", location: {type: "point", coordinates: [0,1]}, myVal: 1 },
  { _id: "DEF", location: {type: "point", coordinates: [2,3]}, myVal: 2 },
  { _id: "GHI", location: {type: "point", coordinates: [9,8]}, myVal: 3 },
  { _id: "JKL", location: {type: "point", coordinates: [7,6]}, myVal: 4 },
  { _id: "MNO", location: {type: "point", coordinates: [5,4]}, myVal: 5 }
])

And another including the "polygon" documents, deliberately including one that will not match:

db.geo2.insertMany([
  { _id: "Vienna Part1", geometry: {type: "polygon", coordinates: [[[0,1],[1,2],[2,3],[0,1]]] } },
  { _id: "Vienna Part2", geometry: {type: "polygon", coordinates: [[[9,8],[7,6],[5,4],[9,8]]] } },
  { _id: "Vienna Part3", geometry: {type: "polygon", coordinates: [[[10,1],[10,3],[3,10],[10,1]]] } }
])

In order to inspect the geo2 collection for items in geo1 that "intersect" via the same point coordinates then you must do:

db.geo2.aggregate([
  { "$unwind": "$geometry.coordinates" },
  { "$unwind": "$geometry.coordinates" },
  { "$lookup": {
    "from": "geo1",
    "localField": "geometry.coordinates",
    "foreignField": "location.coordinates",
    "as": "geo1"
  }},
  { "$group": {
    "_id": "$_id",
    "coordinates": {
      "$push": "$geometry.coordinates"
    },
    "matches": {
      "$push": { "$ne": [ "$geo1", [] ] }
    }
  }},
  { "$redact": {
    "$cond": {
      "if": { "$anyElementTrue": "$matches" },
      "then": "$$KEEP",
      "else": "$$PRUNE"
    }
  }},
  { "$project" : {
    "geometry": {
      "type": { "$literal": "polygon" },
      "coordinates": ["$coordinates"]
    }
  }},
  { "$sort": { "_id": 1 } }
])

So the first thing is the double $unwind to just get the "point" data out of the arrays. Then you can do the $lookup.

{ "_id" : "Vienna Part1", "geometry" : { "type" : "polygon", "coordinates" : [ 0, 1 ] }, "geo1" : [ { "_id" : "ABC", "location" : { "type" : "point", "coordinates" : [ 0, 1 ] }, "myVal" : 1 } ] }
{ "_id" : "Vienna Part1", "geometry" : { "type" : "polygon", "coordinates" : [ 1, 2 ] }, "geo1" : [ ] }
{ "_id" : "Vienna Part1", "geometry" : { "type" : "polygon", "coordinates" : [ 2, 3 ] }, "geo1" : [ { "_id" : "DEF", "location" : { "type" : "point", "coordinates" : [ 2, 3 ] }, "myVal" : 2 } ] }
{ "_id" : "Vienna Part1", "geometry" : { "type" : "polygon", "coordinates" : [ 0, 1 ] }, "geo1" : [ { "_id" : "ABC", "location" : { "type" : "point", "coordinates" : [ 0, 1 ] }, "myVal" : 1 } ] }
{ "_id" : "Vienna Part2", "geometry" : { "type" : "polygon", "coordinates" : [ 9, 8 ] }, "geo1" : [ { "_id" : "GHI", "location" : { "type" : "point", "coordinates" : [ 9, 8 ] }, "myVal" : 3 } ] }
{ "_id" : "Vienna Part2", "geometry" : { "type" : "polygon", "coordinates" : [ 7, 6 ] }, "geo1" : [ { "_id" : "JKL", "location" : { "type" : "point", "coordinates" : [ 7, 6 ] }, "myVal" : 4 } ] }
{ "_id" : "Vienna Part2", "geometry" : { "type" : "polygon", "coordinates" : [ 5, 4 ] }, "geo1" : [ { "_id" : "MNO", "location" : { "type" : "point", "coordinates" : [ 5, 4 ] }, "myVal" : 5 } ] }
{ "_id" : "Vienna Part2", "geometry" : { "type" : "polygon", "coordinates" : [ 9, 8 ] }, "geo1" : [ { "_id" : "GHI", "location" : { "type" : "point", "coordinates" : [ 9, 8 ] }, "myVal" : 3 } ] }
{ "_id" : "Vienna Part3", "geometry" : { "type" : "polygon", "coordinates" : [ 10, 1 ] }, "geo1" : [ ] }
{ "_id" : "Vienna Part3", "geometry" : { "type" : "polygon", "coordinates" : [ 10, 3 ] }, "geo1" : [ ] }
{ "_id" : "Vienna Part3", "geometry" : { "type" : "polygon", "coordinates" : [ 3, 10 ] }, "geo1" : [ ] }
{ "_id" : "Vienna Part3", "geometry" : { "type" : "polygon", "coordinates" : [ 10, 1 ] }, "geo1" : [ ] }

The resulting data is showing the "matched" element(s) from the geo1 collection within the new array element "geo1" in the documents. This is a kind of "left join", so there is either an array with any match(es) or an empty array.

Your next point is we want to $group back to something more like the original document form, and here we test to see if the result of $lookup produced an empty array or not. This means that the corresponding document was "not found" in the other collection:

{ 
    "_id" : "Vienna Part3",
    "coordinates" : [ [ 10, 1 ], [ 10, 3 ], [ 3, 10 ], [ 10, 1 ] ],
    "matches" : [ false, false, false, false ]
}
{ 
    "_id" : "Vienna Part2",
    "coordinates" : [ [ 9, 8 ], [ 7, 6 ], [ 5, 4 ], [ 9, 8 ] ],
    "matches" : [ true, true, true, true ]
}
{ 
    "_id" : "Vienna Part1", 
    "coordinates" : [ [ 0, 1 ], [ 1, 2 ], [ 2, 3 ], [ 0, 1 ] ],
    "matches" : [ true, false, true, true ]
}

What you should see here is that one of the documents has a resulting "matches" array where every element is false. This is important for the following $redact condition, which does a test on that array with $anyElementTrue.

So where the array is seen in the document where all elements are false, this means that there was "no intersection" for this particular set of points. As such, that document will be discarded via $$PRUNE where the other documents containing "at least one" true would be kept.

The only thing remaining is a little cosmetic tranformation to give you back the desired result:

{ 
    "_id" : "Vienna Part1", 
    "geometry": {
        "type": "polygon",
        "coordinates" : [[ [ 0, 1 ], [ 1, 2 ], [ 2, 3 ], [ 0, 1 ] ]]
    }
}
{ 
    "_id" : "Vienna Part2",
    "geometry": {
        "type": "polygon",
        "coordinates" : [[ [ 9, 8 ], [ 7, 6 ], [ 5, 4 ], [ 9, 8 ] ]]
    }
}

Of course, different geometry types complicates the process a little more, but that is still the basic process.

  1. In order to $lookup you must transform the data into a form that would match the target collection field being inspected.

  2. When looking at results in a "set", you can test each element for a logical result and then evaluate with $anyElementTrue. This will tell you if something matched without destroying the individual elements.

So as well as the first condition, the second condition makes sure that your "Vienna Part1" document which contains points that do not match "all" of the documents in the other collection is also a match because "at least one" element had a true evaluation for a match.

Those are the rules. So I hope you understand it better now.

Blakes Seven
  • 49,422
  • 14
  • 129
  • 135
  • Hi Blakes Seven, this is a really cool explanation for how to use the aggregations, and i think its a bit clearer now for me, but this is not exactly i want to do. The points in the polygon dont hit the coordinates of the polygon exactly they are just within the polygon. so the geospatial part ($geoWithin) is necessary to calculate the points within the polygon. i copied your aggregation definition and adapted it to my real collection.. up to the $machtes part.. there is not a single point hitting any polygon... but i am sure the points are within the polygons – Harald Wiesinger Mar 23 '16 at 09:48
  • @HaraldWiesinger Hence the explanation. Your question was asking about `$lookup` and is not really about anything else. This I even told you in comment earlier. You cannot `$geoWithin` by using values from elsewhere in the document. You can "iterate" one collection and use those values to "query" another, but that is where it ends. The aggregation framework lacks the necessary math operators to work out if a "point" is "within" a polygon, which would be the only thing you could do if that were even possible. This is about clearing up your misconceptions. Nothing more. – Blakes Seven Mar 23 '16 at 10:09
  • is there a possibility to do this on db side? actually i query my polygons and iterate over my poligons and send a $geoWithin query as a second query – Harald Wiesinger Mar 23 '16 at 12:14
  • @HaraldWiesinger LIke I said. You cannot use values of the "joined" document to issue a "query" within the pipeline, and the math operators required just do not exist. One day they might, but right now they do not. Probably the "better" hope is that `$lookup` gets adapted to issue a "query", or another similarly purposed operator is added. – Blakes Seven Mar 23 '16 at 12:33