I would like to extend the Apache Drill Mongo Storage Plugin to push down INNER JOINs. To do this, I want to rewrite an INNER JOIN as a MongoDB aggregation pipeline.

Where should we start in order to implement this rewrite in Apache Drill?

Here is a SQL example:

SELECT *
FROM `mymongo.db`.`test` `test`
  INNER JOIN `mymongo.db`.`test2` `test2`
  ON (`test`.`id` = `test2`.`fk`)
WHERE `test2`.`date` = '09.05.2017'

I have found the push-down of WHERE clauses in the Mongo Storage Plugin, but I am still struggling to do the same for INNER JOINs. What would the constructor of public class MongoPushDownInnerJoinScan extends StoragePluginOptimizerRule look like? Which equivalent of MongoGroupScan (AbstractGroupScan) would I have to implement? Any help would be very much appreciated.

Dennis Münkle

1 Answer

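If you want to express a SQL-style join in the aggregation framework, you can use the $lookup pipeline stage. Note that $lookup on its own performs a left outer join: input documents without a match are kept, with an empty output array. To get inner join semantics, follow it with a $unwind on the output array field, which drops documents whose array is empty.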

{
    $lookup: {
        from: <collection to join>,
        localField: <field from the input documents>,
        foreignField: <field from the documents of the "from" collection>,
        as: <output array field>
    }
}
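
As a rough sketch, the SQL example from the question could map to a pipeline like the one built below. The helper name buildInnerJoinPipeline is hypothetical (it is not part of Drill or MongoDB), and the sketch assumes that `date` is stored as a plain string and that the joined documents are placed under the field `test2`. It only builds the pipeline; it does not show how to wire it into a Drill optimizer rule.

```javascript
// Hypothetical helper (not a Drill or MongoDB API): builds an aggregation
// pipeline that approximates
//   SELECT * FROM <left> INNER JOIN <from> ON left.localField = from.foreignField
//   WHERE from.<key> = <value> ...
function buildInnerJoinPipeline(from, localField, foreignField, as, rightFilter) {
  const pipeline = [
    // Left outer join: each input document gets an array `as` of matches.
    { $lookup: { from, localField, foreignField, as } },
    // One output document per match; documents with an empty array are
    // dropped, which turns the left outer join into an inner join.
    { $unwind: "$" + as },
  ];

  // Translate equality conditions on the joined collection into a $match
  // on the unwound sub-document (assumption: simple equality filters only).
  const keys = Object.keys(rightFilter || {});
  if (keys.length > 0) {
    const match = {};
    for (const key of keys) {
      match[as + "." + key] = rightFilter[key];
    }
    pipeline.push({ $match: match });
  }
  return pipeline;
}

// Pipeline for the SQL example in the question.
const pipeline = buildInnerJoinPipeline("test2", "id", "fk", "test2", {
  date: "09.05.2017", // assumption: stored as a string in this format
});

console.log(JSON.stringify(pipeline, null, 2));
```

In the mongo shell, the resulting array could then be passed to db.test.aggregate(pipeline). Each result document has the fields of `test` at the top level and the matching `test2` document nested under `test2`, so a push-down rule would still need to flatten or rename the columns to match what Drill expects from the join.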