I am using Monary to connect to my MongoDB, but I am struggling to figure out where and how to set the allowDiskUse option.

from monary import Monary

client = Monary("ip.address.of.server", 27017, "username", "password", "dbname")

pipeline = [
        {"$group" : {
            "_id" : {"user":"$subscriber_id",
                 "month": { "$month" : "$timestamp" },
                 "day" : { "$dayOfMonth" : "$timestamp" },
                 "year" : { "$year" : "$timestamp" },
                 "hour" : { "$hour" : "$timestamp" },
                 "category":"$category_name"
                },
            "activities_sum":{"$sum":"$activity_count"}
            }
        }
    ]

with client as m:
    users, years, months, days, hours, categories, activities  = m.aggregate("digicel_exploration",
                "5_min_slots",
                pipeline,
                ["_id.user", "_id.year", "_id.month", "_id.day", "_id.hour", "_id.category", "activities_sum"],
                ["string:30", "int32", "int32", "int32", "int32", "string:60", "int32"])
Blakes Seven
Rami

1 Answer

Monary talks to the mongoc C driver directly rather than building on top of the pymongo driver, which is the official Python driver maintained by MongoDB.

As a result, the implementation gives you no way to pass the necessary "options", such as "allowDiskUse", into the aggregate() method.

You can see this in the implementation code. Pay attention to the fourth and fifth arguments, which are hard-coded to NULL:

// Get an aggregation cursor
mcursor = mongoc_collection_aggregate(collection,
                                      MONGOC_QUERY_NONE,
                                      &pl_bson, NULL, NULL);

When you compare this to the documented signature for mongoc_collection_aggregate, the problem becomes clear:

mongoc_cursor_t *
mongoc_collection_aggregate (mongoc_collection_t       *collection,
                             mongoc_query_flags_t       flags,
                             const bson_t              *pipeline,
                             const bson_t              *options,
                             const mongoc_read_prefs_t *read_prefs)
   BSON_GNUC_WARN_UNUSED_RESULT;

If you need this option in your processing, then you would be better off using pymongo directly and loading up your NumPy arrays manually based on the results.
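A minimal sketch of that pymongo route follows. It assumes an unauthenticated server on localhost (pass your own host and credentials to MongoClient) and reuses the database, collection, and field names from the question. The helper names get_path, to_columns, and aggregate_to_arrays are made up for illustration, and the NumPy dtypes are only rough stand-ins for the Monary type strings. The third-party imports sit inside the function that needs them so the pure-Python helpers can be tried without a server.

```python
from functools import reduce

# Field paths and NumPy dtypes mirroring the Monary call in the question.
# The dtypes are assumptions: "U30"/"U60" approximate "string:30"/"string:60".
FIELDS = ["_id.user", "_id.year", "_id.month", "_id.day",
          "_id.hour", "_id.category", "activities_sum"]
DTYPES = ["U30", "int32", "int32", "int32", "int32", "U60", "int32"]


def get_path(doc, dotted):
    """Follow a dotted path such as "_id.user" through nested dicts."""
    return reduce(lambda d, key: d[key], dotted.split("."), doc)


def to_columns(docs, fields):
    """Turn an iterable of result documents into one list per field."""
    columns = [[] for _ in fields]
    for doc in docs:
        for column, field in zip(columns, fields):
            column.append(get_path(doc, field))
    return columns


def aggregate_to_arrays(pipeline, host="localhost", port=27017):
    """Run the pipeline with allowDiskUse and return one array per field."""
    # Imported here so the helpers above work without these packages.
    import numpy as np
    from pymongo import MongoClient

    client = MongoClient(host, port)
    try:
        collection = client["digicel_exploration"]["5_min_slots"]
        # allowDiskUse is a supported keyword of pymongo's aggregate().
        cursor = collection.aggregate(pipeline, allowDiskUse=True)
        columns = to_columns(cursor, FIELDS)
    finally:
        client.close()
    return [np.array(col, dtype=dt) for col, dt in zip(columns, DTYPES)]
```

Calling `users, years, months, days, hours, categories, activities = aggregate_to_arrays(pipeline)` would then stand in for the Monary call. Documents with missing or null fields would need extra handling before the integer conversion.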

Alternatively, you could take the approach already mentioned in a reported issue on the subject and patch the source yourself, provided you are prepared to build Monary from source:

bson_t opts;
bson_init(&opts);
BSON_APPEND_BOOL (&opts, "allowDiskUse", true);
mcursor = mongoc_collection_aggregate(collection,
                                      MONGOC_QUERY_NONE,
                                      &pl_bson, &opts, NULL);
bson_destroy(&opts);

Or you could even contribute a full patch that adds an options argument to the method definition and passes it through correctly.

Blakes Seven

Thanks a lot for the clear answer. I was hoping to use Monary because it can load the results directly into NumPy arrays. I might then go back to pymongo, knowing that it will be slower... but I am not ready to patch the source myself :) Cheers – Rami Oct 08 '15 at 06:54