I have a collection of 6-7 million event records. I have another collection of ~100,000 hourly weather records spanning the same timeframe as the event records. I am using an $aggregate pipeline with $lookup to merge in relevant weather data for each event in the event collection.
THE PROBLEM: I have been running this on the full EVENT dataset for more than 8 HOURS, with no result. I have a deadline and I'm wondering if I will get a result...ever.
PLEASE HELP
Here is a sample event record:
{
"_id" : ObjectId("5dedae8111cd89b173b00910"),
"EventType" : "P",
"Jurisdiction" : "ABCD",
"Year" : 2006,
"JulianDay" : 91,
"CallReceipt" : ISODate("2006-04-01T00:00:37Z"),
"EventClosed" : ISODate("2006-04-01T00:05:25Z"),
"FinalType" : "EFGHI",
"EventWindowStart" : ISODate("2006-04-01T00:00:00Z"),
"EventWindowEnd" : ISODate("2006-04-01T01:00:00Z")
}
Here is a weather record:
{
"_id" : ObjectId("5dc3cd909fc78c0c78a336da"),
"DATE" : ISODate("2012-01-01T00:02:00Z"),
"REPORT_TYPE" : "FM-16",
"SOURCE" : 7,
"HourlyAltimeterSetting" : "30.06",
"HourlyDewPointTemperature" : "36",
"HourlyDryBulbTemperature" : "37",
"HourlyPresentWeatherType" : "BR:1 ||",
"HourlyRelativeHumidity" : 93,
"HourlySkyConditions" : "SCT:04 7 BKN:07 15 OVC:08 33",
"HourlyStationPressure" : "29.46",
"HourlyVisibility" : "5.00",
"HourlyWetBulbTemperature" : 37,
"HourlyWindDirection" : "260",
"HourlyWindSpeed" : 5,
"REM" : "MET10101/01/12 00:02:02 SPECI KROC 010502Z 26004KT 5SM BR SCT00
7 BKN015 OVC033 03/02 A3006 RMK AO2 RTX (MP)",
"REPORT_MODE" : "hourly"
}
Here is my code, typed directly into the mongo shell:
db.EVENTS.aggregate([
{
$lookup:
{
from: "WEATHER",
let : { start : "$EventWindowStart", end: "$EventWindowEnd"},
pipeline : [
{ $match :
{ $expr:
{ $and:
[
{$gte: ["$DATE", "$$start"]},
{$lte: ["$DATE", "$$end"]}
]
}
}
},
{$project: {
_id : 0,
HourlyDryBulbTemperature : 1,
HourlyPrecipitation : 1,
HourlyVisibility : 1,
WindSpeed: 1
}
}
],
as: "HourlyWeatherData"
}
},
{$out: "MERGED" }
])
On a small test subset I get the desired output. So the code works, as far as I can tell...
Sample output:
{
"_id" : ObjectId("5dedae8111cd89b173b00910"),
"EventType" : "P",
"Jurisdiction" : "ABCD",
"Year" : 2006,
"JulianDay" : 91,
"CallReceipt" : ISODate("2006-04-01T00:00:37Z"),
"EventClosed" : ISODate("2006-04-01T00:05:25Z"),
"FinalType" : "EFGHI",
"EventWindowStart" : ISODate("2006-04-01T00:00:00Z"),
"EventWindowEnd" : ISODate("2006-04-01T01:00:00Z"),
"HourlyWeatherData" : [
{
"HourlyDryBulbTemperature" : "59",
"HourlyPrecipitation" : "0.00",
"HourlyVisibility" : "10.00"
},
{
"HourlyDryBulbTemperature" : "59",
"HourlyVisibility" : "9.94"
}
]
}
PS: I do have ascending indexes on the event window fields in EVENTS, and an ascending and descending index on the DATE in WEATHER.