I've been reading about using $lookup (aggregation) in MongoDB to do what I think is a simple procedure. I don't know if this is the right approach, because I am a beginner in MongoDB. I have two collections named five_million1_1 and five_million2_1. The collections contain different records, some of which are duplicates (they share the same article_url). I would like to merge the records with the same article_url into one, keep the records that appear only once, and store everything in a single collection. I tried this and this, but those approaches only work within the same collection.
Collection 1: five_million1_1
{
    "_id" : ObjectId("5921aeadfe329210965ff3d2"),
    "article_url" : "a",
    "nyt_article_year" : 1994,
    "surface_keywords" : [
        {
            "surface_keyword" : "Greenwich",
            "entity_score" : 0.14455
        },
        {
            "surface_keyword" : "Frank Oz",
            "entity_score" : 0.60855
        }
    ]
}
{
    "_id" : ObjectId("5921aea4fe329210965ff3d1"),
    "article_url" : "b",
    "nyt_article_year" : 1995,
    "surface_keywords" : [
        {
            "surface_keyword" : "capital gain",
            "entity_score" : 0.43096
        },
        {
            "surface_keyword" : "pro forma",
            "entity_score" : 0.25205
        }
    ]
}
Collection 2: five_million2_1
{
    "_id" : ObjectId("5921aeadfe329210965ff4d5"),
    "article_url" : "a",
    "nyt_article_year" : 1994,
    "surface_keywords" : [
        {
            "surface_keyword" : "dhaka",
            "entity_score" : 0.14359
        },
        {
            "surface_keyword" : "Frank",
            "entity_score" : 0.60807
        }
    ]
}
{
    "_id" : ObjectId("5921aea4fe329210965ff3c1"),
    "article_url" : "c",
    "nyt_article_year" : 1996,
    "surface_keywords" : [
        {
            "surface_keyword" : "capital gains",
            "entity_score" : 0.43096
        },
        {
            "surface_keyword" : "pro formas",
            "entity_score" : 0.25205
        }
    ]
}
Expected result:
{
    "_id" : ObjectId("5921aeadfe329210965ff3d2"),
    "article_url" : "a",
    "nyt_article_year" : 1994,
    "surface_keywords" : [
        {
            "surface_keyword" : "Greenwich",
            "entity_score" : 0.14455
        },
        {
            "surface_keyword" : "Frank Oz",
            "entity_score" : 0.60855
        },
        {
            "surface_keyword" : "dhaka",
            "entity_score" : 0.14359
        },
        {
            "surface_keyword" : "Frank",
            "entity_score" : 0.60807
        }
    ]
}
{
    "_id" : ObjectId("5921aea4fe329210965ff3d1"),
    "article_url" : "b",
    "nyt_article_year" : 1995,
    "surface_keywords" : [
        {
            "surface_keyword" : "capital gain",
            "entity_score" : 0.43096
        },
        {
            "surface_keyword" : "pro forma",
            "entity_score" : 0.25205
        }
    ]
}
{
    "_id" : ObjectId("5921aea4fe329210965ff3c1"),
    "article_url" : "c",
    "nyt_article_year" : 1996,
    "surface_keywords" : [
        {
            "surface_keyword" : "capital gains",
            "entity_score" : 0.43096
        },
        {
            "surface_keyword" : "pro formas",
            "entity_score" : 0.25205
        }
    ]
}
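To make the merge rule in the expected result concrete, here is a minimal in-memory sketch in plain JavaScript. It is not a MongoDB query and not a proposed solution for collections of this size; it only spells out the semantics I am after. The helper name mergeByArticleUrl, the variable names, and the use of plain strings in place of ObjectId values are assumptions made for illustration. It also assumes that, for a duplicated article_url, the _id and nyt_article_year of the first document seen should be kept.

```javascript
// Hypothetical helper (not a MongoDB API): merges documents that share an
// article_url. The first document seen for a URL supplies _id and
// nyt_article_year; surface_keywords arrays are concatenated in input order.
function mergeByArticleUrl(...collections) {
  const merged = new Map(); // article_url -> merged document
  for (const docs of collections) {
    for (const doc of docs) {
      const keywords = doc.surface_keywords || [];
      const existing = merged.get(doc.article_url);
      if (existing === undefined) {
        // First occurrence: copy the document so the inputs are not mutated.
        merged.set(doc.article_url, { ...doc, surface_keywords: [...keywords] });
      } else {
        // Duplicate URL: append this document's keywords to the merged one.
        existing.surface_keywords.push(...keywords);
      }
    }
  }
  return [...merged.values()];
}

// Sample data from the question; _id values are plain strings here (assumption).
const fiveMillion1 = [
  { _id: "5921aeadfe329210965ff3d2", article_url: "a", nyt_article_year: 1994,
    surface_keywords: [
      { surface_keyword: "Greenwich", entity_score: 0.14455 },
      { surface_keyword: "Frank Oz", entity_score: 0.60855 },
    ] },
  { _id: "5921aea4fe329210965ff3d1", article_url: "b", nyt_article_year: 1995,
    surface_keywords: [
      { surface_keyword: "capital gain", entity_score: 0.43096 },
      { surface_keyword: "pro forma", entity_score: 0.25205 },
    ] },
];
const fiveMillion2 = [
  { _id: "5921aeadfe329210965ff4d5", article_url: "a", nyt_article_year: 1994,
    surface_keywords: [
      { surface_keyword: "dhaka", entity_score: 0.14359 },
      { surface_keyword: "Frank", entity_score: 0.60807 },
    ] },
  { _id: "5921aea4fe329210965ff3c1", article_url: "c", nyt_article_year: 1996,
    surface_keywords: [
      { surface_keyword: "capital gains", entity_score: 0.43096 },
      { surface_keyword: "pro formas", entity_score: 0.25205 },
    ] },
];

const combined = mergeByArticleUrl(fiveMillion1, fiveMillion2);
console.log(JSON.stringify(combined, null, 2));
```

With the sample data, combined holds one document per article_url ("a", "b", "c"), and the document for "a" carries all four keywords. What I am looking for is a way to get the same outcome inside MongoDB, across the two collections.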