Using two different result sets in MongoDB

Question

I'm using Groovy with MongoDB. I have a result set but need a value from a different group of documents. How do I pull that value into the result set I need?

Main: network data

"resource_metadata" : {
"name" : "tapd2e75adf-71",
"parameters" : { },
"fref" : null,
"instance_id" : "9f170531-79d0-48ee-b0f7-9bd2788b1cc5"}

I need the display_name for the network data result set which is contained in the compute data.

CPU data

"resource_id" : "9f170531-79d0-48ee-b0f7-9bd2788b1cc5",
"resource_metadata" : {
"ramdisk_id" : "",
"display_name" : "testinstance0001"}

You can see that resource_id and instance_id hold the same value. I know MongoDB has no relationship (join) I can use, but I'm reaching out to see if anyone has come across this. I'm using a table model to retrieve data for reporting. A Hashtable has been suggested to me, but I can't see how to make that work. Somehow, inside the hasNext loop, I need to include the display_name value from the compute data in the networking rows, so that the report shows a readable name rather than only the GUID.

def docs = meter.find(query).sort(sort).limit(50)
while (docs.hasNext()) {
    def doc = docs.next()
    model.addRow([
        doc.get("counter_name"),
        doc.get("counter_volume"),
        doc.get("timestamp"),
        doc.get("resource_metadata").getString("mac"),
        doc.get("resource_metadata").getString("instance_id"),
        doc.get("counter_unit")
    ] as Object[])
}

Full document, first set: the network measurement, which has no name, only an id (resource_metadata.instance_id):

{
  "_id" : ObjectId("528812f8be09a32281e137d0"),
  "counter_name" : "network.outgoing.packets",
  "user_id" : "4d4e43ec79c5497491b23b13644c2a3b",
  "timestamp" : ISODate("2013-11-17T00:51:00Z"),
  "resource_metadata" : {
    "name" : "tap6baab24e-8f",
    "parameters" : { },
    "fref" : null,
    "instance_id" : "a8727a1d-4661-4565-9c0a-511279024a97",
    "instance_type" : "50",
    "mac" : "fa:16:3e:a3:bf:fc"
  },
  "source" : "openstack",
  "counter_unit" : "packet",
  "counter_volume" : 4611911,
  "project_id" : "97dc4ca962b040608e7e707dd03f2574",
  "message_id" : "54039238-4f22-11e3-8e68-e4115b99a59d",
  "counter_type" : "cumulative"
}

Second set, from which I want to grab the name as I read the values (matched on resource_id):

{
  "_id" : ObjectId("5287bc3ebe09a32281dd2594"),
  "counter_name" : "cpu",
  "user_id" : "4d4e43ec79c5497491b23b13644c2a3b",
  "message_signature" :  
  "timestamp" : ISODate("2013-11-16T18:40:58Z"),
  "resource_id" : "a8727a1d-4661-4565-9c0a-511279024a97",
  "resource_metadata" : {
    "ramdisk_id" : "",
    "display_name" : "vmsapng01",
    "name" : "instance-000014d4",
    "disk_gb" : "",
    "availability_zone" : "",
    "kernel_id" : "",
    "ephemeral_gb" : "",
    "host" : "3746d148a76f4e1a8203d7e2378ef48ccad8a714a47e7481ab37bcb6",
    "memory_mb" : "",
    "instance_type" : "50",
    "vcpus" : "",
    "root_gb" : "",
    "image_ref" : "869be2c0-9480-4239-97ad-df383c6d09bf",
    "architecture" : "",
    "os_type" : "",
    "reservation_id" : ""
  },
  "source" : "openstack",
  "counter_unit" : "ns",
  "counter_volume" : NumberLong("724574640000000"),
  "project_id" : "97dc4ca962b040608e7e707dd03f2574",
  "message_id" : "a240fa5a-4eee-11e3-8e68-e4115b99a59d",
  "counter_type" : "cumulative"
}

This is another collection that contains the same value, but I thought it would be easier to grab it from the same collection:

"_id" : "a8727a1d-4661-4565-9c0a-511279024a97",
"metadata" : {
  "ramdisk_id" : "",
  "display_name" : "vmsapng01",
  "name" : "instance-000014d4",
  "disk_gb" : "",
  "availability_zone" : "",
  "kernel_id" : "",
  "ephemeral_gb" : "",
  "host" : "3746d148a76f4e1a8203d7e2378ef48ccad8a714a47e7481ab37bcb6",
  "memory_mb" : "",
  "instance_type" : "50",
  "vcpus" : "",
  "root_gb" : "",
  "image_ref" : "869be2c0-9480-4239-97ad-df383c6d09bf",
  "architecture" : "",
  "os_type" : "",
  "reservation_id" : "",
}

Mike

score 0 · Answer 1 · answered Nov 18 '13 at 15:35


It looks like these data are in two different collections. Is this correct?

Would you be able to query the CPU data for each "instance_id" (matched against "resource_id")?

Or, if this would cause too many queries to the database (it looks like you limit to 50), you could use $in with the list of all "instance_id" values: http://docs.mongodb.org/manual/reference/operator/query/in/

Either way, you will need to query each collection separately.
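
As a rough sketch of the $in idea combined with the Hashtable suggestion from the question: run the network query, collect the instance ids, fetch the matching CPU documents in one second query, build an id-to-name map, and use it while filling the rows. To keep the example runnable without a database, plain Groovy maps stand in for the documents and the filter is applied in memory; the variable names (networkDocs, cpuDocs, nameById) are made up for illustration, and the driver call in the comment is only an approximation of what the real query might look like.

```groovy
// Stand-ins for the documents the two queries would return.
// With the Java driver these would be DBObject instances read from a cursor.
def networkDocs = [
    [counter_name: 'network.outgoing.packets',
     counter_volume: 4611911,
     resource_metadata: [instance_id: 'a8727a1d-4661-4565-9c0a-511279024a97',
                         mac: 'fa:16:3e:a3:bf:fc']]
]
def cpuDocs = [
    [counter_name: 'cpu',
     resource_id: 'a8727a1d-4661-4565-9c0a-511279024a97',
     resource_metadata: [display_name: 'vmsapng01']]
]

// Step 1: collect the distinct instance ids from the network result set.
def instanceIds = networkDocs.collect { it.resource_metadata.instance_id }.unique()

// Step 2: fetch the CPU documents for those ids in a single second query.
// Against MongoDB the filter would be roughly (assumption, not tested here):
//   new BasicDBObject('counter_name', 'cpu')
//       .append('resource_id', new BasicDBObject('$in', instanceIds))
// Here the same condition is applied to the in-memory list instead.
def matchingCpuDocs = cpuDocs.findAll {
    it.counter_name == 'cpu' && it.resource_id in instanceIds
}

// Step 3: build the lookup table: resource_id -> display_name.
def nameById = matchingCpuDocs.collectEntries {
    [(it.resource_id): it.resource_metadata.display_name]
}

// Step 4: build the report rows, falling back to the raw id when no name is known.
def rows = networkDocs.collect { doc ->
    def id = doc.resource_metadata.instance_id
    [doc.counter_name, doc.counter_volume, doc.resource_metadata.mac, nameById[id] ?: id]
}

rows.each { println it }
```

In the real loop, the last element of each row passed to model.addRow would be the looked-up name instead of instance_id. This costs two queries in total rather than one per row.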

answered Nov 18 '13 at 15:35

Alan Spencer


Hi, the data is in the same collection. – user2291795 Nov 19 '13 at 13:58
Can you add an example full document? – Alan Spencer Nov 19 '13 at 16:55
Added example of full document. – user2291795 Nov 20 '13 at 20:33
Great. Now I understand. Given it's only the name you need from the CPU document, could you embed it along with the id? Also, you might want to consider a different schema for time series data. See: http://blog.mongodb.org/post/65517193370/schema-design-for-time-series-data-in-mongodb – Alan Spencer Nov 21 '13 at 11:13
How would I embed it with the cpu doc _id? I'm not sure how to do that. – user2291795 Nov 22 '13 at 05:00
When you are inserting the network.outgoing.packets document with its resource_metadata.instance_id, also include resource_metadata.instance_display_name, if that is possible for you. This does mean data duplication, but for this case it sounds correct. I would also look at redesigning your schema as the blog post about time series data suggests. See: http://stackoverflow.com/questions/5373198/mongodb-relationships-embed-or-reference – Alan Spencer Nov 22 '13 at 10:14
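
A minimal sketch of the embedding suggested in the last comment, assuming you control the code that writes the network samples: copy the display name into resource_metadata.instance_display_name before the document is inserted. The helper withDisplayName and the nameById map are hypothetical names used only for illustration, and plain Groovy maps stand in for the documents so the example runs without a database.

```groovy
// Hypothetical helper: returns a copy of a network sample with the instance's
// display name added to resource_metadata. The original sample is not modified.
def withDisplayName(Map sample, Map nameById) {
    def meta = new LinkedHashMap(sample.resource_metadata)
    def name = nameById[meta.instance_id]
    if (name != null) {
        meta.instance_display_name = name   // duplicated on purpose (denormalized)
    }
    def copy = new LinkedHashMap(sample)
    copy.resource_metadata = meta
    return copy
}

// Assumed to be known at write time, e.g. from the compute data.
def knownNames = ['a8727a1d-4661-4565-9c0a-511279024a97': 'vmsapng01']

def sample = [
    counter_name: 'network.outgoing.packets',
    counter_volume: 4611911,
    resource_metadata: [instance_id: 'a8727a1d-4661-4565-9c0a-511279024a97',
                        mac: 'fa:16:3e:a3:bf:fc']
]

// The enriched map is what would be handed to the insert call.
def enriched = withDisplayName(sample, knownNames)
println enriched.resource_metadata
```

With the name stored on each network document, the reporting loop could read it directly from resource_metadata and no second query would be needed. The trade-off is that a renamed instance leaves older samples carrying the old name.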