
I'm working with MongoDB and I'd like to do the following:

When a (lat, lon, uid) triple comes in:

1. the collection has a unique index on lat, and within each lat document the lon values are unique
2. if the (lat, lon) pair exists in the collection, update uid in the matching sub-document
3. if lat exists but lon doesn't, insert (lon, uid) into the lons sub-document array
4. if the lat document doesn't exist, create it and do step 3

[{
    "lat" : 1,  (doc is unique by lat)
    "lons" : [ 
        {
            "lon" : 2,   (the subdocument is unique by lon)
            "uid" : 3
        },
        {
            "lon" : 3,
            "uid" : 3
        }
    ]
},
{
    "lat" : 2,
    "lons" : [ 
        {
            "lon" : 2,
            "uid" : 4
        }
    ]
}]

I tried the following, but apparently it's not working as I imagined:

db.zones.update({'lat': 90}, {$push: {lons: {'uid' : 0, 'lon': -18}}}, { upsert: true })
db.zones.ensureIndex({'lat': -1, 'lons.lon':1}, {unique: true})

I checked the post Can mongo upsert array data? and some others, but none of them works for me. I don't know if it's my problem or a MongoDB problem. Thanks!

Shih-Min Lee

1 Answer

I would suggest you reconsider your schema:

  • The upsert applies at a document level, which doesn't fit nicely with how your schema is structured. If a match isn't found in the lons array you want to push to the existing document rather than creating a new one.

  • Documents including arrays with unbounded growth can lead to frequent document moves and performance issues (see: Why shouldn't I embed large arrays in my documents?)

  • Your schema doesn't lend itself to a geospatial index (which would require longitude/latitude pairs as an array or embedded document). I'm guessing this isn't important for your use case since you are ensuring a normal unique index, but it might be worth considering.

A better schema (assuming you don't plan to use geospatial queries) would be:

{
    lon: -74.0059,
    lat: 40.7127,
    uid: 3
}

With this revised schema, your update requirements are more straightforward.

  1. the collection has a unique index on lat, and within each lat document the lon values are unique

You still want to ensure a unique index:

      db.zones.ensureIndex({'lat': 1, 'lon':1}, {unique: true})
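To see what the compound index buys you, here is a rough in-memory sketch in plain JavaScript (not MongoDB code; the function name and the shape of the return value are just illustrative) of how a unique index on the (lat, lon) pair behaves: the pair must be unique, while either value alone can repeat.

```javascript
// Simulated compound unique index on { lat: 1, lon: 1 }.
const seen = new Set();

function insertWithUniqueIndex(doc) {
  const key = `${doc.lat}|${doc.lon}`; // index key covers the (lat, lon) pair
  if (seen.has(key)) {
    // A real mongod would reject this with an E11000 duplicate key error.
    return { ok: false, error: 'duplicate key' };
  }
  seen.add(key);
  return { ok: true };
}
```

So inserting the same pair twice fails, but reusing the same lat with a different lon (or vice versa) succeeds.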

2. if the (lat, lon) pair exists in the collection, update uid in the matching sub-document

3. if lat exists but lon doesn't, insert (lon, uid) into the lons sub-document array

4. if the lat document doesn't exist, create it and do step 3

All this logic can now be handled by an upsert:

db.zones.update(

    // query criteria
    { lat: 40.7127, lon: -74.0060 },

    // update
    { $set: {
        uid: 3
    }},

    // options
    {
        upsert: true
    }
)
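To make the upsert semantics concrete, here is a hypothetical in-memory sketch in plain JavaScript (not something you would run against MongoDB; the helper name is invented) of what the server effectively does for the flat schema: match on (lat, lon), apply $set to the matching document, or insert the query criteria merged with the update fields.

```javascript
// Simulated collection and upsert-with-$set.
const zones = [];

function upsertSet(query, setFields) {
  const doc = zones.find(d => d.lat === query.lat && d.lon === query.lon);
  if (doc) {
    Object.assign(doc, setFields);          // existing pair: update uid in place
    return { matched: 1, upserted: 0 };
  }
  zones.push({ ...query, ...setFields });   // new pair: create the document
  return { matched: 0, upserted: 1 };
}
```

Note how a single operation covers all three of your cases: requirement 2 is the matched branch, and requirements 3 and 4 both collapse into the insert branch, since there is no longer a per-lat parent document to create first.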

If you want to preserve the uid when updating an existing document, you could also use the $setOnInsert operator (instead of $set):

db.zones.update(

    // query criteria
    { lat: 40.7127, lon: -74.0060 },

    // update
    { $setOnInsert: {
        uid: 3
    }},

    // options
    {
        upsert: true
    }
)
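The difference is that $setOnInsert only applies its fields when the upsert actually inserts a new document; on a match, the existing fields are left alone. A rough plain-JavaScript sketch of that behavior (illustrative only; the helper name is invented):

```javascript
// Simulated upsert-with-$setOnInsert against a passed-in collection.
function upsertSetOnInsert(coll, query, insertFields) {
  const doc = coll.find(d => d.lat === query.lat && d.lon === query.lon);
  if (doc) {
    // Match found: $setOnInsert fields are NOT applied, so uid is preserved.
    return { matched: 1, upserted: 0 };
  }
  coll.push({ ...query, ...insertFields }); // only a fresh insert gets the fields
  return { matched: 0, upserted: 1 };
}
```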
Stennie
  • I was thinking about this before. The problem is that I will have about 10k points in lon and 10k in lat, which means I will have 100M documents created this way. Is this still a good design? – Shih-Min Lee Aug 30 '14 at 12:49
  • Definitely! Read the blog post I referenced on [large embedded arrays](http://askasya.com/post/largeembeddedarrays). In both cases you have the same number of index entries for unique lat/long pairs, so consider indexes as about the same size. However, if you have arrays of 10k `lon` values embedded in each `lat` document there is an extra performance hit as the documents outgrow their space on disk (each time the document moves, all index entries for that document also have to be updated) and if you need to retrieve just one point in the document the server has to load the whole doc into RAM. – Stennie Aug 30 '14 at 13:08
  • I see. For the NxN documents created this way, is the query time O(1) or O(N^2)? Also, could you recommend some more resources on query and update costs in MongoDB? The big-O stuff? Thank you very much!! – Shih-Min Lee Aug 30 '14 at 13:53
  • Also, what I intend to do with the NxN table is insert some uids, and after I have done N^2 inserts I wouldn't do anything except queries (for a long time). Under those circumstances, do you still think the NxN documents are a good design? – Shih-Min Lee Aug 30 '14 at 14:01