13

I am trying to achieve the following:

Select all the records that I own, where ownership is objects I have created OR objects a user I manage has created, where user management can be in a hierarchy of users managing users

Ownership is clearly straight forward and could be handled by a simple id corresponding to the owner. The hierarchy of user management has me a little stumped to perform without heavy lifting through large lists of IDs (you can obviously just find every user that is managed and list every object created by any of those users using an IN clause or similar).

Ideally this all happens in a single query so normal paging and conditions can occur.

I was thinking that there might have been some maths to get it done - having IDs that can somehow be hashed to determine if they are owned by anyone in the chain of command.

Any references for this sort of thing?

Am I missing something obvious?

Using MongoDB if that makes a difference, but happy to think about other databases for inspiration.

UPDATE: have created a MongoDB collection with 1,000,000 records to get some solid data on exactly what constitutes a manageable number of parameters for an IN clause on a query. Will report back when I have some concrete information.

ANALYSIS:

Using ruby-mongo-driver and the ruby benchmark lib.

MongoDB Collection with 1039944 records

Records are defined as:

{
    first_name: String,
    last_name: String,
    email: String,
    phone: String,
    company: String,
    owner: BSON::ObjectId
 }

With randomly generated values for all fields.

The Owner field has an index.

Running queries with the following conditions:

conditions = {"owner" => { "$in" => id_list }}
opts = {skip: rand, limit: 100}

Results:

    # 10201 ids
    #              user     system      total        real
    # 0:       0.240000   0.000000   0.240000 (  0.265148)
    # 1:       0.240000   0.010000   0.250000 (  0.265757)
    # 2:       0.240000   0.000000   0.240000 (  0.267149)
    # 3:       0.240000   0.000000   0.240000 (  0.269981)
    # 4:       0.240000   0.000000   0.240000 (  0.270436)
    # Find:    0.240000   0.000000   0.240000 (  0.266709)


    # 5201 ids
    #              user     system      total        real
    # 0:       0.120000   0.000000   0.120000 (  0.133824)
    # 1:       0.120000   0.000000   0.120000 (  0.134787)
    # 2:       0.110000   0.000000   0.110000 (  0.133262)
    # 3:       0.110000   0.000000   0.110000 (  0.136046)
    # 4:       0.120000   0.000000   0.120000 (  0.141220)
    # Find:    0.130000   0.000000   0.130000 (  0.139110)

    # 201 ids
    #              user     system      total        real
    # 0:       0.010000   0.000000   0.010000 (  0.006044)
    # 1:       0.000000   0.000000   0.000000 (  0.004681)
    # 2:       0.010000   0.000000   0.010000 (  0.004578)
    # 3:       0.000000   0.000000   0.000000 (  0.007048)
    # 4:       0.010000   0.000000   0.010000 (  0.008487)
    # Find:    0.000000   0.000000   0.000000 (  0.005990)

    # 1 id (NOT using IN)
    #              user     system      total        real
    # 0:       0.000000   0.000000   0.000000 (  0.002868)
    # 1:       0.000000   0.000000   0.000000 (  0.004937)
    # 2:       0.010000   0.000000   0.010000 (  0.003151)
    # 3:       0.000000   0.000000   0.000000 (  0.002983)
    # 4:       0.000000   0.000000   0.000000 (  0.003313)
    # Find:    0.000000   0.000000   0.000000 (  0.002742)

Even with a list of 10k ids in the query, performance is pretty snappy.

Community
  • 1
  • 1
Toby Hede
  • 36,755
  • 28
  • 133
  • 162
  • Upon reflection, I guess if there was a "magic" function to compute parents then the heavy lifting around Nested Sets, Materialized Paths and Adjacency Lists to get trees into a DB would be moot. – Toby Hede Nov 21 '11 at 07:56

1 Answers1

2

If you are trying to "select" records from MongoDB based on a "column" having a value from a set of possible values that you'd need a join against a user management table to determine, then NoSQL is working against you...

If the list of user IDs is still manageable you can do a where ownerId in (?,?,?,?,?...) type of query (after having first determined the list):

db.documents.find({owner:{$in: [1234, 2345, 4444, 77777, 99999]}})

The NoSQL way is probably to denormalize things, for example by including not just the ownerId in the document, but the complete path up the management hierarchy:

{  _id: 'the document A',
   owner : 1234,
   managers: [ 2345, 4444, 77777, 99999 ]
}

Of course, that will need to be updated when the user hierarchy gets shifted around.

Thilo
  • 257,207
  • 101
  • 511
  • 656
  • 1
    Then index `managers` and use the [Multikeys](http://www.mongodb.org/display/DOCS/Multikeys) stuff for the search. – mu is too short Nov 21 '11 at 05:33
  • Yeah the IN option is something I have looked at. Problem is that the can managers have managers and every time a permission changes you have to trawl through the entire collection reseting everything. Clearly not the ideal way of managing the problem. – Toby Hede Nov 21 '11 at 05:33
  • With the IN option, permission changes will be okay, because you do not have to update the collection itself (just query with different ids). If you do de-normalization, then, yes, you will have to "rebuild the index". Avoiding this was the main benefit of normalization and joins, but you don't get that without an RDBMS. – Thilo Nov 21 '11 at 05:37
  • 1
    Ah, I see what you mean ... find all the ids for the user, map by that. The denormlaised approach is going to break nearly instantaneously. – Toby Hede Nov 21 '11 at 05:40
  • But in any case, I don't think I can depend on the list of ids being "manageable". – Toby Hede Nov 21 '11 at 05:57