
I'm working on the schema design of a scalable session table (for a custom authentication system) in MongoDB. I know that MongoDB's scalability depends on the design, and I also have some requirements. My use case is simple:

  1. When a user logs in, a random token is generated and granted to the user, then a record is inserted into the session table using the token as the primary key, which is shardable. The old token record, if one exists, is deleted.
  2. The user accesses the service using the token.
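
The two steps above can be sketched roughly as follows. This is only an illustration: `sessions` is assumed to be a pymongo-style collection object, and the function name `login` and the field names `user_id` and `created_at` are hypothetical.

```python
import secrets
from datetime import datetime, timezone


def login(sessions, user_id):
    """Issue a fresh token for user_id and drop any earlier session.

    `sessions` is assumed to behave like a pymongo Collection
    (it only needs delete_many and insert_one).
    """
    token = secrets.token_urlsafe(32)  # random, URL-safe token

    # Remove the user's previous session record(s), if any.
    sessions.delete_many({"user_id": user_id})

    # Use the token as the primary key (_id) of the session document.
    sessions.insert_one({
        "_id": token,
        "user_id": user_id,
        "created_at": datetime.now(timezone.utc),
    })
    return token
```

Note that in a collection sharded on the token, the delete by `user_id` does not include the shard key, so it may have to be sent to every shard.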

My question is: if the system keeps deleting expired session keys, the session collection (considering the sharded case, where I need to partition on the token field) could grow very large and contain a lot of 'gaps' left by expired sessions. How can I handle this problem gracefully (or is there a better design)?

Thanks in advance.

Edit: My question is about the storage level. How does MongoDB manage disk space if records are frequently removed and inserted? There should be some kind of (auto-)shrink mechanism. Hopefully it won't block reads on the collection.

Jason Xu

4 Answers

0

I would suggest you use a TTL index. You can read more about it at http://docs.mongodb.org/manual/tutorial/expire-data/; it would be a perfect fit for what you're doing. Note that it is only available since version 2.2.
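
As a rough sketch of what a TTL index looks like from Python: the helper below assumes `sessions` is a pymongo collection and that each session document carries a date field, here hypothetically named `created_at`. The helper names and the one-hour default are illustrative only.

```python
from datetime import datetime, timezone


def ensure_session_ttl(sessions, ttl_seconds=3600):
    """Ask the server to expire documents ttl_seconds after `created_at`.

    `sessions` is assumed to be a pymongo Collection; expireAfterSeconds
    is passed through to the index definition.
    """
    return sessions.create_index(
        [("created_at", 1)],
        expireAfterSeconds=ttl_seconds,
    )


def new_session_doc(token, user_id):
    """Build a session document; the TTL index needs a real date value."""
    return {
        "_id": token,
        "user_id": user_id,
        "created_at": datetime.now(timezone.utc),
    }
```

Expired documents are removed by a background task on the server, so a session may linger for a short while after its expiry time.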

How mongo stores data: http://www.mongodb.org/display/DOCS/Excessive+Disk+Space

Way to clean up removed records:

Command Line: mongod --repair

See: http://docs.mongodb.org/manual/reference/mongod/#cmdoption-mongod--repair

Mongo Shell: db.repairDatabase()

See: http://docs.mongodb.org/manual/reference/method/db.repairDatabase/

So you could have an automated cleanup script that runs the repair, but keep in mind that this will block MongoDB for a while.

PhearOfRayne
  • Hi, Steve. My question is about the storage level. How does MongoDB manage disk space if records are frequently removed and inserted? There should be some kind of (auto-)shrink mechanism. – Jason Xu Dec 27 '12 at 08:22
  • Thanks Steve, looking into that now. – Jason Xu Dec 27 '12 at 09:03
  • ...Let's say I have a sharded user profile table and add a session token field to store the token, then give this field an index. A new token will then reuse the space when the old token is invalidated... Will the index on the token work properly in the sharded collection when the primary key, user_id, is also the shard key? – Jason Xu Dec 27 '12 at 09:20
  • @JasonHsu It depends. If you query with the user_id then maybe, but if not then no, it won't; it will force a global scatter-and-gather operation since the query does not contain the shard key. – Sammaye Dec 27 '12 at 09:26
  • @Sammaye, thanks. Sending the token together with the user ID, so that the query reaches the correct shard for the token, is a workable solution. I'll look more into it. – Jason Xu Dec 27 '12 at 10:02
0

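I agree with @Steven Farley. You can set the TTL when creating the index; with the pymongo driver in Python this is done through create_index, by passing expireAfterSeconds (the older ttl argument documented there is an index-creation cache time, not a document expiry):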

http://api.mongodb.org/python/1.3/api/pymongo/collection.html#pymongo.collection.Collection.create_index

xrage
0

There are a few ways to achieve sessions:

  1. Capped collections, as shown in this use case.
  2. Expiring data with a TTL index, by passing expireAfterSeconds to ensureIndex.
  3. Cleaning up sessions on the program side, using a TTL value and remove.

Faced with the same problem, I used solution 3 for the flexibility it provides.
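
A minimal sketch of solution 3, assuming `sessions` is a pymongo collection and each document has a `created_at` date field; the function name, the field name, and the one-hour default are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def purge_expired(sessions, max_age=timedelta(hours=1), now=None):
    """Delete sessions older than max_age and return how many were removed.

    `sessions` is assumed to be a pymongo Collection; `now` can be
    injected to make the function easy to test.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - max_age
    # Remove every session created before the cutoff.
    result = sessions.delete_many({"created_at": {"$lt": cutoff}})
    return result.deleted_count
```

Something like this could be run periodically from a scheduled job. The flexibility comes from the application deciding what counts as expired, for example a different `max_age` per kind of client.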

You can find a good overview of remove and disk space optimization in this answer.

Eric
0

TTL is good and all; repair, however, is not. --repair is not designed to be run regularly on a database; in fact, maybe once every 3 months or so. It does a lot of internal work that, if run often, will seriously damage your server's performance.

Now, about reuse of disk space in such an environment: when you delete a record it frees that "block". If another document fits into that "block", it will reuse the space; otherwise MongoDB will create a new extent, meaning a new "block", a.k.a. more space.

So if you want to save disk space here, you need to make sure that documents do not differ much in size, so that a new document fits into the space freed by a deleted one. Fortunately, you have a relatively static schema here of maybe:

{
    _id: {},
    token: {},
    user_id: {},
    device: {},
    user_agent: ""
}

which should mean that documents will, hopefully, reuse the freed space.

Now you come to the tricky part if they do not. MongoDB will not automatically give back free space per collection (it does per database when the database is dropped, since that is the same as deleting the files), so you have to run --repair on the database or the compact command on the collection to actually get your space back.

That being said, I believe your documents will be of similar size to each other, so I am unsure whether you will see a problem here. You could also try usePowerOf2Sizes: http://www.mongodb.org/display/DOCS/Padding+Factor#PaddingFactor-usePowerOf2Sizes for a collection that will frequently have inserts and deletes; it should help on that front.
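
To illustrate why power-of-2 sizing helps with reuse, here is a toy model in Python. It is not MongoDB's allocator: it simply rounds each record size up to the next power of two and ignores any minimum or maximum sizes the real allocator applies. The function name and the byte sizes are made up for the example.

```python
def power_of_2_size(nbytes):
    """Smallest power of two >= nbytes (toy model of power-of-2 allocation)."""
    size = 1
    while size < nbytes:
        size *= 2
    return size


# Session documents of slightly different sizes all get the same slot
# size, so a slot freed by one of them can hold any of the others.
for doc_size in (180, 210, 250):
    print(doc_size, "->", power_of_2_size(doc_size))  # each maps to 256
```

The trade-off is extra padding per document in exchange for freed slots that are more likely to fit the next insert.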

Sammaye
  • I can ensure that the size of a session record is limited, so it seems it won't be a big issue even with a lot of deletes. An experiment may be needed. Thanks @Sammaye. – Jason Xu Dec 27 '12 at 10:24
  • According to the MongoDB website, it can reuse deleted space (under some conditions): "Recovering Deleted Space: MongoDB maintains lists of deleted blocks within the datafiles when objects or collections are deleted. This space is reused by MongoDB but never freed to the operating system." – Jason Xu Dec 27 '12 at 10:46
  • I haven't tested it, but as the documentation says, MongoDB keeps track of the space of deleted objects. If a new object fits into any tracked space, the system can put it there, which is the reuse I mean. If the new object is too big, new space is used to store it, and space that never matches any new object can only be reclaimed by a repair. – Jason Xu Dec 28 '12 at 02:17
  • @JasonHsu Yes, essentially, though the usePowerOf2Sizes option I linked does help with that problem, since it increases the padding of each document so that more documents will fit; this lowers the fragmentation caused by documents not filling these "blocks". – Sammaye Dec 28 '12 at 09:03