2

What I've got right now with Mysql

Here is my database:

enter image description here

I search users by location a lot using bounding box.

There are two more tables: user_tag and tags. The overall database size is about 1 Gb.

I've implemented an arbitrary tag system with these tables, so that when user wants to use tag that hasn't been created yet, this tag is inserted in tag table.

I also search users by tags.

Benchmarks

I have no indexes in this database except those on primary keys.

As you can see there are a lot of inserts and they take much time.

Main problem here is time consuming inserts and updates.

Create new Event with tags(~150ms):

http://pastebin.com/vyw6qhrN

Update Event(~200ms):

http://pastebin.com/f28yvn9z

What I don't like in this solution:

  1. When I create new user I do insert in 3 tables to link the user with his tags.
  2. When updating user information I also need to do 3 updates and 1 deletion or insert when user changes tags.
  3. Searching users by tags gets really messy (complex query) (How to implement tag system)

What can I get with NoSql

I want to use a document oriented database. Then I will need only one collection:

{
"name": "Dan",
"lat": 60
"lon": 30
"tags":["football", "fishing"]  
}

I will be able to set index on tags and lat and lon for faster search.

My questions

  1. Should I switch to NoSql or I can somehow improve my current implementation. Or maybe switch to a different RDBMS?
  2. In case I should switch: What NoSql database is the best in this case?
  3. In case I should switch to MongoDb: Is it reliable and mature enough? Because I've read a lot of posts about people going away from MongoDb. For example: http://www.reddit.com/search?q=mongodb
Community
  • 1
  • 1
warmspringwinds
  • 1,147
  • 2
  • 14
  • 31
  • Hi, it's ~1GB. But i'm considering future growth. – warmspringwinds Feb 08 '13 at 09:38
  • 1
    1GB is nothing. You are better off sticking to MySQL. If you had 1TB, that would be different problem, and thinking about NoSQL would be appropriate. – mvp Feb 08 '13 at 09:41
  • Yes, you are right about the size. But what would you say about schemaless, so that on updating and and creating i need to do write or update only in one collection(it's about my arbitary tag system)? NoSql will speed up this part. – warmspringwinds Feb 08 '13 at 09:47
  • NoSQL is sort of denormalization. It maybe good to speed up one query type. But it makes some other queries (especially joins) slow or even close to impossible. – mvp Feb 08 '13 at 09:49
  • True. But you can embed multiple entities in one document(like i've done with tags). Or for example if i will have one more table named Events, that will contain events created by users. And in this table i will have athor_id column. And if i will need to find all events by author,all i will need is only one query where author_id == user_id. I will do just one more query to database. – warmspringwinds Feb 08 '13 at 09:58
  • 1
    So, you are trying to do a kind of pre-optimization. Did you perform some tests to find when your DB will not handle selects/upserts at the good speed? You may do the same things with MySQL. Why are you looking for a new database, did you tried to optimize MySQL instance(s)? – ravnur Feb 08 '13 at 10:03
  • Everybody to their own - but I think this is premature optimization. I would not trade convenience of SQL for some buzzword until I start having performance problems. And I'll bet that until your database grows beyond few hundred GB, you will not have these problems. And, even with NoSQL, you may [have severe performance problems](http://stackoverflow.com/q/11022980) – mvp Feb 08 '13 at 10:04
  • @ravnur I've updated my question with benchmarks. I don't think that 150ms is fast enough. And i have no indexes yet. With them it will be slower. – warmspringwinds Feb 08 '13 at 10:21
  • @warmspringwinds a schema dump (incl. index informations) of your mysql tables could be usefull. looks like a mysql optimisation problem. – emrox Feb 08 '13 at 10:26
  • Could you do: 1) the same in parallel with 20-30 threads? you can use `mysqlslap` tool or bash script; 2) the same with selects? 3) repeat inserts/select on the data that you expect within a year. If `selects` will be slowed than try to add indexes, try to add some memory and play with innodb parameters (expand innodb_buffer_pool_size at first). – ravnur Feb 08 '13 at 10:26

1 Answers1

1

Both technologies can probably solve your problem. Some scenarios are easier to handle with a RDBMS, others with a more specialized database. It depends on the details of your requirements, your experience and your personal preferences.

@mvp commented on the "convenience of SQL". Personally, I find SQL a major pain because object-oriented and SQL aren't easy to map. People often use their ORM behemoths, which I find an antipattern -- chances are the ORM code size is more than 50 times the entire application code you have so something is fishy. But that is just my opinion, SQL is still probably the most common data store.

Personally, I have the feeling your problems maps to MongoDB quite nicely, because

  • It has geo indexes and supports various geo queries
  • It is very easy to create simple tagging, if that is what you need
  • It's easy and handling a few GB of data is easy, too.
  • It's easy to administer. I don't need to meddle with innodb_buffer_pool_size or whatnot at that scale.
  • Joins are overrated. Joins are needed, because you split up data that belongs together to squeeze it into tables. If you want to find answers to questions like "users who like football and live in foo also like?", the aggregation framework and caching are easier and more scalable than huge joins.

If I were you, I'd sit down for a day or two and give it a spin: You have a reasonably sized data set so you can do testing with real-world data, and changing just a few queries should be very easy. It will be fun and you get a feeling for the upsides and downsides first hand.

By the way, three of the articles on reddit refer to each other: "Don't use MongoDB" on pastebin, Eliot Horowitz' answer at news.ycombinator.com and "The MongoDB story was a hoax", so no, MongoDB doesn't just crash randomly and have a gazillion bugs. But of course, it's not a silver bullet that just magically makes scaling issues disappear.

mnemosyn
  • 45,391
  • 6
  • 76
  • 82
  • I'll try it out for sure) I was also looking for another option: do you know db that has geo indexes and provides flexibility as mongo? Just to have something to compare with. – warmspringwinds Feb 08 '13 at 11:42
  • CouchDB has a plugin. There's a longer list on Wikipedia, but I haven't used any of them: http://en.wikipedia.org/wiki/Spatial_database – mnemosyn Feb 08 '13 at 13:40
  • Seems like mongo is the only option. By your answer i see that you've used mongo several times. Have you experienced any serious troubles while using it? – warmspringwinds Feb 08 '13 at 16:01
  • No, I haven't. There are some caveats to be aware of, but that is true for any data store I guess. It can eat up a lot of memory if you're not careful (both disk and RAM) and freeing up space is tricky, but aside from that everything works like a charm. – mnemosyn Feb 08 '13 at 17:16