5

I am working on a project using Java and Spring 3, and I have a new task. I will receive XML files, convert them into objects, and then store those objects in a database.

My main task is to evaluate NoSQL databases, specifically CouchDB and MongoDB. I will run searches on those objects in the database (one of the indexed fields will be a date, and I will run date-range selects on it). Performance is very important to me, and I will be working with a huge amount of data; that is why I am looking at NoSQL databases.

Given my scenario, what do you suggest? What are the pros and cons of each, which one should I choose, and why?

I searched and found that CouchDB uses a REST API while MongoDB uses native drivers, which is a performance advantage for MongoDB according to this comparison: http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB

However, CouchDB uses replication as a way to scale (is that a performance advantage?).

I also came across BaseX and eXist. Given my needs, which would you suggest? Has anyone worked with them?

PS: The XML files I receive are essentially logs. They will not change, and I won't modify the data in them.

Community
kamaci
  • do you want to search within "XML" files? If so, what? Also, why do you think you need nosql? – Karoly Horvath Oct 10 '11 at 21:01
  • I have log files as XML and the data is really big. I need to search them with high performance. I don't need complex searches. I will get the XML files, convert them into Java objects and put them into a database. All in all, I will be searching the content of those XML files, e.g. there will be a date attribute in my XML files and I will want to get the values between two dates. – kamaci Oct 10 '11 at 21:13
  • that's not enough info to answer your question. You might be able to index the search fields, but you might have to do a full table scan... Are there a lot of records/objects? Again, what fields/attributes/nodes do you search? – Karoly Horvath Oct 10 '11 at 21:16
  • @kamaci: what *kind* of queries are you expecting? Just by timestamp? Will you have to do Full Text Search in the XML files? Will you need to do XPath queries? – thkala Oct 10 '11 at 21:16
  • I will have a lot of data to search and performance is important to me. I won't search on timestamps only; there will be other fields too. – kamaci Oct 10 '11 at 21:19
  • @kamaci: 1. do you know the XML file schema? 2. do you know which fields you will have to search on? 3. Will you have to support queries on arbitrary XML fields? – thkala Oct 10 '11 at 21:19
  • My XML logs contain elements such as dates, names, numbers and complex types. I know the XML schema. Users will search the log files on whichever fields they want. – kamaci Oct 10 '11 at 21:26
  • @kamaci: Hmmm, if the total number of those fields is relatively low (e.g. < 200) and you know them beforehand you might be able to use a relational DB. Otherwise a NoSQL DB is the way to go. For example, I had to use MongoDB because I had log files with over 40K fields and no known schema... – thkala Oct 10 '11 at 21:32
  • There will be real-time logging and a very large amount of data to search; that is why I want to use NoSQL. – kamaci Oct 10 '11 at 21:39
  • Does CouchDB's REST communication make it slow because it has to open an HTTP connection first? – kamaci Oct 10 '11 at 21:39

3 Answers

1

I have only used MongoDB in a high-data-volume, low-load internal application, so I cannot really offer first-hand advice for your choice.

The MongoDB people, however, have a comparison with CouchDB here. There are also quite a few more independent opinions (1, 2).

You should also consider the quality of the available database drivers for your environment. The Java MongoDB driver is quite stable, in my experience, but it seems to me that it still incurs more processing overhead than it should. I have no idea about any of the CouchDB drivers.

Do you have any other requirements apart from the ability to store large amounts of data? Do you need replication or sharding?

PS: How are you storing the XML files anyway? XML files do not map into JSON (which is what e.g. MongoDB uses) perfectly - unless you store the whole XML text in a single field.

PS2: Are you sure that you need a document-based database? If you are only going to perform searches on a few fields that are known beforehand, a relational DB might be easier to handle. Document-based DBs start making sense only when you don't have a predefined schema for your data or when you need to store more complex object hierarchies.

PS3: May I ask why huge data implies NoSQL to you? You can store insane amounts of data on any modern relational database (as long as you have the hardware, of course).

EDIT:

A couple of related SO questions:

(...and about a thousand more)

Maybe also these:

Community
thkala
  • I would use something like `{'xmlnodename': {'attribs': { ... }, 'childs': { ..recursion.. }}, 'nextxmlnode': { ... } }`. TXT is a very baaad idea. – Karoly Horvath Oct 10 '11 at 20:57
  • I have XML files and an XSD for them. I want to build Java objects from those XML files and send them to the database. Should I transform my data into JSON objects to store it in MongoDB? – kamaci Oct 10 '11 at 20:59
  • @yi_H: Depends on what the OP needs to do. For all I know, they might be able to get away by extracting the few searchable fields and storing them in a relational DB, along with the full XML text as a blob. – thkala Oct 10 '11 at 21:09
  • @kamaci: MongoDB uses BSON internally, which IIRC is a binary representation of JSON. Any object will have to be converted to that format before storage. – thkala Oct 10 '11 at 21:12
  • Thanks for your help. One part of my system will give me log files as XML, and there is an XSD for them too. XML files may not map into JSON perfectly, but I can take the XML files, convert them into JSON objects and send them to any DB. – kamaci Oct 10 '11 at 21:23
1

This is a pretty big question, but I will do my best to tackle it. A company I work for was moving from developing our applications on MySQL to NoSQL, and I was the lead on the first NoSQL database, so we had to decide which one to work with. I was choosing between MongoDB, CouchDB and Cassandra.

One important factor I looked at was how easy it would be to write baseline functions for working with the database, so that you don't have to understand what is going on underneath but can still execute queries and so on. The issue with Cassandra was that its API was very low level; writing a solid high-level interface would have taken time we did not have.

The issue with CouchDB was the REST service. Since we were already connecting to our in-house API over REST, it would have meant a REST call behind a REST call. REST generally goes over HTTP, and the convenience of HTTP comes with a fair amount of overhead, which adds time to loading information. So we chose MongoDB for that reason, among many others. Also, since MongoDB is accessed through a driver, the driver is developed for a specific programming language, which is great if your language is supported and a problem if it is not. Java is supported by MongoDB, so you are fine.

I would recommend converting the XML files into objects and then storing the objects in MongoDB, so each XML file becomes a document with embedded documents. The great thing about MongoDB is that you can search embedded documents and index them. So enjoy that.
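To make the embedded-document idea a bit more concrete, here is a minimal sketch using only plain `java.util` maps, so it runs without any driver. The field names (`timestamp`, `request`, `user`, `message`) and the class `LogDocumentSketch` are made up for illustration; `$gte` and `$lt` are MongoDB's range operators. How you hand such maps to your driver version is an assumption you would need to check against its documentation.

```java
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: shapes a log entry and a date-range filter as plain maps.
class LogDocumentSketch {

    // One log entry with an embedded "request" sub-document (field names are assumptions).
    static Map<String, Object> logEntry(Date timestamp, String user, String message) {
        Map<String, Object> request = new LinkedHashMap<>();
        request.put("user", user);

        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("timestamp", timestamp);
        doc.put("request", request); // embedded document
        doc.put("message", message);
        return doc;
    }

    // Builds { field: { $gte: from, $lt: to } }, i.e. from <= field < to.
    static Map<String, Object> between(String field, Date from, Date to) {
        Map<String, Object> range = new LinkedHashMap<>();
        range.put("$gte", from);
        range.put("$lt", to);

        Map<String, Object> filter = new LinkedHashMap<>();
        filter.put(field, range);
        return filter;
    }

    public static void main(String[] args) {
        Date now = new Date();
        Date oneHourAgo = new Date(now.getTime() - 60L * 60L * 1000L);

        System.out.println(logEntry(now, "kamaci", "login ok"));
        System.out.println(between("timestamp", oneHourAgo, now));
    }
}
```

For the date-range queries to stay fast you would index the `timestamp` field, and an embedded field can be addressed with dot notation (for example `request.user`) in both filters and indexes.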

FrankS101
WojonsTech
0

I'd like to add that Couchbase is a faster and more scalable option than CouchDB. The 2.0 version introduces Views. At a high level, it is a distributed memcached (Membase Server) merged with CouchDB, though of course more sophisticated than just mashing the two together. The founders of both CouchDB and Membase Server created Couchbase.

Also, the best way to handle this is likely to convert XML to JSON for storage and JSON back to XML on retrieval. If you are doing XPath queries in the database, the View creation would need to be a bit more sophisticated.
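As a rough illustration of the XML-to-JSON direction, the sketch below walks a DOM tree with the JDK's built-in parser and produces nested maps and lists, which is the shape a JSON document store expects. It loosely follows the attribs/children nesting suggested in the comments above; the key names `attribs`, `children` and `text` and the class `XmlToMapSketch` are assumptions for this example, not a standard mapping. Namespaces, mixed content and the reverse JSON-to-XML direction are not handled.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical converter: XML text -> nested Map/List structure (JSON-like).
class XmlToMapSketch {

    // Parses the XML and returns { rootName: { attribs, children, text } }.
    static Map<String, Object> parse(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            Map<String, Object> doc = new LinkedHashMap<>();
            doc.put(root.getTagName(), toMap(root));
            return doc;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Converts one element; empty sections are left out to keep documents small.
    static Map<String, Object> toMap(Element element) {
        Map<String, Object> node = new LinkedHashMap<>();

        Map<String, Object> attribs = new LinkedHashMap<>();
        NamedNodeMap attributes = element.getAttributes();
        for (int i = 0; i < attributes.getLength(); i++) {
            Node attribute = attributes.item(i);
            attribs.put(attribute.getNodeName(), attribute.getNodeValue());
        }
        if (!attribs.isEmpty()) {
            node.put("attribs", attribs);
        }

        // Repeated child elements with the same name are collected into a list.
        Map<String, List<Object>> children = new LinkedHashMap<>();
        StringBuilder text = new StringBuilder();
        NodeList kids = element.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            Node kid = kids.item(i);
            if (kid instanceof Element) {
                children.computeIfAbsent(kid.getNodeName(), name -> new ArrayList<>())
                        .add(toMap((Element) kid));
            } else if (kid.getNodeType() == Node.TEXT_NODE
                    || kid.getNodeType() == Node.CDATA_SECTION_NODE) {
                text.append(kid.getNodeValue());
            }
        }
        if (!children.isEmpty()) {
            node.put("children", children);
        }

        String trimmed = text.toString().trim();
        if (!trimmed.isEmpty()) {
            node.put("text", trimmed);
        }
        return node;
    }

    public static void main(String[] args) {
        String xml = "<log date=\"2011-10-10\"><user id=\"7\">kamaci</user></log>";
        System.out.println(parse(xml));
    }
}
```

Since an XSD is available, generating typed Java classes from it and mapping those to documents may be a cleaner route than a generic walk like this one; the generic version is mainly useful when arbitrary fields have to stay searchable.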

www.couchbase.com

scalabl3