Schemaless financial data, and NoSQL?

Question

We have an application that works with financial data that is schemaless. More accurately, the shemaless data is information about an order, where fields are customized by the merchant. Consistency and durability is important.

Because of how dynamic our data is reporting is very difficult. Each record can vary slightly, or be completely different. If we continue to use a relational database, it appears our only option is to serialize the "document" as a blob. The reporting would have to be done separately, perhaps by copying data into a common structure defined by user defined reports (each "report" would have a custom table).

Another option is a document-oriented NoSQL database like MongoDB. After doing some research it appears that most people wouldn't trust a NoSQL database with financial data because it relies on BASE rather than ACID.

I seem to have found myself in the middle of two completely different use cases. My data fits extremely well into a document-oriented database (MongoDB), but I need the reliability of an ACID database. At the same time complex user-defined reports are a necessity.

So I seem to have three choices:

Use two MySQL databases: One for storing data (blob), and the other for user define reports (lots of tables).
Use MongoDB, which supports large databases, but has a global write lock and is "eventually consistant".
Use MySQL to store data (blob), then copy it to MongoDB for reporting. Given that the only index would probably be merchantID, how well would this work?

So which of the three is my best option (most flexibility and durability)? Are there other options I haven't considered knowing I can't change how dynamic the data is? Anyone use MongoDB for reporting in production?

(For our RDMS we use MySQL. Thinking of switching to MariaDB. Programming language of choice is PHP. Thinking about using Sphinx for FULL TEXT search, like searching for someone's name.)

Have you ruled out an Entity-Attribute-Value style approach in a relational database? http://en.wikipedia.org/wiki/Entity%E2%80%93attribute%E2%80%93value_model — Ian Mercer, Nov 23 '11 at 19:18
I'm pretty sure we do this already. We have on average 20 rows per record (sometimes a couple hundred). Reporting is even more difficult given the amount of joins. The data of each field can range from a number, to a block of text, or an object containing multiple keys. A SUM on this table is very taxing. Editing becomes more complicated as well. If we had to do reporting outside, serialization would probably be more efficient in this case. — Luke, Nov 23 '11 at 19:26

score 3 · Accepted Answer · answered Nov 23 '11 at 20:15

3

Just a couple of points:

MongoDB is eventually consistent only if you read from secondary nodes. Otherwise, it's consistent.

If you need multi-object ACID transactions, then MongoDB won't work. If you need atomicity, consistency, and durability, you can get that MongoDB if you enable journaling and use write-concern judiciously.

answered Nov 23 '11 at 20:15

Kyle Banker

4,359
23
18

So lets say ACID was important. Could I use MySQL to store the shemaless data into a blob, then copy that to MongoDB for reporting? How well does MongoDB work with reporting given the only index will be merchantID? Could I improve performance by having more in the replica set? Sharding? My primary concern is not loosing data. – Luke Nov 23 '11 at 20:22
Do you need multi-document transactions? Or do you just need not to lose data? If the latter, you can get that from MongoDB by enabling journaling and write concern. You don't need ACID proper for that. – Kyle Banker Nov 30 '11 at 22:15
Not sure what kind of reporting you're talking about. It really depends. Do you just mean basic queries on the data? – Kyle Banker Nov 30 '11 at 22:16
Sorry for the late response. Got really busy. I don't need joins, but I need to do sums/filters on certain fields across a collection. Some cases groups as well (group by). – Luke Dec 05 '11 at 21:22

score 0 · Answer 2 · answered Nov 23 '11 at 19:21

I have just seen several positive remarks on the new NoSQL offering from Oracle which seems to focus more on reliability than other NoSQL products. Apparently it's available as a community edition (open source license, don't know which though) and an enterprise edition (surprise...)

http://www.infoworld.com/d/data-explosion/first-look-oracle-nosql-database-179107

It's not document oriented solution but a key/value pair solution

Note that I have not worked with this, but I thought you might want to look into it:

http://www.oracle.com/us/products/database/nosql/overview/index.html

That sound intriguing. Would probably be perfect, but unfortunately we need a document oriented data store. Some of our fields can contain multiple pieces of information (example: quantity, amount) and each piece has to be reported on individually. That's why MongoDB seems so enticing because you can search/filter on the sub-elements. If MongoDB can be used for complex reporting, perhaps we could use it as a secondary data store. — Luke, Nov 23 '11 at 20:18

Schemaless financial data, and NoSQL?

2 Answers2