7

I've recently started looking into MongoDB for a project at work. I'm fairly new to JSON and MongoDB's query structure, so I'm hoping one of you can provide some clarification.

I've translated the problem into Excel terminology since it's common and represents my question fairly well.

If I were attempting to model an Excel formula into a MongoDB document, what is the best format to do it in (I'll explain my potential queries lower)? Keep in mind that formulas in Excel can be nested in (nearly) any order, and with any depth, and arguments can come in either string or numerical form. I would like to be able to search across these cells to answer such queries as "Find all cells that use the =AVG() function" or "Find all cells that contain the =SUM() function inside of an =AVG() function (such as =AVG(x,y,z,SUM(a,b,c)))." Being able to answer these formula-structure based queries is more important than being able to answer ones about numbers or strings if answering all isn't possible.

Currently I'm envisioning my documents having roughly the following format:

{
    formula: "AVG",
    arguments: [4,5, {
        formula: "SUM",
        arguments: [6,7,{
            formula: "ABS",
            arguments: [-8,-9]
(closing parenthesis/brackets)
}

Is that a reasonable format for what I'm looking to do? If it is, how would I query for "Find cases with =SUM inside of =AVG"? What about finding the =ABS formula that's nested even deeper? Because of the dynamic nature of formulas, its not really possible to expect a certain order or certain depth.

Community
  • 1
  • 1
  • FYI, you can put things like `=SUM` and `=AVG` inside of backticks (usually on the same key as ~) since these are code that belongs inline. Welcome to StackOverflow! – Translunar Jul 17 '13 at 19:58
  • @mohawkjohn I did indeed read about inline code, I just don't really consider Excel formulas cool enough to merit sexy code formatting. If you think it'd help others understand my question I'll change it though. – Charles Spinosa Jul 17 '13 at 20:04
  • I wouldn't necessarily think this is a good fit for MongoDB given the description of the queries you want to make. It sounds more like you might want to consider a graph database that allows the more complex "child/ancestor" type searches. – WiredPrairie Jul 17 '13 at 21:10
  • Very well presented question, thank you. This has turned me away from MongoDB to have a look at graph databases before having invested too much energy in the former. Cheers. – Sean Connolly Apr 21 '14 at 14:48

2 Answers2

5

If you have an arbitrary structure like this, then I suggest you store the trees in a different way. Arbitrary structures are difficult to query and deal with.

The MongoDB documentation has a few suggestions: http://docs.mongodb.org/manual/tutorial/model-tree-structures/

Derick
  • 35,169
  • 5
  • 76
  • 99
0

Continuing with the Excel analogy:

First thing's first, I decided that querying on second level and beyond children wasn't really necessary (this makes the problem much, much simpler). This structure can't adapt easily to that, so fair warning. This is not a perfect answer to my original question, sorry! It seems like MongoDB just isn't well suited to store my original question very easily.

I've used a structure akin to how MongoDB recommends representing trees. Instead of trying to represent an Excel doc as one MongoDB document, it is now represented by many (one per unique formula used, and one to store strings and values). All MongoDB documents related to the same excel workbook simply have a field workbook and store which one they belong to there.

A very simple example (the custom _id tags aren't necessary, but are easier to read):

{_id: book1_1
 workbook: book1,
 cell_values: [1,2,3],
 cell_strings: ['hello','world']}

{_id:book1_2
 workbook:  book1,
 formula: 'SUM',
 children: ['AVG','ABS']}

{_id:book1_3
 workbook: book1,
 formula: 'AVG',
 children: ['SUM']}

{_id:book1_4
 workbook: book1,
 formula: 'ABS',
 children: ['SUM']}

These 4 MongoDB documents represent one Excel document that has the following formula structures (this is not the only excel sheet that would produce the above MongoDB documents):

=SUM(AVG())
=AVG(SUM())
=ABS(SUM(ABS()))

Along with values 1,2,3 and strings 'hello','world' somewhere inside of it.

Querying for 'Find books with SUM formula inside of AVG formula' is then the following query:

db.collection.find({$and: [{formula: 'AVG'},{children: 'SUM'}]})

returns the _id:book1_3 MongoDB document. You can then strip the workbook off that however you'd like.