I have seen many references to using Lucene or Solr as a NoSQL data store, not just as an indexing engine, for example "NoSQL (MongoDB) vs Lucene (or Solr) as your database" and http://searchhub.org/2010/04/29/for-the-guardian-solr-is-the-new-database/

However, because Lucene only provides a "flat" document structure, where each field can be multi-valued but every value is a scalar, I don't fully understand how people map complex structured data into Lucene for indexing and storage. For example:

{
"firstName": "Joe",
"lastName": "Smith",
"addresses" : [
    {
        "type" : "home",
        "line1" : "1 Main Street",
        "city" : "New York"
    },
    {
        "type" : "office",
        "line1" : "P.O. Box 1234",
        "zip" : "10000"
    }
]
}

Things can obviously get more complex. For example, what if the object has two collections, addresses and phone numbers? What if an address itself contains a collection?

I can think of two ways to map this to a Lucene "document":

  1. Create a stored but not indexed field that holds a JSON/BSON version of the object, and then create other fields that are indexed but not stored, for searching.

  2. Find a smart way to fit the object into Lucene's way of storing data, i.e. use dot notation to flatten the fields, use multi-valued fields to store the values of each collection, and then somehow recreate the object on the way back out.
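
A minimal sketch of how option 1 could look, using dot notation from option 2 for the search-only fields. It is plain Python rather than the Lucene API: the field descriptors (`name`, `value`, `stored`, `indexed`) and the `_source` field name are hypothetical stand-ins for whatever the real indexing layer expects.

```python
import json


def flatten(value, prefix=""):
    """Yield (dotted_name, scalar) pairs for every leaf of a nested object."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from flatten(child, f"{prefix}.{key}" if prefix else key)
    elif isinstance(value, list):
        # List items reuse the parent's name, so the field becomes multi-valued.
        for child in value:
            yield from flatten(child, prefix)
    else:
        yield prefix, value


def to_fields(obj):
    """Build hypothetical field descriptors for one flat document."""
    # One stored, non-indexed field keeps the whole object for retrieval.
    fields = [{"name": "_source", "value": json.dumps(obj),
               "stored": True, "indexed": False}]
    # Every leaf becomes an indexed, non-stored field used only for search.
    for name, value in flatten(obj):
        fields.append({"name": name, "value": value,
                       "stored": False, "indexed": True})
    return fields


person = {
    "firstName": "Joe",
    "lastName": "Smith",
    "addresses": [
        {"type": "home", "line1": "1 Main Street", "city": "New York"},
        {"type": "office", "line1": "P.O. Box 1234", "zip": "10000"},
    ],
}

fields = to_fields(person)
for field in fields:
    print(field["name"], "=", field["value"])
```

Reading the object back is then just `json.loads` on the stored `_source` value, so nothing has to be reassembled from the flattened fields.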

Have people dealt with similar problems before, and if so, what solutions did you use?

Community
  • Storing a multidimensional array object is possible in ES. If you want to go with Solr, you have to store the multidimensional array as a JSON-encoded string in a field of type string, with indexed=false and stored=true. – Suhel Meman Apr 10 '13 at 04:46
  • I think ES just does 1. above? I.e. it uses the _source field to store the actual JSON without indexing it, then uses other mapped fields for index/search. – user2264053 Apr 10 '13 at 17:35
  • See this for ES: `http://www.elasticsearch.org/guide/reference/mapping/array-type/` – Suhel Meman Apr 11 '13 at 04:53

2 Answers

Take a look at my Stupid Lucene Tricks: Hierarchies for one approach.

Mark Leighton Fisher

It depends on the usage. If you only need the addresses for display, you can serialize the complex value as a JSON string and store it in a multi-valued field. If you need to search on them, you can use the following structure:


    "addresses_type": [
    "home",
    "office"
    ],
    "addresses_line1": [
    "1 Main Street",
    "P.O. Box 1234"
    ],
    "addresses_city": [
    "New York",
    ""
    ],
    "addresses_zip": [
    "",
    "10000"
    ]
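
A minimal sketch of producing that position-aligned layout and rebuilding the collection from it. The helper names are hypothetical, and it assumes an empty string means "value missing", so a genuinely empty value cannot be told apart from an absent one.

```python
def to_parallel_fields(items, prefix):
    """Turn a list of objects into one multi-valued field per key."""
    keys = []
    for item in items:
        for key in item:
            if key not in keys:
                keys.append(key)
    # Pad missing values with "" so position i always refers to item i.
    return {f"{prefix}_{key}": [item.get(key, "") for item in items]
            for key in keys}


def from_parallel_fields(fields, prefix):
    """Rebuild the list of objects from the position-aligned fields."""
    columns = {name[len(prefix) + 1:]: values
               for name, values in fields.items()
               if name.startswith(prefix + "_")}
    count = max((len(values) for values in columns.values()), default=0)
    # Skip the "" padding so absent keys stay absent in the rebuilt objects.
    return [{key: values[i] for key, values in columns.items()
             if values[i] != ""}
            for i in range(count)]


addresses = [
    {"type": "home", "line1": "1 Main Street", "city": "New York"},
    {"type": "office", "line1": "P.O. Box 1234", "zip": "10000"},
]

flat = to_parallel_fields(addresses, "addresses")
rebuilt = from_parallel_fields(flat, "addresses")
```

The padding only helps when reassembling the objects. At query time the values of a multi-valued field are not tied to each other, so a search for `addresses_type:home AND addresses_zip:10000` would still match this document even though no single address has both values.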

Ray Niu