
I am working on a project where the specification requires a parent-child relationship within the Solr data collection, e.g. a user and the collection of languages they speak (each of which is made up of multiple data fields). My production system is a Solr 4.10 implementation, but I have a 5.5 implementation at my disposal as well. Thus far I have not been able to get this working on either one, and I have yet to find a complete documentation source on how to implement it.

The goal is to get a resulting document from Solr that looks like this:

{
    "id": 123,
    "firstName": "John",
    "lastName": "Doe",
    "languagesSpoken": [
        {
            "id": 243,
            "abbreviation": "en",
            "name": "English"
        },
        {
            "id": 442,
            "abbreviation": "fr",
            "name": "French"
        }
    ]
}

In my schema.xml, I have flattened out all of the fields as follows:

<field name="id" type="int" indexed="true" stored="true" required="true" multiValued="false" />
<field name="firstName" type="text_general" indexed="true" stored="true" />
<field name="lastName" type="text_general" indexed="true" stored="true" />
<field name="languagesSpoken" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="languagesSpoken_id" type="int" indexed="true" stored="true" />
<field name="languagesSpoken_abbreviation " type="text_general" indexed="true" stored="true" />
<field name="languagesSpoken_name" type="text_general" indexed="true" stored="true" />

The latest rendition of my db-data-config.xml looks like this:

<dataConfig>
    <dataSource driver="com.microsoft.sqlserver.jdbc.SQLServerDriver" url="jdbc:...." />
        <document name="clients">
            <entity name="client" query="SELECT * FROM clients" deltaImportQuery="SELECT * FROM clients WHERE id = ${dih.delta.id}" deltaQuery="SELECT id FROM clients WHERE updateDate > '${dih.last_index_time}'">

                <field column="id" name="id" />
                <field column="firstName" name="firstName" />
                <field column="lastName" name="lastName" />

                <entity name="languagesSpoken" child="true" query="SELECT id, abbreviation, name FROM languages WHERE clientId = ${client.id}">
                    <field name="languagesSpoken_id" column="id" />
                    <field name="languagesSpoken_abbreviation" column="abbreviation" />
                    <field name="languagesSpoken_name" column="name" />
                </entity>
            </entity>
        </document>
        ...

On the 4.10 server, when the data comes out of Solr, I get one flat document record with the fields for a single language inline with firstName and lastName, like this:

{
    "id": 123,
    "firstName": "John",
    "lastName": "Doe",
    "languagesSpoken_id": 243,
    "languagesSpoken_abbreviation ": "en",
    "languagesSpoken_name": "English"
}

On the 5.5 server, when the data comes out, I get separate documents for the root client document and the child language documents, with no relationship between them, like this:

{
    "id": 123,
    "firstName": "John",
    "lastName": "Doe"
},
{
    "languagesSpoken_id": 243,
    "languagesSpoken_abbreviation": "en",
    "languagesSpoken_name": "English"
},
{
    "languagesSpoken_id": 442,
    "languagesSpoken_abbreviation": "fr",
    "languagesSpoken_name": "French"
}

I have spent several days trying to figure out what is going on here, to no avail. Can anybody give me a pointer as to what I am missing?

Thanks, -- Jeff

Jeff
  • You can't use Nested Documents in DataImportHandler. – Oyeme Apr 14 '16 at 15:29
  • @Oyeme, there are multiple sources out there (i.e. https://issues.apache.org/jira/browse/SOLR-5147) that say that this is possible as of 5.1. I have not found a complete source of documentation though. – Jeff Apr 14 '16 at 16:38
  • You can add nested entities to the DIH after 5.1, but since Solr is a flat document model, you won't get the documents out of the Solr index in nested form. Instead, you'll get them in "child1 child2 childn parent" format. You will need to manage relationships between data in your application. – TMBT Apr 14 '16 at 21:37

1 Answer


You may want to flatten your JSON objects before you import them into Solr, using an approach like this one:

https://stackoverflow.com/a/19101235/929902

POST http://localhost:8983/solr/ggg_core/update?boost=1.0&commitWithin=1000&overwrite=true&wt=json HTTP/1.1

Then, once you read the documents back from Solr, you can unflatten them in a similar way.
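To illustrate the flatten/unflatten round trip, here is a minimal Python sketch. It is not taken from the linked answer (which is JavaScript); the function names `flatten` and `unflatten`, the `.` separator, and the numeric index segments are all assumptions chosen for illustration. It also assumes that the Solr schema accepts the generated field names (for example through a dynamic field), which is not shown here.

```python
def flatten(value, prefix="", sep="."):
    """Collapse nested dicts/lists into one level: path -> scalar.

    Hypothetical helper; list positions become numeric path segments.
    Empty dicts and empty lists produce no keys and are therefore lost.
    """
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        items = enumerate(value)
    else:
        return {prefix: value}
    flat = {}
    for key, child in items:
        path = f"{prefix}{sep}{key}" if prefix else str(key)
        flat.update(flatten(child, path, sep))
    return flat


def _listify(node):
    """Turn dicts whose keys are all digits back into lists."""
    if not isinstance(node, dict):
        return node
    converted = {k: _listify(v) for k, v in node.items()}
    if converted and all(k.isdigit() for k in converted):
        return [converted[k] for k in sorted(converted, key=int)]
    return converted


def unflatten(flat, sep="."):
    """Rebuild the nested structure from the flat path -> scalar mapping.

    Assumes original keys never contain `sep` and are never purely numeric.
    """
    root = {}
    for path, value in flat.items():
        parts = path.split(sep)
        node = root
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return _listify(root)


# The document from the question, used as sample input.
doc = {
    "id": 123,
    "firstName": "John",
    "lastName": "Doe",
    "languagesSpoken": [
        {"id": 243, "abbreviation": "en", "name": "English"},
        {"id": 442, "abbreviation": "fr", "name": "French"},
    ],
}

flat = flatten(doc)          # e.g. key "languagesSpoken.0.abbreviation"
restored = unflatten(flat)   # nested form again
```

With this naming scheme, each language ends up in fields such as `languagesSpoken.0.id` and `languagesSpoken.1.name`, so the application, not Solr, is responsible for putting the structure back together after a query.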

Teoman shipahi