
So I have a normalized table with some data that I want to put into a Solr index, something akin to this

+----+--------------+--------------+---------+
| id |     name     |  attribute   | value   |
+----+--------------+--------------+---------+
|  1 | Apple        | color        | green   |
|  1 | Apple        | shape        | round   |
|  1 | Apple        | origin       | Belgium |
|  2 | Motorbike    | type         | fast    |
|  2 | Motorbike    | nr of wheels | 2       |
|  3 | Office chair | color        | grayish |
|  3 | Office chair | spins        | yes     |
+----+--------------+--------------+---------+

Now, I would prefer to have it indexed as one document per unique id (i.e. one document per item). But then I would have to consolidate n attributes into a single document, which means doing some magic with my dataConfig. How can I store and map n fields? Is this the right time to use dynamic fields?
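For example, item 1 would ideally end up as a single document along these lines (a rough sketch; the exact field names are just for illustration):

id:     1
name:   Apple
color:  green
shape:  round
origin: Belgium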

Here is my current try. I'm pretty sure it's not valid.

<dataConfig>
    <dataSource 
        type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
        url="jdbc:mysql://localhost/mystuff"
        user="root" password="*****"/>

    <document name="doc">
        <entity name="t1" query="select * from item_to_id_table">
            <field name="id" column="id"/>
            <field name="name" column="name"/>
            <entity name="t2" query="select * from my_flat_table" 
                    cacheKey="t1.id"
                    cacheLookup="t2.id">

                <!-- alt 1 -->
                <field name="$(t2.attribute)" column="value" />
                <!-- alt 2 -->
                <entity name="properties" query="select property, value from t2" 
                        cacheKey="$(t2.attribute)"
                        cacheLookup="property">
                    <field name="$(properties.property)" column="value" />
                </entity>
            </entity>
        </entity>

    </document>
</dataConfig>

I'm pretty sure neither of the two alternatives is valid; I'll try them out soon unless I can figure out something better. Perhaps a script transformer would work as a third alternative.

Is this use case reasonable to use with Solr?

worldsayshi
  • **[This blog post](http://www.chrisumbel.com/article/solr_dataimporthandler_dih_scripttransformer)** looks very promising. – worldsayshi Oct 23 '12 at 12:23
  • [Somewhat related](http://stackoverflow.com/questions/7917317/dynamic-column-names-using-dih-dataimporthandler). – worldsayshi Oct 23 '12 at 13:43

1 Answer


I solved this the way described in the blog post linked in the comments above.

In short, I used a script transformer to turn each attribute row into a field on the item's document, prefixing the field name with "p_". Somewhat like this (example code, there may be bugs):

<dataConfig>
    <dataSource 
        type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
        url="jdbc:mysql://localhost/mystuff"
        user="root" password="*****"/>

    <script>
    <![CDATA[ 
    function formProperty(row) { 
      // Read the attribute name and its value from the current row.
      var propName = row.get("attribute");
      var propVal = row.get("value");
      // Add a field named "p_<attribute>" carrying the value.
      var fieldName = "p_" + propName;
      row.put(fieldName, propVal);
      return row; 
    } 
    ]]>
    </script> 

    <document name="doc">
        <entity name="t1" query="select * from item_to_id_table">
            <field name="id" column="id"/>
            <field name="name" column="name"/>
            <entity name="t2" 
                    query="select * from my_flat_table
                    where my_flat_table.id = ${t1.id}"
                    transformer="script:formProperty">
            </entity>
        </entity>
    </document>
</dataConfig>

I then mapped them in the Solr schema (schema.xml) as a dynamic field:

<dynamicField name="p_*" indexed="true" stored="true" type="string"/>
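
Assuming the example data from the question, item 1 should then end up indexed roughly like this (the p_* names come from the prefix added by the transformer):

id:       1
name:     Apple
p_color:  green
p_shape:  round
p_origin: Belgium

The dynamic fields can be queried like any other string field, e.g. q=p_color:green.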
worldsayshi
  • Please fix index="true" to indexed="true" and dynamicfield to dynamicField. – yAnTar Sep 09 '13 at 16:14
  • It was a long time ago so I don't remember the specifics but I should perhaps say that I ended up writing my own data importer for performance reasons. I had a large database to import and I simultaneously performed a lot of join operations on a quite large database. Making your own is probably not the way to go unless performance requires it. It *might* be that similar performance can be achieved with the data import handler. I tried but I couldn't figure it out at the time. – worldsayshi Sep 12 '13 at 08:35
  • I could probably make the source available if someone wants it although I should probably purge the repo history of passwords first. ;) – worldsayshi Sep 12 '13 at 08:42