
My original goal was to map multiple columns of a table to one field in Solr without knowing the exact names of the columns. Ideally this would make it possible to map all columns returned by the entity query to one field, for example all columns of type text (or varchar).

I'm using the DataImportHandler to import data from a SQL Server database. The data source configuration itself is not the problem, since a direct mapping from column name to the corresponding field name works fine. For example:

    <entity name="foo" dataSource="my_database" query="select * from foo">
      <field column="db_column_name" name="solr_field_name"/>
    </entity>

After many hours of trying and searching I ended up defining an extra MappingTable containing all column names and the fields to map them to, as explained in Solr - DataImportHandler: When attempting to use column values as field names, multivalued fields only retain the first result.

The MappingTable looks like this:

    SOLR_FIELD    MY_FIELD
    name          column_name
    id            column_id
    content       column_desc
    content       column_second_desc

The SourceTable looks like this:

    column_id   column_name   column_desc
    123         goose         this column doesnt quak
    342         plain         this is another description

There is no foreign key defined between these tables. So my idea was to use the MappingTable to get the data out of every row of SourceTable and put it into Solr fields without explicitly naming the columns of SourceTable in the configuration (having to create an extra table would still keep the process dynamic enough).
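For the first SourceTable row, the intended result would be roughly the following Solr document (my sketch of the goal, not actual output):

    <doc>
      <field name="id">123</field>
      <field name="name">goose</field>
      <!-- both *_desc columns from the MappingTable collected into the multivalued "content" field -->
      <field name="content">this column doesnt quak</field>
    </doc>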

This example shows how I tried to realize the mapping

    <entity name="source" dataSource="my_database" query="SELECT * FROM SourceTable">
      <entity name="mapping" query="select * from MappingTable" transformer="ClobTransformer" >
        <field name="${mapping.SOLR_FIELD}" column="MY_FIELD" clob="true" />
      </entity>
    </entity>

As you can see, this only returns four documents containing the literal words column_name, column_id, column_desc and column_second_desc, repeated as often as Solr found rows (datasets) in my SourceTable.

After that I tried turning the attributes around and used column="${mapping.MY_FIELD}" to get the column name dynamically from the MappingTable:

    <entity name="mapping" query="select * from MappingTable" transformer="ClobTransformer" >
      <entity name="source" dataSource="my_database" query="SELECT * FROM SourceTable">
        <field name="${mapping.SOLR_FIELD}" column="${mapping.MY_FIELD}"/>
      </entity>
    </entity>

But this doesn't work for me: Solr doesn't execute /dataimport?command=full-import, nor does it give me any hint in the "Logging" section of the Solr admin.
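A variation that might avoid the dynamic column attribute altogether would be to alias the dynamic column in the SQL of the inner entity, so the field tag can reference a fixed column name (untested sketch; the alias dynamic_value is my own invention):

    <entity name="mapping" query="select * from MappingTable">
      <entity name="source" dataSource="my_database"
              query="SELECT ${mapping.MY_FIELD} AS dynamic_value FROM SourceTable">
        <field name="${mapping.SOLR_FIELD}" column="dynamic_value"/>
      </entity>
    </entity>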

Maybe there is a possibility to realize this using a foreign key between the tables. I also thought about approaching it with a ScriptTransformer and looked at the answer in Dynamic column names using DIH (DataImportHandler), but that only shows how to create a dynamic custom Solr field and also assumes that the user knows the source column names. So I tried something like the approach explained in In solr dih import two double in one location:

<script><![CDATA[
    // row is a java.util.Map passed in by the DataImportHandler;
    // collect every String-valued column into one multivalued field
    function catchColumns(row) {
        var descriptions = new java.util.ArrayList();
        var iter = row.keySet().iterator();
        while (iter.hasNext()) {
            var value = row.get(iter.next());
            if (value instanceof java.lang.String) {
                descriptions.add(value);
            }
        }
        row.put("description", descriptions);
        return row;
    }
]]></script>

Using a simple entity definition:

    <entity name="foo" 
            dataSource="my_database"
            transformer="script:catchColumns" 
            query="select * from foo">
      ...
    </entity>

But I guess the ScriptTransformer only works with columns that are explicitly named in field tags inside the entity.

I hope someone can spot a mistake or offer me a better way.


Edit: some quotation marks were missing; this happened while renaming things and queries for the examples. I also tried to fix some blurrily worded paragraphs.

  • Wouldn't a `<copyField source="*" dest="searchAll" />` in the schema be a possible solution? You can also adjust this to just map any field ending in _t into a text field, etc. – MatsLindh Feb 04 '16 at 12:02
  • Well, this would work if I already had filled fields in Solr, but my problem starts earlier in the process: I want to get all data out of a database table without naming each column. Something like – Charlowing Feb 04 '16 at 12:29
  • You might be able to solve it by having a field definition with a wildcard name ("*"), setting it to `stored="false"` and `indexed="false"` if allowed - which would mean that it neither indexes nor stores the content - and then use a `copyField` directive to move everything into a common field? – MatsLindh Feb 04 '16 at 12:40
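The copyField idea from the comments would translate to something like this in schema.xml (my sketch of the suggestion; the field and type names searchAll and text_general are illustrative):

    <!-- catch-all dynamic field so unknown columns are accepted without being indexed or stored on their own -->
    <dynamicField name="*" type="text_general" indexed="false" stored="false"/>
    <!-- common field that all incoming fields get copied into -->
    <field name="searchAll" type="text_general" indexed="true" stored="false" multiValued="true"/>
    <copyField source="*" dest="searchAll"/>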
