I have a DIH configuration where I want to combine data from DB and Tika, by passing the filename from db to Tika. Problem is that filename in Tika is coming as empty. Logs say:
ERROR (Thread-16) [ ] o.a.s.h.d.DataImporter Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.RuntimeException: java.io.FileNotFoundException: Could not find file: (resolved to: C:\Users\jimbo\Desktop\solr-8.9.0\server\.
My configuration xml file is this:
<dataConfig>
<dataSource name="ds-db" driver="org.mariadb.jdbc.Driver" url="jdbc:mysql://localhost:3306/eepyakm?user=root" user="root" password="wpadmin"/>
<dataSource name="ds-file" type="BinFileDataSource"/>
<document>
<entity name="supplier" query="select * from suppliers_tmp_view" dataSource="ds-db"
deltaQuery="select id from suppliers_tmp_view where last_modified > '${dataimporter.last_index_time}'"
deltaImportQuery="select * from suppliers_tmp_view where id='${dataimporter.delta.id}'">
<entity name="attachment" dataSource="ds-db"
query="select * from suppliers_tmp_files_view where supplier_tmp_id='${supplier.id}'"
deltaQuery="select id,supplier_tmp_id from suppliers_tmp_files_view where last_modified > '${dataimporter.last_index_time}'"
parentDeltaQuery="select id from suppliers_tmp_view where id='${attachment.supplier_tmp_id}'">
<field name="path" column="path"/>
<entity name="file" processor="TikaEntityProcessor" url="${attachment.path}" format="text" dataSource="ds-file">
<field column="text"/>
</entity>
</entity>
</entity>
</document>
</dataConfig>
I found a similar problem at a very old post: Solr's TikaEntityProcessor not working