I am having serious problems configuring the Solr 4.10.3 DataImportHandler (DIH) to import XML files. I have been trying for hours without success. Here is my configuration:
<dataConfig>
  <dataSource encoding="UTF-8"
              type="FileDataSource" basePath="/path/to/my/cores/root/myCoreName/"/>
  <document>
    <entity
        name="pickupdir"
        processor="FileListEntityProcessor"
        rootEntity="false"
        fileName=".*\.xml"
        baseDir="/import"
        recursive="true"
        newerThan="${dataimporter.last_index_time}"
    />
    <entity
        name="xml"
        processor="XPathEntityProcessor"
        datasource="pickupdir"
        stream="true"
        useSolrAddSchema="true"
        url="${pickupdir.fileAbsolutePath}"
        xsl="solr.xsl"
    />
  </document>
</dataConfig>
The XSLT "solr.xsl" transforms the XML files to the Solr import format, so I've set useSolrAddSchema="true". However, when I try to run this data import from the browser admin console, I keep getting the error:
java.io.FileNotFoundException: Could not find file: (resolved to: /path/to/my/cores/root/myCoreName/
A few things are not clear to me here:
- The error message doesn't say exactly which file it was looking for.
- Why does it say "could not find file" when it is looking for a directory?
- If I understand the "basePath" attribute of dataSource correctly, this will be the basis for resolving relative paths given in the entity element. So, the baseDir "/import" would get resolved to "/path/to/my/cores/root/myCoreName/import". But this doesn't seem to be happening correctly.
- How would I configure the paths so that they are relative to the Solr root instead of absolute?
Maybe someone can point me to some working examples for XML imports using XSLT and DIH. I would like to stick with the XSLT, because that's working already (I've tested the import before with the Simple Post Tool).
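For comparison, here is a minimal sketch of the nested layout commonly shown for FileListEntityProcessor, where the XML entity is a child of the file-listing entity rather than a sibling. With sibling entities, ${pickupdir.fileAbsolutePath} may resolve to an empty string, which would leave only the basePath in the error message. The data source name "files" and the absolute baseDir are assumptions for illustration, the sketch is untested against 4.10.3, and where solr.xsl is resolved from has not been verified.

```xml
<dataConfig>
  <!-- Assumption: a named data source ("files" is a made-up name) so the inner entity can reference it -->
  <dataSource name="files" type="FileDataSource" encoding="UTF-8"/>
  <document>
    <!-- Outer entity only lists files; rootEntity="false" keeps its rows from being indexed as documents -->
    <entity name="pickupdir"
            processor="FileListEntityProcessor"
            rootEntity="false"
            fileName=".*\.xml"
            baseDir="/path/to/my/cores/root/myCoreName/import"
            recursive="true"
            newerThan="${dataimporter.last_index_time}">
      <!-- Inner entity is nested, so ${pickupdir.fileAbsolutePath} is in scope for each listed file -->
      <entity name="xml"
              processor="XPathEntityProcessor"
              dataSource="files"
              stream="true"
              useSolrAddSchema="true"
              url="${pickupdir.fileAbsolutePath}"
              xsl="solr.xsl"/>
    </entity>
  </document>
</dataConfig>
```

Note the attribute is spelled dataSource (capital S) in this sketch, and it names a data source rather than another entity.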
Cheers,
Martin