I am having serious problems configuring the Solr 4.10.3 DIH (DataImportHandler) to import XML files. I have been trying for hours, with no luck. Here is my configuration:

<dataConfig>
  <dataSource encoding="UTF-8" 
    type="FileDataSource" basePath="/path/to/my/cores/root/myCoreName/"/>
  <document>
    <entity
        name="pickupdir"
        processor="FileListEntityProcessor"
        rootEntity="false"
        fileName=".*\.xml"
        baseDir="/import"
        recursive="true"
        newerThan="${dataimporter.last_index_time}"
    />

    <entity 
        name="xml"
        processor="XPathEntityProcessor"
        datasource="pickupdir"
        stream="true"
        useSolrAddSchema="true"
        url="${pickupdir.fileAbsolutePath}"
        xsl="solr.xsl"
    />
  </document>
</dataConfig>

The XSLT "solr.xsl" transforms the XML files to the Solr import format, so I've set useSolrAddSchema="true". However, when I try to run this data import from the admin console in the browser, I keep getting the error:

java.io.FileNotFoundException: Could not find file:  (resolved to: /path/to/my/cores/root/myCoreName/

A few things are not clear to me here:

  • The error message doesn't say exactly which file it was looking for.
  • Why does it say "could not find file" when it is looking for a directory?
  • If I understand the "basePath" attribute of dataSource correctly, this will be the basis for resolving relative paths given in the entity element. So, the baseDir "/import" would get resolved to "/path/to/my/cores/root/myCoreName/import". But this doesn't seem to be happening correctly.
  • How would I configure the paths to use relative paths to the solr root instead of absolute paths?

Maybe someone can point me to some working examples for XML imports using XSLT and DIH. I would like to stick with the XSLT, because that's working already (I've tested the import before with the Simple Post Tool).

Cheers,

Martin

martin_wun

1 Answer

As per the documentation, try adding dataSource="null" attribute to the outer entity. Without that attribute, it picks up the first Data Source declared, which is your FileDataSource.

You also seem to have forgotten to close the second entity.

Alexandre Rafalovitch
  • Thanks a lot, Alexandre, for the hint. I have tried to use the attribute dataSource = "null" and also made the second entity "xml" a nested entity under the first. Although, I am not quite sure what the purpose of this nesting is, since the entities refer to each other by their names. In any case, I am still getting the dreaded FileNotFoundException... – martin_wun Feb 11 '15 at 22:06
  • Double check that it is dataSource in the outer entity. Also, the second entity says "datasource", not dataSource, and DIH is case-sensitive. Finally, the dataSource of the second entity should not be the name of the first entity. It should be the name of the data source you defined at the top, except you didn't give it a name. So either give your data source a name and use that in the reference or, since you only have one data source, remove the reference from the second entity altogether. And you do need to nest the entities, otherwise they create separate documents. – Alexandre Rafalovitch Feb 12 '15 at 14:18
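
Putting the answer and the follow-up comment together, the configuration might look roughly like the sketch below. It has not been tested against Solr 4.10.3; it only applies the three suggested changes (dataSource="null" on the outer entity, the misspelled datasource reference removed, the "xml" entity nested inside "pickupdir") and leaves every path and attribute value exactly as in the question. One plausible reading of the empty file name in the error is that, without nesting, ${pickupdir.fileAbsolutePath} resolved to an empty string, so only basePath was left; that is an assumption, not something the thread confirms.

```xml
<dataConfig>
  <!-- Unnamed data source, unchanged from the question. Since it is the only
       one declared, the inner entity should pick it up by default. -->
  <dataSource encoding="UTF-8"
    type="FileDataSource" basePath="/path/to/my/cores/root/myCoreName/"/>
  <document>
    <!-- dataSource="null" as suggested in the answer, so the file lister
         does not pick up the FileDataSource declared above. -->
    <entity
        name="pickupdir"
        processor="FileListEntityProcessor"
        dataSource="null"
        rootEntity="false"
        fileName=".*\.xml"
        baseDir="/import"
        recursive="true"
        newerThan="${dataimporter.last_index_time}">

      <!-- Nested inside "pickupdir" so the file path variable is filled in
           once per listed file. The datasource="pickupdir" attribute from
           the question is dropped. -->
      <entity
          name="xml"
          processor="XPathEntityProcessor"
          stream="true"
          useSolrAddSchema="true"
          url="${pickupdir.fileAbsolutePath}"
          xsl="solr.xsl"/>
    </entity>
  </document>
</dataConfig>
```

Whether baseDir="/import" and xsl="solr.xsl" resolve to the intended locations is a separate matter that this sketch does not settle; both values are copied from the question as-is.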