I am trying to index an XML document in Solr (currently version 7.3.0) without declaring specific fields in data-config, or by declaring a single tag that picks up all the others. I tried schemaless mode, but I didn't get any documents back. Is there a way to do this, or can't Solr handle it?

This is an example of my document.xml. I would like Solr to detect all the tags and return their values without my having to define any fields. As I said, I tried schemaless mode and it didn't work.

<?xml version="1.0" encoding="UTF-8"?>
<digital_archive xmlns="https://www.site" dataCreazione="2017-05-11T17:15:00">
<DocumentalCategory>some data</DocumentalCategory>
<customer>some data</customer>
<producer>some data</producer>
<documentOwner>some data</documentOwner>
<sources>
    <source>
        <idc>
            <id scheme="adfr">some data</id>
            <name>some data</name>
            <path>sources\source\some_path.XML</path>
            <hash alg="SHA-256">3748738</hash>
        </idc>
        <vdc>
            <id scheme="some data">some data.XML</id>
            <timeReference>2017-03-17T14:19:01+0100</timeReference>
        </vdc>
    </source>
</sources>
<ud>
    <metadati>
        <Name>Jane</Name>
        <Surname>Doe</Surname>
        <FiscalCode>dsrsd6w7hedw</FiscalCode>
        <Date>29.10.2017</Date>
    </metadati>

The result I expect is something like this:

<field name="DocumentalCategory">some data</field>
<field name="customer">some data</field>
<field name="producer">some data</field>
<field name="documentOwner">some data</field>
<field name="sources">
    <field name="source">
        <field name="idc">
            <field name="id" scheme="adfr">some data</field>
            <field name="name">some data</field>
            <field name="path">sources\source\some_path.XML</field>
  • How do you want to handle the XML structure? – MatsLindh May 16 '18 at 18:57
  • I just edited my question. – Marko_Da-Miami May 17 '18 at 09:00
  • Could you try curl http://localhost:8983/solr/collectionName/schema/fields after you run indexing? – Mysterion May 17 '18 at 10:22
  • This is my JSON response: { "responseHeader":{ "status":0, "QTime":4}, "fields":[{ "name":"_root_", "type":"string", "docValues":false, "indexed":true, "stored":false}, { "name":"_text_", "type":"text_general", "multiValued":true, "indexed":true, "stored":false}, { "name":"_version_", "type":"plong", "indexed":false, "stored":false}, { "name":"id", "type":"string", "multiValued":false, "indexed":true, "required":true, "stored":true}]} – Marko_Da-Miami May 18 '18 at 07:38
  • When I post the XML file, Solr accepts it, but the response is empty. – Marko_Da-Miami May 18 '18 at 07:47

2 Answers

Solr is a search engine, not a database. Its goal is to give you good search results; preserving the original document structure matters less.

While there are ways to index nested documents, you will likely find that the searches you run afterwards make you rethink your import process.

So I would recommend you step back and first think about how you want to find this information and which level of record or subrecord should be returned. Then you can revisit the import question.

Schemaless mode is not going to help you here, because it still expects the document to be in one of Solr's own update formats (XML, JSON, or CSV). You have a custom XML format, so you need to transform it somehow. You can either use the Data Import Handler and define the mapping, or apply an XSLT transform on the way in so that the document matches what Solr expects. Either way, you will most likely have to do some flattening and map something to a unique id.
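To make the flattening step concrete, here is a minimal Java sketch that uses only the JDK's DOM parser. It walks a custom XML document, turns every leaf element into a field named after the underscore-joined path of its ancestors, and emits the result in Solr's <add><doc> update format. The class name SolrFlattener, the underscore naming scheme, and the hard-coded id are assumptions made for illustration. Attributes such as scheme or alg are ignored, and repeated elements produce repeated field names, which would need multi-valued fields on the Solr side.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper: flattens arbitrary nested XML into Solr update XML.
class SolrFlattener {

    /** Returns {fieldName, value} pairs for every leaf element below the root. */
    static List<String[]> flatten(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Namespace awareness lets getLocalName() ignore the default namespace.
            factory.setNamespaceAware(true);
            Element root = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            List<String[]> fields = new ArrayList<>();
            walk(root, "", fields);
            return fields;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Depth-first walk; only elements without child elements become fields.
    private static void walk(Element element, String path, List<String[]> out) {
        boolean hasChildElements = false;
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                hasChildElements = true;
                String name = child.getLocalName();
                walk((Element) child, path.isEmpty() ? name : path + "_" + name, out);
            }
        }
        if (!hasChildElements && !path.isEmpty()) {
            out.add(new String[] {path, element.getTextContent().trim()});
        }
    }

    /** Builds a single-document Solr update message with the given id. */
    static String toSolrXml(String id, List<String[]> fields) {
        StringBuilder sb = new StringBuilder("<add><doc>");
        sb.append("<field name=\"id\">").append(escape(id)).append("</field>");
        for (String[] field : fields) {
            sb.append("<field name=\"").append(escape(field[0])).append("\">")
              .append(escape(field[1])).append("</field>");
        }
        return sb.append("</doc></add>").toString();
    }

    // Escapes the characters that are special in XML text and attribute values.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        // Shortened version of the document from the question.
        String xml = "<digital_archive xmlns=\"https://www.site\">"
                + "<customer>some data</customer>"
                + "<sources><source><idc><name>some data</name></idc></source></sources>"
                + "</digital_archive>";
        System.out.println(toSolrXml("doc-1", flatten(xml)));
    }
}
```

Once the document has this flat shape it is in a format Solr's update handler accepts, and how the id is chosen (a file name, a hash, a value from the document) is the "id mapping" decision mentioned above.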

Alexandre Rafalovitch
  • I solved it using SolrJ in Java. This part of the Solr Reference Guide helped me do that: https://lucene.apache.org/solr/guide/7_0/using-solrj.html#uploading-content-in-xml-or-binary-formats – Marko_Da-Miami May 18 '18 at 14:00

Use XSLT to convert your custom XML into the XML update format that Solr understands. Below is my XML:

<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="Rule.xsl"?>
<collection>
  <movie>
      <title>abc</title>
      <year>2016</year>
      <genre>comedy</genre>
  </movie>
  <movie>
      <title>xyz</title>
      <year>2017</year>
      <genre>animated</genre>
  </movie>
  <movie>
      <title>pqr</title>
      <year>2018</year>
      <genre>action</genre>
  </movie>
</collection>

Below is my XSL file that performs the transformation:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <!-- Drop whitespace-only text nodes so they do not leak into the output -->
  <xsl:strip-space elements="*"/>
  <!-- The root <collection> becomes Solr's <add> wrapper -->
  <xsl:template match="/collection">
    <add>
      <xsl:apply-templates select="movie"/>
    </add>
  </xsl:template>
  <!-- Each <movie> becomes one Solr <doc> -->
  <xsl:template match="movie">
    <doc>
      <xsl:apply-templates select="*"/>
    </doc>
  </xsl:template>
  <!-- Each child element becomes <field name="element-name">value</field> -->
  <xsl:template match="movie/*">
    <field name="{local-name()}">
      <xsl:value-of select="."/>
    </field>
  </xsl:template>
</xsl:stylesheet>

Transformed version:

<add>
  <doc>
    <field name="title">abc</field>
    <field name="year">2016</field>
    <field name="genre">comedy</field>
  </doc>
  <doc>
    <field name="title">xyz</field>
    <field name="year">2017</field>
    <field name="genre">animated</field>
  </doc>
  <doc>
    <field name="title">pqr</field>
    <field name="year">2018</field>
    <field name="genre">action</field>
  </doc>
</add>
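
The transformation can be checked locally before anything is sent to Solr. The sketch below uses the XSLT 1.0 processor that ships with the JDK (javax.xml.transform) to run a condensed stylesheet of the same shape over a shortened copy of the movie XML. The class name XsltToSolr and the inlined strings are assumptions made for illustration; in practice the stylesheet and the input would be read from files such as Rule.xsl.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: applies an XSLT stylesheet with the JDK's built-in processor.
class XsltToSolr {

    // Condensed stylesheet: collection -> add, movie -> doc, child elements -> field.
    static final String XSL =
          "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:strip-space elements=\"*\"/>"
        + "<xsl:template match=\"/collection\">"
        + "<add><xsl:apply-templates select=\"movie\"/></add>"
        + "</xsl:template>"
        + "<xsl:template match=\"movie\">"
        + "<doc><xsl:apply-templates select=\"*\"/></doc>"
        + "</xsl:template>"
        + "<xsl:template match=\"movie/*\">"
        + "<field name=\"{local-name()}\"><xsl:value-of select=\".\"/></field>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Runs the stylesheet over the input XML and returns the result as a string. */
    static String transform(String xml, String xsl) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            // Leave out the <?xml ...?> declaration to keep the output compact.
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<collection>"
                + "<movie><title>abc</title><year>2016</year><genre>comedy</genre></movie>"
                + "</collection>";
        System.out.println(transform(xml, XSL));
    }
}
```

Note that none of the resulting documents carries an id field. If the target schema marks id as required, as the schema listing in the question's comments does, Solr would likely reject them until the stylesheet also emits a unique id for each doc.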


Akoffice