0

OS: Windows Version: solr 7.7 Port: 8983

Currently, my data stored and return by a terms query looks like this: enter image description here

But I would like to see the full filname as well as just the file name. How would I go about doing this?

I'd like something such as this example (resourcename and a fileName field would be ideal): enter image description here

Currently, I am adding files using:

java -Durl=http://localhost:8983/solr/myCore/update/extract -jar example\exampledocs\post.jar "C:\foo\lots of stuff and subdirs\*"

So far, SOLR is working perfectly, but would only be VERY useful to me with the query results.

Thanks!


Updated screenshot enter image description here

The solrconfig.xml for this server is:

<?xml version="1.0" encoding="UTF-8" ?>
<config>


  <luceneMatchVersion>7.7.2</luceneMatchVersion>



  <lib dir="${solr.install.dir:../../../..}/contrib/extraction/lib" regex=".*\.jar" />
  <lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-cell-\d.*\.jar" />

  <lib dir="${solr.install.dir:../../../..}/contrib/clustering/lib/" regex=".*\.jar" />
  <lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-clustering-\d.*\.jar" />

  <lib dir="${solr.install.dir:../../../..}/contrib/langid/lib/" regex=".*\.jar" />
  <lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-langid-\d.*\.jar" />

  <lib dir="${solr.install.dir:../../../..}/contrib/velocity/lib" regex=".*\.jar" />
  <lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-velocity-\d.*\.jar" />

  <dataDir>${solr.data.dir:}</dataDir>

  <directoryFactory name="DirectoryFactory"
                    class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>

  <codecFactory class="solr.SchemaCodecFactory"/>

  <indexConfig>
    <lockType>${solr.lock.type:native}</lockType>

  </indexConfig>



  <jmx />


  <updateHandler class="solr.DirectUpdateHandler2">

    <updateLog>
      <str name="dir">${solr.ulog.dir:}</str>
      <int name="numVersionBuckets">${solr.ulog.numVersionBuckets:65536}</int>
    </updateLog>


    <autoCommit>
      <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>



    <autoSoftCommit>
      <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
    </autoSoftCommit>



  </updateHandler>

  <query>


    <maxBooleanClauses>${solr.max.booleanClauses:1024}</maxBooleanClauses>

    <filterCache class="solr.FastLRUCache"
                 size="512"
                 initialSize="512"
                 autowarmCount="0"/>


    <queryResultCache class="solr.LRUCache"
                      size="512"
                      initialSize="512"
                      autowarmCount="0"/>


    <documentCache class="solr.LRUCache"
                   size="512"
                   initialSize="512"
                   autowarmCount="0"/>


    <cache name="perSegFilter"
           class="solr.search.LRUCache"
           size="10"
           initialSize="0"
           autowarmCount="10"
           regenerator="solr.NoOpRegenerator" />




    <enableLazyFieldLoading>true</enableLazyFieldLoading>

    <queryResultWindowSize>20</queryResultWindowSize>


    <queryResultMaxDocsCached>200</queryResultMaxDocsCached>



    <listener event="newSearcher" class="solr.QuerySenderListener">
      <arr name="queries">

      </arr>
    </listener>
    <listener event="firstSearcher" class="solr.QuerySenderListener">
      <arr name="queries">

      </arr>
    </listener>


    <useColdSearcher>false</useColdSearcher>

  </query>



  <requestDispatcher>


    <httpCaching never304="true" />

  </requestDispatcher>


  <requestHandler name="/select" class="solr.SearchHandler">

    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <int name="rows">10</int>
       name="wt">xml</str> 
        -->
    </lst>

  </requestHandler>


  <requestHandler name="/query" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <str name="wt">json</str>
      <str name="indent">true</str>
    </lst>
  </requestHandler>



  <requestHandler name="/browse" class="solr.SearchHandler" useParams="query,facets,velocity,browse">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
    </lst>
  </requestHandler>

  <initParams path="/update/**,/query,/select,/tvrh,/elevate,/spell,/browse">
    <lst name="defaults">
      <str name="df">_text_</str>
    </lst>
  </initParams>


  <requestHandler name="/update/extract"
                  startup="lazy"
                  class="solr.extraction.ExtractingRequestHandler" >
    <lst name="defaults">
      <str name="lowernames">true</str>
      <str name="fmap.meta">ignored_</str>
      <str name="fmap.content">_text_</str>
    </lst>
  </requestHandler>


  <searchComponent name="spellcheck" class="solr.SpellCheckComponent">

    <str name="queryAnalyzerFieldType">text_general</str>




    <lst name="spellchecker">
      <str name="name">default</str>
      <str name="field">_text_</str>
      <str name="classname">solr.DirectSolrSpellChecker</str>

      <str name="distanceMeasure">internal</str>

      <float name="accuracy">0.5</float>
      <!-- the maximum #edits we consider when enumerating terms: can be 1 or 2 -->
      <int name="maxEdits">2</int>

      <int name="minPrefix">1</int>

      <int name="maxInspections">5</int>

      <int name="minQueryLength">4</int>

      <float name="maxQueryFrequency">0.01</float>

      -->
    </lst>


  </searchComponent>


  <requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
    <lst name="defaults">

      <str name="spellcheck.dictionary">default</str>
      <str name="spellcheck">on</str>
      <str name="spellcheck.extendedResults">true</str>
      <str name="spellcheck.count">10</str>
      <str name="spellcheck.alternativeTermCount">5</str>
      <str name="spellcheck.maxResultsForSuggest">5</str>
      <str name="spellcheck.collate">true</str>
      <str name="spellcheck.collateExtendedResults">true</str>
      <str name="spellcheck.maxCollationTries">10</str>
      <str name="spellcheck.maxCollations">5</str>
    </lst>
    <arr name="last-components">
      <str>spellcheck</str>
    </arr>
  </requestHandler>


  <searchComponent name="tvComponent" class="solr.TermVectorComponent"/>


  <requestHandler name="/tvrh" class="solr.SearchHandler" startup="lazy">
    <lst name="defaults">
      <bool name="tv">true</bool>
    </lst>
    <arr name="last-components">
      <str>tvComponent</str>
    </arr>
  </requestHandler>




  <searchComponent name="terms" class="solr.TermsComponent"/>


  <requestHandler name="/terms" class="solr.SearchHandler" startup="lazy">
    <lst name="defaults">
      <bool name="terms">true</bool>
      <bool name="distrib">false</bool>
    </lst>
    <arr name="components">
      <str>terms</str>
    </arr>
  </requestHandler>



  <searchComponent name="elevator" class="solr.QueryElevationComponent" >

    <str name="queryFieldType">string</str>
  </searchComponent>


  <requestHandler name="/elevate" class="solr.SearchHandler" startup="lazy">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
    </lst>
    <arr name="last-components">
      <str>elevator</str>
    </arr>
  </requestHandler>


  <searchComponent class="solr.HighlightComponent" name="highlight">
    <highlighting>


      <fragmenter name="gap"
                  default="true"
                  class="solr.highlight.GapFragmenter">
        <lst name="defaults">
          <int name="hl.fragsize">100</int>
        </lst>
      </fragmenter>


      <fragmenter name="regex"
                  class="solr.highlight.RegexFragmenter">
        <lst name="defaults">

          <int name="hl.fragsize">70</int>
          <!-- allow 50% slop on fragment sizes -->
          <float name="hl.regex.slop">0.5</float>

          <str name="hl.regex.pattern">[-\w ,/\n\&quot;&apos;]{20,200}</str>
        </lst>
      </fragmenter>


      <formatter name="html"
                 default="true"
                 class="solr.highlight.HtmlFormatter">
        <lst name="defaults">
          <str name="hl.simple.pre"><![CDATA[<em>]]></str>
          <str name="hl.simple.post"><![CDATA[</em>]]></str>
        </lst>
      </formatter>


      <encoder name="html"
               class="solr.highlight.HtmlEncoder" />


      <fragListBuilder name="simple"
                       class="solr.highlight.SimpleFragListBuilder"/>


      <fragListBuilder name="single"
                       class="solr.highlight.SingleFragListBuilder"/>


      <fragListBuilder name="weighted"
                       default="true"
                       class="solr.highlight.WeightedFragListBuilder"/>


      <fragmentsBuilder name="default"
                        default="true"
                        class="solr.highlight.ScoreOrderFragmentsBuilder">

      </fragmentsBuilder>


      <fragmentsBuilder name="colored"
                        class="solr.highlight.ScoreOrderFragmentsBuilder">
        <lst name="defaults">
          <str name="hl.tag.pre"><![CDATA[
               <b style="background:yellow">,<b style="background:lawgreen">,
               <b style="background:aquamarine">,<b style="background:magenta">,
               <b style="background:palegreen">,<b style="background:coral">,
               <b style="background:wheat">,<b style="background:khaki">,
               <b style="background:lime">,<b style="background:deepskyblue">]]></str>
          <str name="hl.tag.post"><![CDATA[</b>]]></str>
        </lst>
      </fragmentsBuilder>

      <boundaryScanner name="default"
                       default="true"
                       class="solr.highlight.SimpleBoundaryScanner">
        <lst name="defaults">
          <str name="hl.bs.maxScan">10</str>
          <str name="hl.bs.chars">.,!? &#9;&#10;&#13;</str>
        </lst>
      </boundaryScanner>

      <boundaryScanner name="breakIterator"
                       class="solr.highlight.BreakIteratorBoundaryScanner">
        <lst name="defaults">

          <str name="hl.bs.type">WORD</str>


          <str name="hl.bs.language">en</str>
          <str name="hl.bs.country">US</str>
        </lst>
      </boundaryScanner>
    </highlighting>
  </searchComponent>




  <updateProcessor class="solr.UUIDUpdateProcessorFactory" name="uuid"/>
  <updateProcessor class="solr.RemoveBlankFieldUpdateProcessorFactory" name="remove-blank"/>
  <updateProcessor class="solr.FieldNameMutatingUpdateProcessorFactory" name="field-name-mutating">
    <str name="pattern">[^\w-\.]</str>
    <str name="replacement">_</str>
  </updateProcessor>
  <updateProcessor class="solr.ParseBooleanFieldUpdateProcessorFactory" name="parse-boolean"/>
  <updateProcessor class="solr.ParseLongFieldUpdateProcessorFactory" name="parse-long"/>
  <updateProcessor class="solr.ParseDoubleFieldUpdateProcessorFactory" name="parse-double"/>
  <updateProcessor class="solr.ParseDateFieldUpdateProcessorFactory" name="parse-date">
    <arr name="format">
      <str>yyyy-MM-dd'T'HH:mm:ss.SSSZ</str>
      <str>yyyy-MM-dd'T'HH:mm:ss,SSSZ</str>
      <str>yyyy-MM-dd'T'HH:mm:ss.SSS</str>
      <str>yyyy-MM-dd'T'HH:mm:ss,SSS</str>
      <str>yyyy-MM-dd'T'HH:mm:ssZ</str>
      <str>yyyy-MM-dd'T'HH:mm:ss</str>
      <str>yyyy-MM-dd'T'HH:mmZ</str>
      <str>yyyy-MM-dd'T'HH:mm</str>
      <str>yyyy-MM-dd HH:mm:ss.SSSZ</str>
      <str>yyyy-MM-dd HH:mm:ss,SSSZ</str>
      <str>yyyy-MM-dd HH:mm:ss.SSS</str>
      <str>yyyy-MM-dd HH:mm:ss,SSS</str>
      <str>yyyy-MM-dd HH:mm:ssZ</str>
      <str>yyyy-MM-dd HH:mm:ss</str>
      <str>yyyy-MM-dd HH:mmZ</str>
      <str>yyyy-MM-dd HH:mm</str>
      <str>yyyy-MM-dd</str>
    </arr>
  </updateProcessor>
  <updateProcessor class="solr.AddSchemaFieldsUpdateProcessorFactory" name="add-schema-fields">
    <lst name="typeMapping">
      <str name="valueClass">java.lang.String</str>
      <str name="fieldType">text_general</str>
      <lst name="copyField">
        <str name="dest">*_str</str>
        <int name="maxChars">256</int>
      </lst>

      <bool name="default">true</bool>
    </lst>
    <lst name="typeMapping">
      <str name="valueClass">java.lang.Boolean</str>
      <str name="fieldType">booleans</str>
    </lst>
    <lst name="typeMapping">
      <str name="valueClass">java.util.Date</str>
      <str name="fieldType">pdates</str>
    </lst>
    <lst name="typeMapping">
      <str name="valueClass">java.lang.Long</str>
      <str name="valueClass">java.lang.Integer</str>
      <str name="fieldType">plongs</str>
    </lst>
    <lst name="typeMapping">
      <str name="valueClass">java.lang.Number</str>
      <str name="fieldType">pdoubles</str>
    </lst>
  </updateProcessor>


  <updateRequestProcessorChain name="add-unknown-fields-to-the-schema" default="${update.autoCreateFields:true}"
           processor="uuid,remove-blank,field-name-mutating,parse-boolean,parse-long,parse-double,parse-date,add-schema-fields">
    <processor class="solr.LogUpdateProcessorFactory"/>
    <processor class="solr.DistributedUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>

  <queryResponseWriter name="json" class="solr.JSONResponseWriter">

    <str name="content-type">text/plain; charset=UTF-8</str>
  </queryResponseWriter>


  <queryResponseWriter name="velocity" class="solr.VelocityResponseWriter" startup="lazy">
    <str name="template.base.dir">${velocity.template.base.dir:}</str>
    <str name="solr.resource.loader.enabled">${velocity.solr.resource.loader.enabled:true}</str>
    <str name="params.resource.loader.enabled">${velocity.params.resource.loader.enabled:false}</str>
  </queryResponseWriter>


  <queryResponseWriter name="xslt" class="solr.XSLTResponseWriter">
    <int name="xsltCacheLifetimeSeconds">5</int>
  </queryResponseWriter>

For whoever stumbles upon this Q in the future, please see my other question to maybe help with your issue here: Winwdows solr post - Invalid UTF-8 middle byte 0xe3 (at char #10, byte #-1)

jason m
  • 6,519
  • 20
  • 69
  • 122
  • 1
    According [to the source both `resource.name` and `id`](https://github.com/apache/lucene-solr/blob/7c8635d600b671dc91db3ed72690c927b3889372/solr/core/src/java/org/apache/solr/util/SimplePostTool.java#L820) should be set to the absolute path of the file being posted. Make sure you have a field named `resource.name` defined and set as string. This is the behaviour shown in your second example by default. – MatsLindh Apr 24 '20 at 18:09
  • @MatsLindh So I should be seeing these by default? – jason m Apr 24 '20 at 18:33
  • Yes, that's the default behavior (and according to the source, it's been for five years at least). I just tested with 8.5.0, and the `_default` configuration gives me the complete path in the `id` field. Seems like you're getting a generated signature instead - have you made any changes to the configuration? What's your request handler definition for `/update/extract`? – MatsLindh Apr 24 '20 at 18:52
  • BTW: the `resource.name` parameter is used for content type detection, it's not submitted as a field. If you have a deduplication update chain, it'll override the submitted id. – MatsLindh Apr 24 '20 at 18:55
  • @MatsLindh I am using whatever is the default for `/update/extract`. I'm a novice in solr. The command is only what I wrote above. I just deleted and re-downloaded 7.7 (8.* does not work locally on corporate setup for some reason). On reloading 7.7, I get the x_tika_oringresourcename but not the resource.id. Updated screenshot above. – jason m Apr 24 '20 at 21:31
  • @MatsLindh also dropped in the solrconfig.xml which is in the core directory. – jason m Apr 24 '20 at 21:40
  • The field name should be only `id`, and the UUID processor chain shouldn't assign one if it's already present - you could try to remove `uuid` from the `processor` list given to `add-unknown-fields-to-the-schema` update chain. Seems you can also turn on more excessive logging for the urlconnection the post tool uses: https://stackoverflow.com/a/12339718/137650 - or try wireshark to see what's actually being submitted to Solr (it'll also show the complete URL if you use an non-existing core in your command IIRC). Look for what `literal.id` is set to in the URL in that case. – MatsLindh Apr 25 '20 at 05:59
  • @MatsLindh Is there a way to add a new field which could capture the filename as a test? I'm quite new to solr and even doing this would be an interesting exercise. – jason m Apr 25 '20 at 15:33
  • No, that wouldn't really depend on the field - it depends on what the client (i.e. `post.jar`) is sending (since the path is metadata that isn't part of the file itself, it has to be included by the client). If you write your own indexing code you're going to have full control over what you submit to the server, so that might be an option (and can then index the file name into whatever field you want). I'd probably try to debug what happens with post.jar instead (for example by compiling your own and hooking up a debugger). – MatsLindh Apr 25 '20 at 18:24

0 Answers0