I am using XMLUI (Mirage) on DSpace 6.2 and am trying to insert the "Most Downloaded Items" into the home page.

I have figured out the SOLR query for this, namely (in page-structure.xsl):

<xsl:variable name="statsURL">
    <xsl:text>http://localhost/solr/statistics</xsl:text>
</xsl:variable>
<xsl:apply-templates select="document(concat($statsURL,'/select?q=type:0+-isBot:true+statistics_type:view&amp;wt=xml&amp;indent=true&amp;facet=true&amp;facet.field=id&amp;facet.sort=count&amp;facet.limit=10'))" mode="most-downloaded"/>

This query returns an xml document:

<response>
    <result name="response" numFound="8" start="0"></result>
    <lst name="facet_counts">
        <lst name="facet_queries"/>
        <lst name="facet_fields">
            <lst name="id">
                <int name="49b63c98-122c-40d4-9181-2ad4db8853c9">8</int>
                <int name="061c72a0-3edc-4e17-8f33-4e7f6ce4573a">0</int>
                <int name="0e124f85-4636-4eb5-85cb-2e4afd3e3ed0">0</int>
                <int name="19095190-9074-4a4a-bb59-abcb539c8c38">0</int>
                <int name="1e5350e0-83d9-4f26-bd76-e5d660254ee6">0</int>
                <int name="432038ee-a7d7-4c69-80c1-02641e105286">0</int>
                <int name="6b70eeea-be33-4489-8370-189ef041ba93">0</int>
                <int name="9a8cd24e-3d88-43fc-8e92-b4e2c6142fbc">0</int>
                <int name="bba37b59-7edc-453c-87d2-4039e432217b">0</int>
                <int name="cc78e683-9563-49df-b5cf-35d506b4a27d">0</int>
            </lst>
        </lst>
        <lst name="facet_dates"/>
        <lst name="facet_ranges"/>
        <lst name="facet_intervals"/>
    </lst>
</response>

I then match this to a template as in:

<xsl:template match="/response/lst/lst/lst/int" mode="most-downloaded">
<div class="most_downloaded">
    <xsl:value-of select="./@name"/>
</div>
<div class="downloaded_count">
    <xsl:value-of select="text()"/>
</div>
</xsl:template>

I expect to see 8 divs of class "most_downloaded", each containing the id of the item, interspersed with another 8 divs of class "downloaded_count" containing the actual value. I do see these divs, but above them, I get a dump of all of the XML text nodes. I think that this is happening due to my poor understanding of template matching.

My questions are:
i) Is my query to get the list of most downloaded items correct? I have tried to test this but did not receive positive results.
ii) What is the correct way to match the template? /response/lst/lst/lst/int just sounds wrong.
iii) How can I use the id (which I believe is the item uuid in the database) to get the mets.xml data via cocoon?
iv) Is there an easier way to do all of this?

Thanks for any help.

2 Answers

i) Your query gets bitstream IDs, not the IDs of the owning item. For most downloaded items you'll want facet.field=owningItem, and possibly also a filter so you don't count thumbnails (something like &fq=bundleName:ORIGINAL; you'll need to adjust that if you have non-standard bundle names).

ii) Looks good to me. You probably want something like <xsl:template match="*" mode="most-downloaded"> to suppress the random XML junk you're seeing.
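
One caveat on the catch-all: if a match="*" template in this mode is left empty, it also matches the outer <response> element, and processing stops there before any <int> is reached. A minimal sketch of an alternative, assuming the stray output comes from XSLT's built-in rule that copies text nodes: override text() in the same mode, so elements are still walked but their text is dropped. The stylesheet wrapper is only there to make the fragment self-contained, and the match pattern is one possible way to narrow /response/lst/lst/lst/int.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <!-- The built-in rule copies every text node to the output;
         override it for this mode only so nothing leaks through. -->
    <xsl:template match="text()" mode="most-downloaded"/>

    <!-- Elements without their own template still fall through to the
         built-in element rule, so each facet entry is reached and rendered. -->
    <xsl:template match="lst[@name='facet_fields']/lst/int" mode="most-downloaded">
        <div class="most_downloaded">
            <xsl:value-of select="@name"/>
        </div>
        <div class="downloaded_count">
            <xsl:value-of select="."/>
        </div>
    </xsl:template>

</xsl:stylesheet>
```

Adding rows=0 to the query should also help, since the <result> element then carries no documents whose text could be dumped.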

iii) I think it'll be better to get the metadata from the Discovery Solr core rather than trying to obtain the mets.xml file. You may be able to do a Solr join to the Discovery core and get the title (or whatever other metadata you want) from there all in one query, but I'm not sure that works with faceting. You could, in your template, make a query to the Discovery core for each ID to get what you're after (e.g. http://localhost:8080/solr/search/select?q=*:*&fq=search.resourceid:[id-goes-here]&rows=1&fl=title).

iv) Depends on whether you think writing Java code is easier ;) I have solved much the same issue locally in a two-step process: (a) query Solr once per day with a query much like yours and write the results to a (JSON) file; (b) write Java code for a Cocoon transformer that loads the item IDs from the file, looks up each corresponding item's title, then puts that into the page in a useful format. Not sure whether your approach is any better/worse! My approach does avoid having to query Solr in real time, though, which we found to be quite resource-intensive.

Just for reference, my query for the JSON file mentioned in (iv) is http://localhost:8080/solr/statistics/select?q=*:*&fq=-isBot:true&fq=type:0&fq=statistics_type:view&facet=true&facet.field=owningItem&facet.limit=5&indent=true&rows=0&fq=time:[NOW/DAY-7DAYS+TO+NOW/DAY]&facet.mincount=5&fq=bundleName:ORIGINAL&wt=json&omitHeader=true

  • get non-bot hits
  • hits for bitstreams (type 0)
  • where the statistics type is view (not workflow or whatever else there is)
  • we want them grouped by the corresponding item's ID
  • we only want 5
  • we want this to be indented (this is just for optics)
  • we want 0 rows of data (which would be in addition to the facets - we only care about facets)
  • we want the last 7 full days
  • we only want items whose files have been downloaded at least 5 times
  • we only want ORIGINAL bundle, not thumbnails etc
  • we want JSON format
  • we want to skip some of the Solr results stuff that we don't care about
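
For the XMLUI route, the same parameters could in principle be assembled in XSLT and fetched as XML rather than JSON. This is a minimal sketch, not a tested DSpace configuration: the variable name topItemsQuery and the named template most-downloaded-list are made up for illustration, the statsURL value is copied from the example URL above, and the brackets and spaces in the date range are percent-encoded on the assumption that the URL resolver behind document() may not accept them raw.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <!-- Assumption: Solr statistics core location, as in the example URL -->
    <xsl:variable name="statsURL" select="'http://localhost:8080/solr/statistics'"/>

    <!-- One string literal per parameter, in the same order as the list above -->
    <xsl:variable name="topItemsQuery" select="concat($statsURL,
        '/select?q=*:*',
        '&amp;fq=-isBot:true',
        '&amp;fq=type:0',
        '&amp;fq=statistics_type:view',
        '&amp;facet=true&amp;facet.field=owningItem',
        '&amp;facet.limit=5',
        '&amp;rows=0',
        '&amp;fq=time:%5BNOW/DAY-7DAYS%20TO%20NOW/DAY%5D',
        '&amp;facet.mincount=5',
        '&amp;fq=bundleName:ORIGINAL',
        '&amp;wt=xml&amp;omitHeader=true')"/>

    <!-- Hypothetical named template: renders one pair of divs per facet entry -->
    <xsl:template name="most-downloaded-list">
        <xsl:for-each select="document($topItemsQuery)/response/lst[@name='facet_counts']/lst[@name='facet_fields']/lst[@name='owningItem']/int">
            <div class="most_downloaded"><xsl:value-of select="@name"/></div>
            <div class="downloaded_count"><xsl:value-of select="."/></div>
        </xsl:for-each>
    </xsl:template>

</xsl:stylesheet>
```

Keeping each parameter in its own literal makes it easier to adjust the window or the limit without re-reading one long URL. Note that this still queries Solr on every page render, which is the cost the daily-file approach avoids.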
schweerelos
  • Thank you @schweerelos
    i) You are 100% correct.
    ii) Again, 100%.
    iii) I used your second idea and have almost got that working. I will paste the code when I have finished.
    iv) I prefer your approach. Much less overhead on Solr. I will give that a try as well. Writing the Java is not a problem, but I assume that you mean that I need to create a new aspect for the theme. I struggle a lot with the DSpace documentation as it only seems to tell one what to do and not how to do it. Do you have any pointers to resources that can help?
    Much Appreciated.
    – Shaun Donovan Mar 01 '18 at 06:03
  • Glad you got it to work @ShaunDonovan! You can write your own aspect, and I have done that a few times, e.g. https://github.com/UoW-IRRs/DSpace-Aspects (the surfacecontent one is closest to your use case but pulls data from the search core, not the stats one). I've found extra aspects appear to slow down Tomcat when restarting, so recently I've switched to adding transformers to existing aspects. You just pull them in via sitemap.xmap. I don't have any better docs than the official DSpace docs and the old-but-still-relevant Cocoon docs, sorry! – schweerelos Mar 01 '18 at 21:56

This is the code that finally worked:

<xsl:variable name="searchURL" select="confman:getProperty('discovery','search.server')"/>
<xsl:variable name="statsURL" select="confman:getProperty('solr-statistics.server')"/>   

.....

                    <xsl:if test="string-length($request-uri)=0">
                       <div class="downloaded-wrapper">
                          <xsl:apply-templates select="document(concat($statsURL,'/select?q=type:0+-isBot:true+statistics_type:view&amp;wt=xml&amp;indent=true&amp;facet=true&amp;facet.field=owningItem&amp;fq=bundleName:ORIGINAL&amp;facet.sort=count&amp;facet.limit=10'))" mode="most-downloaded"/>
                       </div>
                    </xsl:if>
                </xsl:otherwise>
            </xsl:choose>
            </div>
            <xsl:apply-templates select="//*[@pagination='masked']/@pagination" mode="external"/>
        </div>
    </xsl:template>

    <!-- Matches the root of the Solr statistics response -->
    <xsl:template match="/" mode="most-downloaded">
        <!-- One <int> per item: @name is the item UUID, the text is the download count -->
        <xsl:for-each select="/response/lst/lst/lst[@name='owningItem']/int">
            <div class="most_downloaded">
                <xsl:variable name="itemId" select="@name"/>
                <!-- Look up this item's title and handle in the Discovery core -->
                <xsl:apply-templates select="document(concat($searchURL,'/select?q=*:*&amp;fl=title,handle&amp;wt=xml&amp;omitHeader=true&amp;indent=true&amp;fq=search.resourceid:',$itemId))" mode="itemMeta"/>
            </div>
            <div class="downloaded_count">
                <xsl:value-of select="text()"/>
            </div>
        </xsl:for-each>
    </xsl:template>

    <!-- Matches the root of the Discovery response for a single item -->
    <xsl:template match="/" mode="itemMeta">
        <xsl:variable name="mainURL" select="confman:getProperty('dspace.baseUrl')"/>
        <!-- Link to the item's handle page, with the title(s) as link text -->
        <a>
            <xsl:attribute name="href"><xsl:value-of select="concat($mainURL,'/handle/',/response/result/doc/str[@name='handle']/text())"/></xsl:attribute>
            <xsl:for-each select="/response/result/doc/arr[@name='title']/str">
                <xsl:value-of select="./text()"/>
            </xsl:for-each>
        </a>
    </xsl:template>