
I'm trying to write a small Python program that uses SolrClient to index some files.

I want to index the content of some files and then add attributes to enrich each document. I indexed the files with the post command-line tool, and I'm now using a Python program to enrich the documents, something like this:

doc = solr.get('collection', id)
doc['new_attribute'] = 'value'
solr.index_json('collection', json.dumps([doc]))
solr.commit('collection', openSearcher=True)

The problem is that the index built from the file content seems to be lost. If I run a query with a word that is present in the attributes of the doc, I find it.

If I run a query with a word that appears only in the file, I get no result (the same query works when the file has only been indexed with post, without my update attempt).

I'm not sure I understand how to update the doc while keeping the index created by the post command.

I hope I'm clear enough; maybe I've misunderstood how it works...

thanks a lot

3 Answers

It worked for me this way; it may be useful to someone:

import json
from SolrClient import SolrClient

solrConect = SolrClient("http://xx.xx.xxx.xxx:8983/solr/")
# Atomic update: 'set' replaces only this field on the existing document
doc = [{'id': 'my_id', 'count_related_like': {'set': 10}}]
solrConect.index_json("my_collection", json.dumps(doc))
solrConect.commit("my_collection", softCommit=True)
Alex Hurtado

If I understand correctly, you want to modify an existing record. You should be able to do something like this without using a solr.get:

# doc is already a list, so it is passed to json.dumps as-is
doc = [{'id': 'value', 'new_attribute': {'set': 'value'}}]
solr.index_json('collection', json.dumps(doc))

See also: https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents

  • Hi, I tried this as well, but no luck. Could it be because I'm using a schema with predefined fields? When using the post command, some fields are not set, and I try to set them through index_json afterwards. – Rémi Chaffard Feb 08 '17 at 10:29
  • Is the new_attribute field defined in the schema? It must be defined, statically or dynamically. Also, is it multivalued or single-valued? If multivalued, the value should be in a list, or use 'add'. Also check the _version_ as described in the above link. I suggest trying the update with curl first to see if it works without SolrClient. – James Doepp - pihentagyu Feb 08 '17 at 14:07
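
The checks suggested in the comment above can be sketched as a small payload builder. This is only a minimal sketch, not part of SolrClient: `build_atomic_update` is a hypothetical helper, and the id and field names are placeholders. It uses Solr's `set` and `add` atomic-update modifiers and the optional `_version_` field. Note that atomic updates rebuild the document from its stored fields, so the sketch assumes the original document's fields are stored.

```python
import json


def build_atomic_update(doc_id, field, value, multivalued=False, version=None):
    """Build the JSON body for a Solr atomic update of a single field.

    Hypothetical helper for illustration; it only builds the payload.
    """
    # 'add' appends to a multivalued field, 'set' replaces the current value.
    modifier = "add" if multivalued else "set"
    doc = {"id": doc_id, field: {modifier: value}}
    if version is not None:
        # With a _version_ greater than 1, Solr rejects the update unless
        # the indexed document has exactly this version.
        doc["_version_"] = version
    # The JSON update handler accepts a list of documents.
    return json.dumps([doc])


payload = build_atomic_update("my_id", "new_attribute", "value")
print(payload)
```

The resulting string could be passed to `solr.index_json('collection', payload)` as in the answer above, or posted with curl to the collection's update handler to rule out SolrClient as the cause.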

Trying with curl did not change anything, so I took a different approach, and now it works. Instead of adding the file with the post command and modifying the document afterwards, I read the file into a string and index it in a "content" field. This way, every document is added in one shot.

The content field is defined as not stored, so it is only indexed.
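
A minimal sketch of this one-shot approach, assuming plain-text files and a "content" field that already exists in the schema. `build_document` and `index_file` are hypothetical helper names, and `solr` is expected to be a SolrClient instance like the one used in the question. Binary formats such as PDF would need text extraction first, which this sketch does not do.

```python
import json
from pathlib import Path


def build_document(doc_id, file_path, attributes):
    """Combine a file's text and extra attributes into one Solr document.

    Hypothetical helper; 'content' is the field name used in this answer.
    """
    # errors='replace' keeps going on bytes that are not valid UTF-8
    # (an assumption about the input files).
    text = Path(file_path).read_text(encoding="utf-8", errors="replace")
    doc = {"id": doc_id, "content": text}
    doc.update(attributes)
    return doc


def index_file(solr, collection, doc_id, file_path, attributes):
    """Send the complete document in a single request, then commit."""
    doc = build_document(doc_id, file_path, attributes)
    solr.index_json(collection, json.dumps([doc]))
    solr.commit(collection, openSearcher=True)
```

Because the whole document is sent at once, there is no later update that could drop the non-stored content.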

It works fine and suits my needs. It's also simpler, since it avoids the many attributes set by the post command that I don't need.

If I find some time, I'll try the partial update again and update this post.

Thanks Rémi