
I'd like to use Solr as the client-side search engine for published Tridion content. It'll probably be done as a RESTful service that is disconnected from the main application.

As we'll almost certainly be using boilerplate DD4T, where everything is published to the Broker, I have some concerns about indexing binaries such as PDF or Word files: it sounds like that could put an awful lot of strain on the DB.

What strategy is recommended for retrieving binaries and indexing them in this way? It sounds like it's going to be more difficult than if the binaries were stored outside the DB.

mpaton
  • Hi, have you joined the private beta of the Tridion Stack Exchange site? http://tridion.stackexchange.com It looks like you have an Area 51 account. – Rob Stevenson-Leggett Mar 11 '13 at 14:41
  • Great idea. I think this is another area where DD4T is helpful: you can easily map your DD4T content fields to Solr field XML, and your custom Deployer, Storage Extension, or even Event System could push the XML to Solr. – robrtc Mar 11 '13 at 14:51

1 Answer


We have made the decision to publish binaries to the filesystem - you just configure this in cd_storage_conf.xml with something like this:

<Publication Id="9" defaultStorageId="defaultdb" cached="true">
    <Item typeMapping="Binary" storageId="defaultFile" cached="true"/>
</Publication>

However, even if you do choose to publish binaries to the database, this should not impact your Solr index, which will be completely separate from the Broker database. You will need to write something custom (a custom Deployer, perhaps) that pushes your data into your Solr index, and you can choose to ignore binaries for this.
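As a rough illustration of the "push your data into Solr" step, here is a minimal Python sketch that builds a Solr XML update message from a flat set of content fields and posts it to a core's update handler. The function names, the field names, and the core URL are all hypothetical, and how you obtain the fields from DD4T or a Deployer is left out entirely, so treat this as a sketch under those assumptions rather than a finished integration.

```python
import urllib.request
import xml.etree.ElementTree as ET


def to_solr_add_xml(doc_id, fields):
    """Build a Solr <add><doc> update message from a flat field mapping.

    `fields` maps field names to a single value or a list of values
    (a list becomes a repeated, multi-valued field).
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    ET.SubElement(doc, "field", name="id").text = doc_id
    for name, value in fields.items():
        values = value if isinstance(value, (list, tuple)) else [value]
        for v in values:
            ET.SubElement(doc, "field", name=name).text = str(v)
    # ElementTree escapes characters such as & and < in the field text.
    return ET.tostring(add, encoding="unicode")


def post_to_solr(update_url, xml_body):
    """POST the update message to Solr and return the HTTP status code.

    `update_url` is assumed to look like
    http://localhost:8983/solr/<core>/update?commit=true
    """
    request = urllib.request.Request(
        update_url,
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

Something along these lines could be called from whatever custom publishing hook you end up with, for example `post_to_solr(url, to_solr_add_xml("tcm:9-123-64", {"title": "Home"}))`, which keeps the indexing traffic away from the Broker database.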

Rob Stevenson-Leggett
  • Thanks Rob. Just to be clear, do you index your binaries with a separate process? For example, the customer has a large number of PDF files that would need indexing. I presume you could set up some sort of filesystem watcher that looks for binaries of a specified type and triggers the indexer? – mpaton Mar 11 '13 at 15:33
  • @mpaton Yes, that's what I'm saying you could do, though I've not actually done this with DD4T (yet). I'm just basing my answer on what can be achieved without Tridion anyway. There are a few posts about indexing PDFs on Stack Overflow and the web: http://stackoverflow.com/questions/6694327/indexing-pdf-with-solr – Rob Stevenson-Leggett Mar 11 '13 at 17:55
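
The separate indexing process discussed in these comments could be sketched in Python roughly as follows. It is a simple polling scan rather than a true filesystem watcher, and it assumes that binaries are published to a directory on disk and that Solr's extracting request handler (`/update/extract`, Solr Cell) is enabled on the core. The function names, extension list, and URLs are hypothetical.

```python
import os
import urllib.request
from urllib.parse import urlencode

# Hypothetical set of binary types worth sending to Solr.
INDEXABLE_EXTENSIONS = {".pdf", ".doc", ".docx"}


def find_changed_binaries(root_dir, since):
    """Yield indexable files under root_dir modified after `since` (epoch seconds)."""
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            extension = os.path.splitext(name)[1].lower()
            if extension in INDEXABLE_EXTENSIONS and os.path.getmtime(path) > since:
                yield path


def extract_url(solr_core_url, doc_id):
    """Build the extract-handler URL, passing the document id as a literal field."""
    params = urlencode({"literal.id": doc_id, "commit": "true"})
    return f"{solr_core_url}/update/extract?{params}"


def index_binary(solr_core_url, path):
    """Stream one file to Solr for text extraction; returns the HTTP status code."""
    with open(path, "rb") as handle:
        body = handle.read()
    request = urllib.request.Request(
        extract_url(solr_core_url, path),
        data=body,
        headers={"Content-Type": "application/octet-stream"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

A scheduled job could record the time of its last run and call `index_binary` for each path returned by `find_changed_binaries`. Using the file path as the Solr id is only a placeholder; in practice you would probably want an id that ties back to the Tridion item.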