I'd like to use Solr as the client-side search engine for published Tridion content. It will probably be implemented as a RESTful service that is decoupled from the main application.
As we'll almost certainly be using boilerplate DD4T, where everything is published to the Broker, I have some concerns, particularly about indexing binaries such as PDF or Word files. It sounds like retrieving them for indexing could put an awful lot of strain on the DB?
What strategy is recommended for retrieving binaries from the Broker and indexing them in Solr? It sounds like it's going to be more difficult than if the binaries were stored outside the DB.