We are trying to design a document recommender system in which the documents are constantly being updated. More precisely, the documents are streams to which text is usually appended.
Initially we planned to use Lucene + Solr, but that combination is best suited to mostly static documents. Lucene updates a document by first deleting it and then reindexing it. So if documents are updated frequently, this approach makes indexing slower as both the corpus size and the average document size grow.
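To make the cost argument concrete, here is a toy Python model of an inverted index whose only update operation is delete-then-reindex. This is roughly the behavior Lucene's `IndexWriter.updateDocument` provides, but the code is an illustrative assumption, not Lucene's implementation. `ToyIndex` and its methods are hypothetical names. Every append to a stream re-tokenizes the whole document, so with fixed-size appends the total indexing work grows quadratically with the number of appends.

```python
from collections import defaultdict


class ToyIndex:
    """Hypothetical toy inverted index (not Lucene code).

    Updating a document means deleting its old postings and then
    re-tokenizing the entire new text.
    """

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of doc ids
        self.docs = {}                    # doc id -> current full text
        self.tokens_processed = 0         # tokens tokenized while (re)indexing

    def _tokenize(self, text):
        tokens = text.lower().split()
        self.tokens_processed += len(tokens)
        return tokens

    def update_document(self, doc_id, text):
        # Delete step: remove the doc id from every posting list of the old version.
        old = self.docs.get(doc_id)
        if old is not None:
            for term in set(old.lower().split()):
                self.postings[term].discard(doc_id)
        # Reindex step: tokenize the WHOLE new text, not just the new part.
        for term in self._tokenize(text):
            self.postings[term].add(doc_id)
        self.docs[doc_id] = text

    def append(self, doc_id, new_text):
        # Appending to a stream still forces a full delete-and-reindex.
        current = self.docs.get(doc_id, "")
        combined = (current + " " + new_text).strip()
        self.update_document(doc_id, combined)


# Usage: 100 appends of 10 tokens each to a single stream.
idx = ToyIndex()
for _ in range(100):
    idx.append("stream-1", "ten words " * 5)

print(idx.tokens_processed)  # 10 * (1 + 2 + ... + 100) = 50500
```

In this sketch the stream ends up with 1,000 tokens, yet 50,500 tokens were tokenized to get there, compared with 1,000 if only the appended text had been indexed. The gap widens as streams get longer, which is the slowdown described above.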
We were also tempted to build our own solution, but gave up after prototyping because we were drifting towards reinventing information retrieval functionality that Lucene already implements quite well. Does anyone have experience building this kind of system by integrating open source search and machine learning tools?