2

I am using the open-source Apache ManifoldCF project to index documents from Google Drive into my Solr instance. I have often found it quite inconsistent in indexing the data. It also takes a long time for even a small number of documents to show up in Solr. Is it really a good option for indexing Google Drive?

Brian Tompsett - 汤莱恩
Saurabh Chaturvedi

2 Answers

0

It is currently a bit on the slow side, due to response times and throttling constraints from Google Drive itself. This limit can probably be relieved if you buy additional bandwidth from Google. With the current setup, if you want to index a large set of documents in Google Drive, it may not be as quick as you expect.
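To illustrate the throttling point: a common way for any API client to cope with rate limiting is to retry with exponential backoff. Below is a minimal sketch of that idea, not ManifoldCF's actual implementation. `ThrottledError`, `call_with_backoff`, and the `fetch` callable are hypothetical names used only for illustration; a real client would map the API's rate-limit response onto something like `ThrottledError`.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical error standing in for a rate-limit response from the API."""


def call_with_backoff(fetch, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch(), retrying with exponential backoff plus jitter when throttled."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of retries, surface the error to the caller
            # With base_delay=1.0 this waits about 1s, 2s, 4s, ...
            # plus up to 1s of random jitter so clients do not retry in lockstep.
            sleep(base_delay * (2 ** attempt) + random.random())
```

This also shows why crawling feels slow: every throttled request adds seconds of waiting, and those waits add up over a large set of documents.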

kuhajeyan
  • Thanks, kbird. Can you tell me some other way to index documents from Google Drive that does not depend on ManifoldCF? – Saurabh Chaturvedi Jun 30 '15 at 15:03
  • @codechat If you use the API, the constraints would still exist. I am not sure of any way other than using the API. – kuhajeyan Jul 08 '15 at 11:50
0

ManifoldCF is good for crawling a file system. You can go with Apache Nutch if you are interested in web crawling.

Yes, ManifoldCF does take a long time to reflect even a small number of documents. It also has very little documentation. However, you can join the mailing list, where you can put questions to the lead developer, "Karl". He is very helpful and usually answers within a few hours.

P.S.: I worked with ManifoldCF on a project for 10 months.

Shashank Raj