
I need to create a document store with search capabilities. It sounds simple: I have documents that I need to store in a database. I have considered CouchDB and a few other document-oriented databases, but I'm still not sure which would be the best solution.

On the other hand, I thought about integrating Solr into a web application that I would use for uploading, indexing, searching, updating, and deleting documents. The main problem, of course, is that most of these documents are written in Cyrillic characters.

Maybe I'm trying to combine things that do not fit together. Could someone advise me on the best way to implement a solution like this?

Best, Joksimovic

Srecko

3 Answers


It looks like Thinking Sphinx could help with your needs. You could store the documents in any database (SQL-oriented or not) and search them with Sphinx. Sphinx supports Cyrillic characters out of the box, and it also offers stemming, faceted search, fuzzy search, etc. Maybe it will help you.

Read more about sphinx here

Sergei Lomakov
  • Thank you! I thought about Sphinx, but my first choice was Solr. The most important thing is that it can be accessed from a Java application and that it can communicate with any document-oriented NoSQL database... – Srecko Jan 26 '12 at 17:02
  • After reading the posts you all suggested, I think Solr is the right thing, because I need to call an API from my web application. The question is what I have to do to make it work with the character sets I need, Cyrillic or Latin (Serbian, Croatian, Slovenian). – Srecko Jan 26 '12 at 17:18
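
One place where Cyrillic can get mangled on the way from a Java application to Solr is the query URL itself. A minimal sketch, assuming the search goes through Solr's `/select` handler with a `q` parameter; the `SolrQueryUrl` class and the base URL are hypothetical names made up for illustration, not part of Solr or SolrJ:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of Solr or SolrJ.
final class SolrQueryUrl {
    private SolrQueryUrl() {}

    // Percent-encodes the query text as UTF-8 so that Cyrillic or accented
    // Latin characters are sent as plain ASCII in the request line.
    static String encodeTerm(String term) {
        return URLEncoder.encode(term, StandardCharsets.UTF_8);
    }

    // Builds a select URL. baseUrl is an assumed address of a Solr core,
    // for example "http://localhost:8983/solr/docs".
    static String build(String baseUrl, String term) {
        return baseUrl + "/select?q=" + encodeTerm(term);
    }
}
```

Calling `SolrQueryUrl.build(baseUrl, term)` with a Cyrillic term gives a URL whose `q` value consists only of `%XX` escapes of the UTF-8 bytes. The servlet container in front of Solr also has to decode request URIs as UTF-8 for this to round-trip; whether that is already the case depends on the setup.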

I am also working on a content management system like this. So far my plan is to use a database to store the metadata and to store the documents themselves on the file system. Don't store the documents in a database like SQL Server, since it has size limitations and licensing costs. For search you can use Solr (it has better support and wider acceptance in the open source community than Sphinx).

Choosing a stand-alone full-text search server: Sphinx or SOLR?

Either way, you need to populate the indexes and then call API methods to search.
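
To make the "populate, then search" flow concrete, here is a toy in-memory index in Java. It is only an illustration of the idea and not how Solr or Sphinx work internally; the `ToyIndex` class and its method names are invented for this sketch. It tokenizes on Unicode letters, so Cyrillic and Latin text are handled the same way.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

// Toy index for illustration only; a real system would delegate this
// work to a search server such as Solr or Sphinx.
final class ToyIndex {
    // term -> sorted set of ids of the documents that contain it
    private final Map<String, TreeSet<String>> postings = new HashMap<>();

    // Lowercases the text and splits on runs of characters that are neither
    // Unicode letters nor decimal digits, so any script is tokenized alike.
    private static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Step 1: populate the index with the text of a document.
    void add(String docId, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Step 2: look up a single term (only the first token of the input is used)
    // and return the matching document ids in sorted order.
    List<String> search(String term) {
        List<String> tokens = tokenize(term);
        if (tokens.isEmpty()) {
            return new ArrayList<>();
        }
        TreeSet<String> hits = postings.get(tokens.get(0));
        return hits == null ? new ArrayList<>() : new ArrayList<>(hits);
    }
}
```

The documents themselves can stay wherever they are stored (file system or database); the index only needs their ids and text.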

Usama Khalil
  • Thank you! I don't want to use SQL, MySQL or a similar RDBMS. I want to try a NoSQL document-oriented database. The main challenge is the Cyrillic character set... – Srecko Jan 26 '12 at 17:05

Brother Serb/Montenegrin :)
I suggest you use MongoDB as your database and use Solr to get index/search capability.

I used Solr in my previous (government tender) project and it's GREAT.
No bugs, easy to use when you get into it and it's blindingly fast.

Marko Bonaci
  • Thanks, friend :) I don't know much about MongoDB. I thought about a NoSQL solution, and I will read about MongoDB. But how did you solve this problem with Cyrillic characters? Could you provide an example of indexing and searching documents with Cyrillic characters? – Srecko Jan 26 '12 at 16:55
  • I forgot to say... I could also use Latin with ć, č, š and other stuff :) that kind of solution could do the job. – Srecko Jan 26 '12 at 17:14
  • I'm not sure about cyrillic in mongo (http://comments.gmane.org/gmane.comp.db.mongodb.user/3119), but I know that Solr supports UTF-8, which supports Cyrillic, right? – Marko Bonaci Jan 27 '12 at 09:39
  • @mbonaci How did you keep Mongo and Solr in sync? How did you ensure that the data eventually becomes consistent between them? What did you do during the period while they are not in sync (for example, you delete a doc from Mongo but it's still in Solr)? – milan Jan 27 '12 at 13:25
  • @mbonaci As I understand it, Solr supports UTF-8, and probably Cyrillic, but I will have to test a few things. I couldn't make it work. But still, what about the database? Where should I store the documents? – Srecko Jan 27 '12 at 14:36
  • @milan I never used MongoDB in production, only Solr, so I cannot answer that off the top of my head, but I wrote a DIH scheduler (see the DIH wiki page) for RDB sync, and I'm sure you could do a similar thing here. Or even better, when you insert your data into Mongo (e.g. through REST), insert it into Solr in parallel (which would solve your other two consistency problems). – Marko Bonaci Jan 27 '12 at 16:16
  • @Joksimovic you tried storing cyrillic in mongo and it did not work? – Marko Bonaci Jan 27 '12 at 16:17
  • No, I didn't. I was doing research before choosing a database (storage). I tried indexing and searching with Solr, and I had a problem with Cyrillic. I think I have a solution for that part; now I want to find out which database is the best to use as document storage. Of course, these documents should be indexed and searched using Solr. Actually, that is my concern... – Srecko Jan 27 '12 at 17:46
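
The "insert it in Solr in parallel" idea from the comments could look roughly like the sketch below. `DocumentStore`, `SearchIndex`, `DualWriter` and the in-memory classes are hypothetical stand-ins invented for this example; they are not MongoDB or Solr APIs, and a real version would wrap the actual client libraries behind the two interfaces.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical abstraction over the document database (e.g. MongoDB).
interface DocumentStore {
    void put(String id, String body);
    void remove(String id);
}

// Hypothetical abstraction over the search server (e.g. Solr).
interface SearchIndex {
    void index(String id, String body);
    void delete(String id);
}

// Sends every change to both back ends within the same call.
final class DualWriter {
    private final DocumentStore store;
    private final SearchIndex index;

    DualWriter(DocumentStore store, SearchIndex index) {
        this.store = store;
        this.index = index;
    }

    // Store first, then index: if the second step throws, the document
    // exists but is not yet searchable.
    void save(String id, String body) {
        store.put(id, body);
        index.index(id, body);
    }

    // Index first, then store: if the second step throws, the document
    // is no longer searchable but still exists in the store.
    void delete(String id) {
        index.delete(id);
        store.remove(id);
    }
}

// In-memory stand-ins so the sketch can run without any servers.
final class InMemoryStore implements DocumentStore {
    private final Map<String, String> docs = new HashMap<>();
    public void put(String id, String body) { docs.put(id, body); }
    public void remove(String id) { docs.remove(id); }
    boolean contains(String id) { return docs.containsKey(id); }
}

final class InMemoryIndex implements SearchIndex {
    private final Set<String> ids = new HashSet<>();
    public void index(String id, String body) { ids.add(id); }
    public void delete(String id) { ids.remove(id); }
    boolean contains(String id) { return ids.contains(id); }
}
```

The ordering is a design choice: a failure halfway through leaves a stored document that is temporarily missing from search, rather than a search hit that points to a document that no longer exists. It does not replace a periodic re-sync if the two systems can drift for other reasons.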