0

Warning: Solr newbie here

I have a PostgreSQL database with multiple schemas. Each schema contains several tables. The schemas generally differ in table count and the tables don't have necessarily much in common either.

Regarding the sizing:

  • In total I'll have dozens up to few hundreds of schemas
  • Each schema may contain ~10 tables
  • Each table contains tens of thousands of rows X several hundreds columns

I want to make a full-text search in the data across the DB schemas. As a result I want to be able to identify the schema, table and column matching the search term.

To what I'm thinking about on my own so far are few SOLR options (I would be glad for recommendations on ES solution as well):

1. Create a multi-core where each core represents table within DB schema

This will nicely reuse the DB schema but I'll end up having several thousands of Cores which I believe is misuse.

2. Flattern the index so it covers all possible values

I was thinking about super-generic Solr schema which basically navigates down to individual cell in the table. Something similar to:

<fields>
  <field name="schema" ... /> 
  <field name="table" ... />
  <field name="column"  ... />
  <field name="row" ... />
  <field name="value"  ... />
</fields>

In this case I feel I may be misusing the whole concept a bit as the value would have to be of some generic type (String?) which smells already :)

Also the ID would need to be composed but this I understand should be possible.

Any insight to the problem or recommendation to which part of the documentation to focus on will be much appreciated.

Jan Zyka
  • 17,460
  • 16
  • 70
  • 118
  • Actually the type "String" for the value would be correct ES < 5.x (now text in > 5.x), what matters is the whole concept of mapping where you indicate how the value should be analyzed, tokenized, indexed and searched (and possibly other stuff), see this link: https://www.elastic.co/guide/en/elasticsearch/reference/current/text.html As for Solr, I'm unfortunately not experienced enough to give pointers at the moment – Adonis Jun 07 '17 at 21:28
  • Full genericity (option 2), won't have performance at all and will make you have an overweighted solr core index (1.5X the size of all your datas) that will result in slow operations. Can you give us 3 example of search in order to try to help you? – jeorfevre Jun 08 '17 at 15:11
  • @jeorfevre: the idea is I'll search for a word (like "Jerusalem") and the response needs to navigate me to the schema -> table -> column where this term exists. Being able to identify even row is a plus but not strictly required. – Jan Zyka Jun 08 '17 at 19:30

1 Answers1

1

With the response :

@jeorfevre: the idea is I'll search for a word (like "Jerusalem") and the response needs to navigate me to the schema -> table -> column where this term exists. Being able to identify even row is a plus but not strictly required.

I will suggest you to go for a word indexing instead of rows / line full text.

<fields>
  <field name="schema" ... /> 
  <field name="table" ... />
  <field name="column"  ... />
  <field name="row" ... />
  <field name="myword"  ... />
</fields>

This will be a far better approch than the full text search upon X rows in solr. The fulltext is the slowest option in solr. This kind of search is the best option.

You type "Jerusalem", and it will return all the results matching it. What do you think of that. Better than fulltextsearch, this will hit directly 1 one word only.

jeorfevre
  • 2,286
  • 1
  • 17
  • 27
  • I should have mentioned that the cells probably need to be full-text searched as one common column is some "description" which may contain the searched keyword somewhere in the middle ... Like the cell contains "I was born in Jerusalem". – Jan Zyka Jun 09 '17 at 21:16
  • I understand, but you will have much better performances with this implementation that splits the words and write an independant tupple in solr. Also with this implementation you will have easy counts (jerusalem x found 50 times) and propose some results close to this search easyly. Go for that implementation. Fulltext search has no power as it is. Better use an implementation that matches your use. – jeorfevre Jun 10 '17 at 11:25
  • I will try that out. Thanks for sughestions! Will accept once tried :) – Jan Zyka Jun 11 '17 at 07:14