Warning: Solr newbie here
I have a PostgreSQL database with multiple schemas. Each schema contains several tables. The schemas generally differ in table count, and the tables don't necessarily have much in common either.
Regarding the sizing:
- In total I'll have dozens up to a few hundred schemas
- Each schema may contain ~10 tables
- Each table contains tens of thousands of rows × several hundred columns
I want to run a full-text search over the data across all DB schemas. As a result, I want to be able to identify the schema, table and column matching the search term.
So far I've come up with a few Solr options on my own (I'd be glad for recommendations on an ES solution as well):
1. Create a multi-core setup where each core represents a table within a DB schema
This would nicely mirror the DB structure, but I'd end up having several thousand cores, which I believe is a misuse.
2. Flatten the index so it covers all possible values
I was thinking about a super-generic Solr schema which basically navigates down to an individual cell in a table. Something similar to:
<fields>
<field name="schema" ... />
<field name="table" ... />
<field name="column" ... />
<field name="row" ... />
<field name="value" ... />
</fields>
In this case I feel I may be misusing the whole concept a bit, as the value would have to be of some generic type (string?), which already smells :)
Also, the ID would need to be a composite, but as I understand it, that should be possible.
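To make option 2 concrete, here is a rough sketch of what such a schema.xml fragment might look like. The field types and attributes are my own guesses, not a tested setup; in particular, `text_general` for the value field and a string-typed composite key are just assumptions:

```xml
<fields>
  <!-- composite key, e.g. built at indexing time as schema.table.column.rowPK (hypothetical convention) -->
  <field name="id"     type="string"       indexed="true" stored="true" required="true"/>
  <!-- locator fields: stored so a hit can be traced back to its origin -->
  <field name="schema" type="string"       indexed="true" stored="true"/>
  <field name="table"  type="string"       indexed="true" stored="true"/>
  <field name="column" type="string"       indexed="true" stored="true"/>
  <field name="row"    type="string"       indexed="true" stored="true"/>
  <!-- the actual cell content, indexed as generic analyzed text -->
  <field name="value"  type="text_general" indexed="true" stored="true"/>
</fields>
<uniqueKey>id</uniqueKey>
```

A search would then query `value` and read `schema`/`table`/`column` back from the matching documents, which is exactly the identification I'm after; the obvious cost is one Solr document per cell, so the index would be very large.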
Any insight into the problem, or a recommendation on which part of the documentation to focus on, would be much appreciated.