So my API doesn't even query the DB and return an array of documents
directly from ES?
Yes, As you are doing query to elasticsearch, you will get result only from Elasticsearch. Another way is, just get id from Elasticsearch and use id to retrive documeents from MySQL, but this might impact response time.
Is this considered good practice? replicating the whole table to
elastic? What about table relations... What If I want to filter by
nested table relation?...
It is not about good practice or bad practice, it is all about what type of functionality and use case you want to implement and based on that technology stack can be used and data can be duplicated. There is lots of company using Elasticsearch as secondary
data source where they have duplicated data just because there usecase is best fit with Elasticsearh or other NoSQL db.
Elasticsearch is NoSQL DB and it is not mantain any relationship between data. Hence, you need to denormalize your data before indexing to the Elasticsearch. You can read this article for more about denormalizetion and why it is required.
ElasticSearch provide Nested and Join data type for parent child relationship but both have some limitation and performance impact.
Below is what they have mentioned for join
field type:
The join field shouldn’t be used like joins in a relation database. In
Elasticsearch the key to good performance is to de-normalize your data
into documents. Each join field, has_child
or has_parent
query adds a
significant tax to your query performance. It can also trigger global
ordinals to be built.
Below is what they have mentioned for nested
field type:
When ingesting key-value pairs with a large, arbitrary set of keys,
you might consider modeling each key-value pair as its own nested
document with key
and value
fields. Instead, consider using the
flattened
data type, which maps an entire object as a single field and
allows for simple searches over its contents. Nested documents and
queries are typically expensive, so using the flattened
data type for
this use case is a better option.
most articles I read suggest replicating the table records (using
logstash pipe or other methods) to elastic index.
Yes, You can use logstash
or any language client like java
, python
etc, to sync data from DB to Elasticsearch. You can check this SO answer for more information on this.
Your Search Requirements
If you go ahead with Elasticsearch then you can use N-Gram Tokenizer or Regex Query and achieve your search requirements.