
I'm working on a project where we have legacy data in MySQL, and we now want to deploy Elasticsearch (ES) for better full-text search.

We still want to use MySQL as the backend data store because the current system is tightly coupled to it.

Most of the available solutions suggest syncing the data between the two, but that means storing every document twice, once in ES and once in MySQL. Since some of the documents can be rather large, I'm wondering if there's a way to keep only a single copy of the documents?

Thanks!

peidaqi

1 Answer

Impossible. This is analogous to asking the following: if you have legacy data in an Excel spreadsheet, can you use a MySQL database to query that data without also storing it in MySQL?

Elasticsearch is not just an application layer that interprets userland queries and turns them into database queries; it is itself a database system (in fact, it can be used as your primary data store, though that's not recommended due to various drawbacks). Its search functionality fundamentally depends on how its own backing storage is organized, so Elasticsearch cannot query other databases.

You should consider which portions of your data actually need to be stored in Elasticsearch, i.e. which fields need full-text search. You will need to build a component that keeps that view of the data in Elasticsearch in sync with your MySQL database.
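
As a rough sketch of what such a sync component might produce, the Python below turns MySQL rows into bulk actions that carry only the searchable fields, plus delete actions for rows that no longer exist in MySQL. The index name `documents`, the fields `title` and `body`, and both function names are hypothetical and would need to match your own schema. The action dictionaries use the `_op_type` / `_index` / `_id` / `_source` shape accepted by `elasticsearch.helpers.bulk` in the official Python client; fetching the rows from MySQL and sending the actions to a cluster are left out.

```python
from typing import Any, Dict, Iterable, Iterator, List, Set

# Hypothetical names: adjust to your own schema and index.
INDEX_NAME = "documents"
SEARCH_FIELDS = ("title", "body")


def build_index_actions(rows: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield one bulk "index" action per MySQL row, keeping only searchable fields."""
    for row in rows:
        yield {
            "_op_type": "index",
            "_index": INDEX_NAME,
            "_id": row["id"],  # reuse the MySQL primary key as the ES document id
            "_source": {field: row[field] for field in SEARCH_FIELDS},
        }


def build_delete_actions(indexed_ids: Set[int], mysql_ids: Set[int]) -> List[Dict[str, Any]]:
    """Return bulk "delete" actions for ids that are indexed but gone from MySQL."""
    return [
        {"_op_type": "delete", "_index": INDEX_NAME, "_id": doc_id}
        for doc_id in sorted(indexed_ids - mysql_ids)
    ]


if __name__ == "__main__":
    # Stand-in for rows read from MySQL; "attachment" is never sent to ES.
    rows = [
        {"id": 1, "title": "Report", "body": "quarterly numbers", "attachment": "large blob"},
        {"id": 2, "title": "Memo", "body": "office move", "attachment": "large blob"},
    ]
    for action in build_index_actions(rows):
        print(action)
    # Assume ids 1, 2 and 3 are currently indexed, but MySQL only has 1 and 2.
    for action in build_delete_actions({1, 2, 3}, {1, 2}):
        print(action)
```

Because the ES document id is the MySQL primary key, a search only has to return ids, and the full document can then be loaded from MySQL, so the large non-searchable columns are stored once.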

Backgammon
  • A tool like https://github.com/jprante/elasticsearch-jdbc might be able to automate the data-in pipeline to some extent, but you will have to consider how to update or delete existing data in Elasticsearch to match your MySQL data. – Backgammon May 09 '19 at 20:07
  • just to clear up one thing: ES is not a database system (and will never be), it is a search and analytics engine. Also, the tool you're linking to is obsolete and has not been updated in years. This might help, though: https://stackoverflow.com/a/34477639/4604579 + https://stackoverflow.com/a/33325963/4604579 – Val May 10 '19 at 06:18
  • I don't mean to suggest that you actually use it in the role of, say, MySQL, but it meets every reasonable list of criteria for what defines a database. – Backgammon May 10 '19 at 14:11
  • ...except the ACID ones when it comes to multiple bulk updates at the same time ;-) https://www.elastic.co/guide/en/elasticsearch/guide/current/concurrency-solutions.html – Val May 10 '19 at 14:13
  • Hey, non-relational databases without ACID are databases too. – Backgammon May 10 '19 at 14:19