2

I have seen many similar question to this topic (including this one, which talks about how ElasticSearch version 6 has overcome many of its limitations as the primary data store), but I am still not clear on the following:

I am creating an online shopping website and I am using MySQL as my DB.

This is a simplified version of my DB (Users can post Product on the website for sale)

enter image description here

I am learning about ElasticSearch and I want to use it to search the products on my website. I don't need User and ProductReview to be searched - only Product table.

I can think of 2 solutions to achieve this:

  1. Periodically copy Product table from MySQL to ES
  2. Keep User and ProductReview in MySQL and Product in ES

As far as I know, if I use option 1, then I can use go-mysql-elasticsearch to sync ES with MySQL: Is this a good solution?

I am more tempted to use option 2, as it is easier and I don't need to worry about data synchronization. What concerns me about this option is:

  • Is ES reliable to be the primary source of data?
  • At some point in time, if I have to modify the Product table structure, would I be able to do so without deleting and recreating the Product Index?
  • In case of MySQL, I normally take a backup of Prod DB and Restore it on Test DB... Is it still possible to do a Backup and Restore from Prod to Test using ES?

I have no experience with ES/NoSQL and would appreciate any advice.

Hooman Bahreini
  • 14,480
  • 11
  • 70
  • 137
  • Response from ElasticSearch community: https://discuss.elastic.co/t/using-elasticsearch-as-the-primary-and-only-storage-for-some-of-my-tables/126553/2 – Hooman Bahreini Apr 03 '18 at 22:05

2 Answers2

10

Let me start by stating that Elasticsearch is NOT a database, in the strict sense of the term, and should ideally not be used as such. However, nothing prevents you from doing it (and many people are doing it) and according to the good folks at Elastic, they won't ever strive to try and make ES a real database. The main goal of ES is to be a fast and reliable search and analytics engine, period.

If you can, you should always keep another primary source of truth from which you can easily (re-)build your ES indices anytime if something goes south.

In your case, option 1 seems to be the way to go since all you want to do is to allow users to search your products, so there's no point in synching the other tables in ES.

Option 2 sounds appealing, but only if you decide to go only with ES, which you really shouldn't if you want to rely on transactions (ES doesn't have transactional support). Another thing you need to know is that if you only have your data in ES and your index gets corrupted for some reason (during an upgrade, a bug in ES, a bug in your code, etc), your data is gone and your business will suffer.

So to answer your questions more precisely:

  1. ES can be reliable as a primary source of truth provided you throw enough efforts and money into the game. However, you probably don't have millions of products and users (yet), so having a HA cluster with minimum three nodes to search a few thousands products with a few fields doesn't seem like a good spend.

  2. When your products table changes, it is easy to reindex the table into ES (or even in real time) and if you have a few thousand products, it can go fast enough that you don't really have to worry about it. If the synch fails for some reason, you can run the process again without wasting too much time. With the zero-downtime alias technique, you can do it without impacting your users.

  3. ES also provides snapshot/restore capabilities so that you can take a snapshot of PROD and install it in your TEST cluster with a single REST call.

Val
  • 207,596
  • 13
  • 358
  • 360
  • Thanks a lot Val. So it make sense to use go-mysql-elasticsearch to sync just Product table with ES? – Hooman Bahreini Apr 03 '18 at 12:42
  • 1
    Yes, in your case, you don't want to take the risk of losing user data and running out of business just because you want to provide a product search feature to your users. – Val Apr 03 '18 at 12:44
  • Also, he won't be able to join Products using a simple Mysql join with the Users which might definitely be required. It would be clunky and slower to implement that using a combination of Mysql + Elasticsearch. – Archit Saxena Apr 04 '18 at 09:08
  • 1
    @ArchitSaxena In the solution I suggested, he still has all the data in MySQL (long-term primary source of truth), so he can still join whatever he wishes to. In ES, he only has the product data, i.e. what he wants to be able to search. – Val Apr 04 '18 at 09:36
  • 1
    Yes, apologies for any misunderstanding. I just wanted to emphasize the point to the OP that he won't be able to get the joins according to his option 2. (Keep User and ProductReview in MySQL and Product in ES) – Archit Saxena Apr 05 '18 at 14:29
0

There are many approaches to solves this problem... this is what I ended up doing:

I took option 1 and built a Synchronizer to copy my products to Elasticsearch periodically. It was actually quite simple... I implemented the method explained here: How To Synchronize a Database With ElasticSearch using Elasticsearch NEST Client

Hooman Bahreini
  • 14,480
  • 11
  • 70
  • 137