4

Suppose I have the following table:

`title`
- id
- name
- tv_series_id

Example:
id=1, name="Episode 2", tv_series_id=4

I can easily map this in ElasticSearch using the following JSON structure:

{
    "ID": 1,
    "Name": "Episode 2",
    "TVSeriesID": 4
}

If I then had a second table called tv_series that the title table referenced in a foreign key, for example:

`tv_series`
- id
- name

Example:
id=4, name='Friends'

How would I then map that relationship in Elasticsearch? Is there a generic way in which two tables with one or more foreign key relationships can be mapped in Elasticsearch? For example, by doing a certain join statement?

Val
  • 207,596
  • 13
  • 358
  • 360
David542
  • 104,438
  • 178
  • 489
  • 842

1 Answers1

8

In the SQL world, everything is about keeping relationships between tables in such a way that data is never repeated twice (or as seldom as possible), hence the primary-key/foreign-key approach.

In the NoSQL world in general, and in Elasticsearch in particular, there are no explicit relationships between your indices and types. There are ways to create relationships between documents themselves (e.g. parent-child or nested objects), but there is no concept of JOIN.

Instead, the main idea is to denormalize your data in such a way that your documents will be "usable" to carry out your specific use cases. One should not be afraid of keeping redundant copies of data. More generally, you need to ask yourself the following two questions (among many others):

  1. what data/fields do you need to display to your users?
  2. what queries do you need to perform to retrieve the above data?

In the simple case you highlighted, I would definitely go with a document that contains the JOIN of your two tables:

{
    "ID": 1,
    "Name": "Episode 2",
    "TVSeriesID": 4,
    "TVSeriesName": "Friends"
}

It is no big deal that the title Friends will be contained in 236 documents, the idea is that you can retrieve any episode and it contains all the data you need to know about it.

Val
  • 207,596
  • 13
  • 358
  • 360
  • thanks for the response, so in this case I'd need to do a `LEFT OUTER JOIN`, in case an item does not have a `tv_series_id`. Otherwise it'd exclude that item from the results if we did an `INNER JOIN`. Or should I be doing another type of join? – David542 Oct 23 '15 at 17:30
  • Sure, the type of JOIN clearly depends on your data model, if you think you can have episodes not "belonging" to a specific series, then you'll need and OUTER JOIN, otherwise if all episodes are related to one serie, then a LEFT JOIN is ok. The bottom line is that you need to find a way to denormalize your data. – Val Oct 23 '15 at 18:34
  • 1
    But in case when TVSeriesName gets updated. I would have to update the whole Episode document (the parent one). Otherwise new value will not appear here. And let's say there are thousands of such 'parent' documents, that would be an overkill approach. Is there a better way to achieve this? – Mikhail Kholodkov Dec 26 '17 at 19:17