I am working on a project to index the questions and answers of a website in Elasticsearch (version 6) for search.

I first thought of creating two indexes, as shown below: one for questions and one for answers.

questions mapping:

{
  "mappings": {
    "question": {
      "properties": {
        "title": {
          "type": "text"
        },
        "question": {
          "type": "text"
        },
        "questionId": {
          "type": "keyword"
        }
      }
    }
  }
}

answers mapping:

{
  "mappings": {
    "answer": {
      "properties": {
        "answer": {
          "type": "text"
        },
        "answerId": {
          "type": "keyword"
        },
        "questionId": {
          "type": "keyword"
        }
      }
    }
  }
}

I used a multi_match query together with terms and top_hits aggregations to search the indexed Q&As (referred question). I used this approach to remove duplicates from the search results, because several answers, or the question itself, can match for the same question, and I want only one entry per question in the results. The problem I am facing is paginating the results: as far as I can tell, Elasticsearch can paginate hits but not aggregations.
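
A minimal sketch of the kind of request described above, written as a Python function that only builds the search body. The field names come from the two mappings; the function name, the aggregation names `by_question` and `best_match`, and the default sizes are hypothetical choices for illustration, not the exact query that was used.

```python
def build_dedup_query(text, max_questions=10):
    """Build a search body that groups matching documents by questionId."""
    return {
        "size": 0,  # no plain hits; results are read from the aggregation
        "query": {
            "multi_match": {
                "query": text,
                "fields": ["title", "question", "answer"],
            }
        },
        "aggs": {
            "by_question": {
                # one bucket per question, however many documents matched
                "terms": {"field": "questionId", "size": max_questions},
                "aggs": {
                    # keep only the best-scoring document in each bucket
                    "best_match": {"top_hits": {"size": 1}}
                },
            }
        },
    }


body = build_dedup_query("parent child mapping")
```

The terms aggregation takes a `size` but no offset, which is where the pagination problem comes from.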

Then I thought of saving both the question and its answers in one document, with the answers in a JSON array. The problem with this approach is that there is no clean way to add, remove, or update a specific answer in a given question document. The only way I found was a Groovy script (referred question), which is deprecated in Elasticsearch v6 AFAIK.

Is there a better, cleaner way to design this? Thanks.

Bhanuka Yd
  • Can you elaborate on why the duplicate search results occur? It would be helpful if you could share your ES query. – Sunder R Jun 25 '18 at 16:42
  • The duplicate results occur when multiple answers, or the question itself, match a query and all of them point to the same question. – Bhanuka Yd Jun 26 '18 at 10:02
  • If you use the parent-child relation, it will also solve your duplication problem, since you will be querying or aggregating using has_child or has_parent. – Sunder R Jun 26 '18 at 12:50

1 Answer

Parent-Child Relationship

Use the parent-child relationship. It is similar to the nested model and lets you associate one entity with another in a one-to-many relationship. More information here: https://www.elastic.co/guide/en/elasticsearch/guide/current/parent-child.html

Child documents can be added, changed, or deleted without affecting the parent or the other children. Because a has_child query returns the parent documents as ordinary hits, you can paginate them with from/size or the Scroll API. Child documents can be retrieved with a has_parent query.

The trade-off: you do not have to take care of duplicates and pagination problems, but parent-child queries can be 5 to 10 times slower than the equivalent nested query.
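
A minimal sketch of such a search, written as a Python function that only builds the request body. It assumes the child relation (or type) is called `answer` and uses the field names from the question; the function name and the paging parameters are hypothetical.

```python
def build_question_search(text, page=0, page_size=10):
    """Build a search body that returns one hit per question, paginated."""
    return {
        # hits are question documents, so from/size paginates them directly
        "from": page * page_size,
        "size": page_size,
        "query": {
            "bool": {
                "should": [
                    # the question itself matches
                    {"multi_match": {"query": text,
                                     "fields": ["title", "question"]}},
                    # or at least one of its answers matches
                    {
                        "has_child": {
                            "type": "answer",
                            "score_mode": "max",
                            "query": {"match": {"answer": text}},
                            # also return the best matching answer
                            "inner_hits": {"size": 1},
                        }
                    },
                ],
                "minimum_should_match": 1,
            }
        },
    }


body = build_question_search("pagination", page=2)
```

Each question appears at most once, whether the match came from the question or from any number of its answers, so no aggregation is needed for de-duplication.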

In Elasticsearch 6, an index can hold only one mapping type, and the _parent field has been replaced by the join field, so your mapping can look like the following (the type name _doc and the field name qa_join are arbitrary choices):

PUT /my-index
{
  "mappings": {
    "_doc": {
      "properties": {
        "title": {
          "type": "text"
        },
        "question": {
          "type": "text"
        },
        "questionId": {
          "type": "keyword"
        },
        "answer": {
          "type": "text"
        },
        "answerId": {
          "type": "keyword"
        },
        "qa_join": {
          "type": "join",
          "relations": {
            "question": "answer"
          }
        }
      }
    }
  }
}
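
A minimal sketch of how documents could be shaped for indexing, assuming a 6.x index in which the relation is declared with a `join` field (hypothetically named `qa_join`, with `question` as parent and `answer` as child) and assuming the question's document id equals its `questionId`. The helper names are made up, and the functions only build plain dictionaries; they do not call any client.

```python
def question_doc(question_id, title, text):
    """Build the id and body for a parent (question) document."""
    return {
        "id": question_id,
        "body": {
            "questionId": question_id,
            "title": title,
            "question": text,
            "qa_join": {"name": "question"},  # marks the document as a parent
        },
    }


def answer_doc(question_id, answer_id, text):
    """Build the id, routing and body for a child (answer) document."""
    return {
        "id": answer_id,
        # a child must be routed to the same shard as its parent
        "routing": question_id,
        "body": {
            "questionId": question_id,
            "answerId": answer_id,
            "answer": text,
            "qa_join": {"name": "answer", "parent": question_id},
        },
    }


q = question_doc("q1", "Index design", "How should I model Q&A?")
a = answer_doc("q1", "a1", "Use a parent-child relationship.")
```

Adding, updating, or deleting an answer then touches only that answer's own document, identified by `id` and `routing`.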
MrPerson
Sunder R