I'm trying to do nested sorting in Elasticsearch, but so far I haven't succeeded.

My data structure:

{ "_id" : 1,
"authorList" : [
  {"lastName":"hawking", "firstName":"stephan"},
  {"lastName":"frey", "firstName":"richard"}
]
}

{ "_id" : 2,
"authorList" : [
  {"lastName":"roger", "firstName":"christina"},
  {"lastName":"freud", "firstName":"damian"}
]
}

I want to sort the documents by the last name of the first author in each document.

Used mapping:

"authorList" : { "type" : "nested", "properties" : {"lastName":{"type":"keyword"}}}

Sort using SearchRequestBuilder (JAVA):

    searchRequestBuilder.addSort(
        SortBuilders.fieldSort("authorList.lastName")
            .order(SortOrder.ASC)
            .sortMode(SortMode.MIN)
            .setNestedPath("authorList")
    );

This runs, but it doesn't give the result I want (first "hawking", then "roger").

Did I miss something? Is there a way to tell Elasticsearch to access index 0 of the authorList array? Is there any mapping or normalizer that indexes the first entry of the array separately?

Thomas

1 Answer

Nested documents are not saved as a simple array or list. They are managed internally by Elasticsearch:

Elasticsearch is still fundamentally flat, but it manages the nested relation internally to give the appearance of nested hierarchy. When you create a nested document, Elasticsearch actually indexes two separate documents (root object and nested object), then relates the two internally. (more here)

I think you need to give Elasticsearch some additional information that indicates which author is the "primary/first" one. It is enough to add this extra field to just one author in each nested list (your mapping can stay as it is), something like this:

{
    "authorList" : [
      {"lastName":"roger", "firstName":"christina", "authorOrder": 1},
      {"lastName":"freud", "firstName":"damian"}
    ]
},
{
    "authorList" : [
      {"lastName":"hawking", "firstName":"stephan", "authorOrder": 1},
      {"lastName":"adams", "firstName":"mark"},
      {"lastName":"frey", "firstName":"richard"}
    ]
},
{
    "authorList" : [
      {"lastName":"adams", "firstName":"monica", "authorOrder": 1},
      {"lastName":"adams", "firstName":"richard"}
    ]
}
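
Setting that flag when the documents are built could look roughly like the sketch below. It is only an illustration in plain Java with no Elasticsearch dependency: the class `FirstAuthorTagger` and its helpers `tagFirstAuthor` and `author` are hypothetical names, and the sketch assumes the first entry of the list is the author you want to sort by.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FirstAuthorTagger {

    // Hypothetical helper: builds one author entry as a simple map.
    static Map<String, Object> author(String lastName, String firstName) {
        Map<String, Object> a = new LinkedHashMap<>();
        a.put("lastName", lastName);
        a.put("firstName", firstName);
        return a;
    }

    // Returns a copy of the author list in which only the first entry
    // carries "authorOrder": 1. The input list is left untouched.
    static List<Map<String, Object>> tagFirstAuthor(List<Map<String, Object>> authors) {
        List<Map<String, Object>> tagged = new ArrayList<>();
        for (int i = 0; i < authors.size(); i++) {
            Map<String, Object> copy = new LinkedHashMap<>(authors.get(i));
            if (i == 0) {
                copy.put("authorOrder", 1);
            }
            tagged.add(copy);
        }
        return tagged;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> authors = new ArrayList<>();
        authors.add(author("hawking", "stephan"));
        authors.add(author("frey", "richard"));
        // The tagged list would then be used as the "authorList" value of the document source.
        System.out.println(tagFirstAuthor(authors));
    }
}
```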

Then the query could be:

{
  "query" : {
    "nested" : {
      "query" : {
        "bool" : {
          "must" : [
            {
              "match" : {
                "authorList.authorOrder" : 1
              }
            }
          ]
        }
      },
      "path" : "authorList"
    }
  },
  "sort" : [
    {
      "authorList.lastName" : {
        "order" : "asc",
        "nested_filter" : {
          "bool" : {
            "must" : [
              {
                "match" : {
                  "authorList.authorOrder" : 1
                }
              }
            ]
          }
        },
        "nested_path" : "authorList"
      }
    }
  ]
}

And with Java API:

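import org.apache.lucene.search.join.ScoreMode;
import org.elasticsearch.action.search.SearchRequestBuilder;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.sort.FieldSortBuilder;
import org.elasticsearch.search.sort.SortBuilders;
import org.elasticsearch.search.sort.SortOrder;

// Matches only the nested author flagged as the first one
QueryBuilder matchFirst = QueryBuilders.boolQuery()
        .must(QueryBuilders.matchQuery("authorList.authorOrder", 1));
// Keeps parent documents that have such a nested author
QueryBuilder mainQuery = QueryBuilders.nestedQuery("authorList", matchFirst, ScoreMode.None);

// Sorts by lastName, considering only the nested authors that pass the filter
FieldSortBuilder sb = SortBuilders.fieldSort("authorList.lastName")
    .order(SortOrder.ASC)
    .setNestedPath("authorList")
    .setNestedFilter(matchFirst);

SearchRequestBuilder builder = client.prepareSearch("test")
        .setSize(50)
        .setQuery(mainQuery)
        .addSort(sb);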

Note that the sort builder uses .setNestedFilter(matchFirst), which means sorting is based on the authorList.lastName field, but only on your "primary/first" nested elements. Without it, Elasticsearch would consider all nested documents of each parent, pick the smallest last name (for ascending order), and sort the parent documents by that value. The "hawking" document could then come first, because it also contains an author with the last name "adams".
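
To make the effect of the filter concrete, here is a small plain-Java simulation of how the sort key for one parent document is chosen. It does not call Elasticsearch; `NestedSortKeyDemo`, `author` and `sortKey` are hypothetical names, and the sketch assumes an ascending sort with min mode, where the smallest matching value becomes the key.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class NestedSortKeyDemo {

    // Hypothetical helper: one nested author, optionally flagged with authorOrder.
    static Map<String, Object> author(String lastName, Integer authorOrder) {
        Map<String, Object> a = new LinkedHashMap<>();
        a.put("lastName", lastName);
        if (authorOrder != null) {
            a.put("authorOrder", authorOrder);
        }
        return a;
    }

    // Simulated sort key of one parent document: the smallest lastName among
    // the nested authors, optionally restricted to those with authorOrder == 1.
    static Optional<String> sortKey(List<Map<String, Object>> authors, boolean firstOnly) {
        return authors.stream()
                .filter(a -> !firstOnly || Integer.valueOf(1).equals(a.get("authorOrder")))
                .map(a -> (String) a.get("lastName"))
                .min(Comparator.naturalOrder());
    }

    public static void main(String[] args) {
        List<Map<String, Object>> doc = new ArrayList<>();
        doc.add(author("hawking", 1));
        doc.add(author("adams", null));
        doc.add(author("frey", null));

        System.out.println(sortKey(doc, false).get()); // adams: smallest of all nested authors
        System.out.println(sortKey(doc, true).get());  // hawking: only the flagged author counts
    }
}
```

Without the filter the document is ranked as if its key were "adams", which is why it can end up ahead of or tied with documents whose first author really is an "adams".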

Final result is:

{
    "authorList" : [
      {"lastName":"adams", "firstName":"monica", "authorOrder": 1},
      {"lastName":"adams", "firstName":"richard"}
    ]
},
{
    "authorList" : [
      {"lastName":"hawking", "firstName":"stephan", "authorOrder": 1},
      {"lastName":"adams", "firstName":"mark"},
      {"lastName":"frey", "firstName":"richard"}
    ]
},
{
    "authorList" : [
      {"lastName":"roger", "firstName":"christina", "authorOrder": 1},
      {"lastName":"freud", "firstName":"damian"}
    ]
}
Maarten
Joanna Mamczynska
  • Ok, that would solve the problem. But if I have to introduce a new field, wouldn't it be easier just to create a field "firstAuthorLastName" instead, copying the value of the first array index? This would also simplify the query/sorting part. – Thomas Aug 08 '17 at 20:00
  • Yes, if you can rearrange your model that way, then it would be definitely easier to query your data. If a document could have e.g. `id`, `firstAuthorLastName` and nested list of `otherAuthors`, then sorting on top level field `firstAuthorLastName` (instead of nested) would be also faster. – Joanna Mamczynska Aug 08 '17 at 21:12
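
The alternative discussed in these comments, copying the first author's last name into a top-level field, might be sketched as follows. This is plain Java without the Elasticsearch client; `FirstAuthorField`, `author` and `buildDocument` are hypothetical names, and the sketch assumes `firstAuthorLastName` would be mapped as a `keyword` field, like `lastName` in the question.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FirstAuthorField {

    // Hypothetical helper: builds one author entry.
    static Map<String, String> author(String lastName, String firstName) {
        Map<String, String> a = new LinkedHashMap<>();
        a.put("lastName", lastName);
        a.put("firstName", firstName);
        return a;
    }

    // Builds the document source. firstAuthorLastName is copied from index 0
    // of the list and is omitted when the list is empty.
    static Map<String, Object> buildDocument(int id, List<Map<String, String>> authorList) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", id);
        if (!authorList.isEmpty()) {
            doc.put("firstAuthorLastName", authorList.get(0).get("lastName"));
        }
        doc.put("authorList", authorList);
        return doc;
    }

    public static void main(String[] args) {
        List<Map<String, String>> authors = new ArrayList<>();
        authors.add(author("hawking", "stephan"));
        authors.add(author("frey", "richard"));
        System.out.println(buildDocument(1, authors));
    }
}
```

With such a field in place, the sort would no longer need a nested path or filter: something like `SortBuilders.fieldSort("firstAuthorLastName").order(SortOrder.ASC)` should be enough.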