I'm quite new to ElasticSearch, and I think I have some misconceptions about how it's supposed to work. I can't find much help through Google, and I'm not sure whether that's down to me or because ElasticSearch is still quite new.

We're an ecommerce company. We have a solid platform on which clients can manage and sell products. They can have more than one subplatform, and they can enable/disable products per subplatform.

So, each ElasticSearch filter (aggregation, facet, whatever the name is - I could really go for an ES dictionary) must filter on this subplatform ID by default. For Solr, I could look up what each document was supposed to look like, but no joy so far with ES.

I assume it would be something along the lines of

<doc>
  <field name="subplatforms">[1, 120, 360]</field>
  <field name="name">Product 1</field>
  <field name="categories">['Apparel', 'Shoes', 'Nike']</field>
</doc>

This is what an XML file in Solr is supposed to look like, but as ES doesn't have such things, I've just written it out like this.

To show filters for each selected category, the search would be something along the lines of:

curl -XPOST "http://localhost:9200/products/_search" -d'
{
  "size": 0,
  "aggregations": {
    "filter": {
      "term": { "category": "Shoes" }
    }
  }
}'

Right? We don't want buckets for the categories themselves, as that is handled outside of ElasticSearch. But we do want buckets for every attribute that occurs within the selected category. For each product with the category 'Shoes', it should find all possible aggregations (how to define them?) like shoe size, shoe lace colour, shoe lace type (flat/round), etc.

I'm quite stuck, and none of the resources I've found helped me so far. Newbie documentation is really lacking.

Mave
  • **Facets** in elasticsearch have been replaced by **Aggregations** - so you should use these (as per your example). Have you got any samples of what you want to get back - is it simply to show the number of products / documents by type of category? e.g. for your sample doc, you'd want three buckets (apparel, shoes, nike) each showing the same product? – Olly Cruickshank Apr 16 '15 at 13:09
  • Well, yes. For the filtering it would be sufficient to show only the relevant buckets with the counts. As for actually showing the relevant products, that would be *another* search, with the selected aggregations in place, yes? – Mave Apr 16 '15 at 13:21
  • And to make it more complex, each product may also have x amount of attributes: be that shoe size (with values of 38, 39, 40), or shoe lace colour (one value, blue). But, product #2 may have attributes for shoe sizes 39, 40, 41, and shoe lace colour red. You'd want to show, in the filter list: Shoe size, with values 38 (1), 39 (2), 40 (2), 41 (1). And, Shoe lace colour with values Red (1), Blue (1). – Mave Apr 16 '15 at 13:23
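
As a rough illustration of the counts described in the comment above, here is a minimal Python sketch that tallies them in memory. It doesn't talk to ElasticSearch at all; the `products` list, the field names `shoe_size` and `lace_colour`, and the `facet_counts` helper are all made up for this example. A `terms` aggregation returns the same kind of value/count pairs as buckets.

```python
from collections import Counter

# Hypothetical in-memory products mirroring the two examples in the comment.
products = [
    {"name": "Product 1", "shoe_size": [38, 39, 40], "lace_colour": ["Blue"]},
    {"name": "Product 2", "shoe_size": [39, 40, 41], "lace_colour": ["Red"]},
]


def facet_counts(docs, field):
    """Count how many documents carry each distinct value of `field`."""
    counts = Counter()
    for doc in docs:
        # set() so a value repeated inside one document is counted once
        counts.update(set(doc.get(field, [])))
    return dict(counts)


shoe_sizes = facet_counts(products, "shoe_size")
lace_colours = facet_counts(products, "lace_colour")
```

Each value is counted once per product that has it, which is why sizes 39 and 40 end up with a count of 2.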

1 Answer

If you have the following documents:

curl -XPOST 'http://localhost:9200/test/product' -d'
{
  "name": "Product 1",
  "categories": ["Apparel", "Shoes", "Nike"],
  "shoesize": [38, 39, 40],
  "lacecolor": "blue"
}'

curl -XPOST 'http://localhost:9200/test/product' -d'
{
  "name": "Product 2",
  "categories": ["Shoes"],
  "shoesize": [38, 39, 40, 41],
  "lacecolor": "red"
}'

Then, to get an aggregation grouped first by category buckets and then by shoesize and lace color buckets:

curl -XGET 'http://localhost:9200/test/product/_search?pretty' -d '{
  "query": { "match_all": { } },
  "aggs": {
    "category_agg": {
      "terms": { "field": "categories" },
      "aggs": {
        "shoesize_agg": { "terms": { "field": "shoesize" } },
        "lacecolor_agg": { "terms": { "field": "lacecolor" } }
      }
    }
  }
}'

If you want to filter the aggregation, e.g. because the user has searched for a particular term or selected a category, I would put those criteria in the query statement (i.e. not in the agg filter):

curl -XGET 'http://localhost:9200/test/product/_search?pretty' -d '{
  "query": {
    "filtered": {
      "query": { "query_string": { "query": "blue" } },
      "filter": { "terms": { "categories": ["shoes"] } }
    }
  },
  "aggs": {
    "category_agg": {
      "terms": { "field": "categories" },
      "aggs": {
        "shoesize_agg": { "terms": { "field": "shoesize" } },
        "lacecolor_agg": { "terms": { "field": "lacecolor" } }
      }
    }
  }
}'
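
If the request is assembled in application code rather than typed into curl, the same body can be built from a list of facet fields. The following is only a sketch: `build_facet_query` and its parameters are hypothetical names, it assumes the field names from the sample documents, and it keeps the `filtered` query used above (later Elasticsearch releases removed `filtered` in favour of a `bool` query with a `filter` clause). Because the category is already fixed by the filter, the sketch puts the facet aggregations at the top level instead of nesting them under `category_agg`.

```python
import json


def build_facet_query(text, category, facet_fields):
    """Build a search body that filters by category and returns one terms
    aggregation per facet field, without returning the documents."""
    return {
        "size": 0,  # only the buckets are wanted, not the hits
        "query": {
            "filtered": {
                "query": {"query_string": {"query": text}},
                "filter": {"terms": {"categories": [category]}},
            }
        },
        "aggs": {
            f"{field}_agg": {"terms": {"field": field}}
            for field in facet_fields
        },
    }


body = build_facet_query("blue", "shoes", ["shoesize", "lacecolor"])
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the request body of the `_search` call shown above.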
Olly Cruickshank
  • OK, I get the way they're built up now, but these would return the actual products, right? We're also looking for a way to build the filters themselves, without knowing what aggregations are possible. Users may enter their own attributes, so shoe size could be 'size of shoe', and other attributes could be 'suitable for' or 'you name it'. We simply don't know what attributes users have made, and it should be irrelevant. Is ElasticSearch capable of finding out what possible fields are there to be filtered on, outside the default defined? (say name, id, etc). And then, in turn, return products? – Mave Apr 16 '15 at 14:14
  • To avoid returning the documents, add `"size" : 0,` to your query. I'm not totally clear what you're looking for - you do need to predefine the aggregation types (is this what you mean by a "filter"?) - e.g. your query needs to list the buckets that you want to aggregate the data on. When a user selects a particular value (e.g. shoe size 10) then you can easily apply this as a filter on the documents returned. – Olly Cruickshank Apr 16 '15 at 14:24
  • So we must know what predefined buckets there are, and send those? There's no way to say to ES 'Hey, look, this product has x amount of fields, ignore the fields `product_id`, `product_name`, `categories`, and make aggregations of the rest of the fields, then return those as buckets'? – Mave Apr 16 '15 at 14:33
  • I don't think there's any way to auto-generate the buckets for aggregation - you could extract the mapping (this shows the fields and their types) for each type and use that information to generate an aggregation query. – Olly Cruickshank Apr 16 '15 at 14:44
  • OK, it makes way more sense for us to send the supposed filters. All we need to do is assign possible filters to each category, no biggie. Client has disabled 'price' filtering for category X? OK, no problem, don't query that with ES. I'm in the process of writing down what exactly needs to be in ES, and how clients can group filters/attributes (lime green and green could be, don't have to be, placed under 'green'). You've helped me a lot by addressing the flaws in my thinking. Thank you! – Mave Apr 16 '15 at 14:57
  • If anyone needs to know how you can do this but for nested data types, then [this](https://stackoverflow.com/a/31866038/2963111) is a good resource. – harvzor Jul 18 '17 at 15:56
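
The mapping suggestion from the comments above could look roughly like this in Python. It is a sketch under assumptions: `aggs_from_properties` is a hypothetical helper, the `properties` dict is hand-written sample data standing in for the `properties` section of a real mapping response (where that section sits in the response depends on the Elasticsearch version), and the ignored field names are the ones mentioned in the comments.

```python
def aggs_from_properties(properties,
                         ignore=("product_id", "product_name", "categories")):
    """Return one terms aggregation per mapped field that is not ignored."""
    return {
        f"{field}_agg": {"terms": {"field": field}}
        for field in sorted(properties)
        if field not in ignore
    }


# Hypothetical `properties` section, as it might appear in a mapping.
properties = {
    "product_id": {"type": "long"},
    "product_name": {"type": "string"},
    "categories": {"type": "string"},
    "shoesize": {"type": "long"},
    "lacecolor": {"type": "string"},
}

# Search body that asks only for buckets on the remaining fields.
body = {"size": 0, "aggs": aggs_from_properties(properties)}
```

This still sends an explicit list of aggregations to ES; it only automates building that list from whatever fields exist.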