
I have an index named "test" with the document structure shown below. Each document has an array of "tags". How can I query this index to get the 10 most frequently occurring tags?

Also, what best practices should I follow if this index holds more than 2 million documents?

{
    "_index" : "test",
    "_type" : "data",
    "_id" : "1412879673545024927_1373991666",
    "_score" : 1.0,
    "_source" : {
      "instagramuserid" : "1373991666",
      "likes_count" : 163,
      "@timestamp" : "2017-06-08T08:52:41.803Z",
      "post" : {
        "created_time" : "1482648403",
        "comments" : {
          "count" : 9
        },
        "user_has_liked" : true,
        "link" : "https://www.instagram.com/p/BObjpPMBWWf/",
        "caption" : {
          "created_time" : "1482648403",
          "from" : {
            "full_name" : "PARAMSahib ™",
            "profile_picture" : "https://scontent.cdninstagram.com/t51.2885-19/s150x150/12750236_1692144537739696_350427084_a.jpg",
            "id" : "1373991666",
            "username" : "parambanana"
          },
          "id" : "17845953787172829",
          "text" : "This feature talks about how to work pastels .\n\nDull gold pullover + saffron khadi kurta + baby pink pants + Deep purple patka and white sneakers - Perfect colours for a Happy sunday christmas morning . \n#paramsahib #men #menswear #mensfashion #mensfashionblog #mensfashionblogger #menswearofficial #menstyle #fashion #fashionfashion #fashionblog #blog #blogger #designer #fashiondesigner #streetstyle #streetfashion #sikh #sikhfashion #singhstreetstyle #sikhdesigner #bearded #indian #indianfashionblog #indiandesigner #international #ootd #lookbook #delhistyleblog #delhifashionblog"
        },
        "type" : "image",
        "tags" : [
          "men",
          "delhifashionblog",
          "menswearofficial",
          "fashiondesigner",
          "singhstreetstyle",
          "fashionblog",
          "mensfashion",
          "fashion",
          "sikhfashion",
          "delhistyleblog",
          "sikhdesigner",
          "indianfashionblog",
          "lookbook",
          "fashionfashion",
          "designer",
          "streetfashion",
          "international",
          "paramsahib",
          "mensfashionblogger",
          "indian",
          "blog",
          "mensfashionblog",
          "menstyle",
          "ootd",
          "indiandesigner",
          "menswear",
          "blogger",
          "sikh",
          "streetstyle",
          "bearded"
        ],
        "filter" : "Normal",
        "attribution" : null,
        "location" : null,
        "id" : "1412879673545024927_1373991666",
        "likes" : {
          "count" : 163
        }
      }
    }
}
Akhil Mordia

1 Answer


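If your tags field uses the default (dynamic) mapping, you can get the most frequent tags with a terms aggregation on the full field path. The aggregation's own size defaults to 10, so it returns the top 10 buckets without further settings. On Elasticsearch 5.x and later, dynamically mapped strings are text fields with a keyword sub-field, so you may need to aggregate on "post.tags.keyword" rather than "post.tags". The query looks like this: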

{
   "size": 0,
   "aggs": {
      "frequent_tags": {
         "terms": {"field": "post.tags"}
      }
   }
}
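
As a rough illustration of the same request, here is a minimal Python sketch that builds the body with an explicit aggregation size and reads the buckets back out. It assumes the `post.tags.keyword` field confirmed in the comments below; the helper names `top_tags_query` and `extract_top_tags` and the sample counts are hypothetical, and no Elasticsearch client is involved, so sending the body is left to whatever client you use.

```python
import json


def top_tags_query(field="post.tags.keyword", size=10):
    """Build a terms-aggregation request body for the most frequent tags."""
    return {
        "size": 0,  # return no hits, only the aggregation result
        "aggs": {
            "frequent_tags": {
                # "size" here is the number of buckets (tags) to return
                "terms": {"field": field, "size": size}
            }
        },
    }


def extract_top_tags(response, agg_name="frequent_tags"):
    """Return (tag, doc_count) pairs from a terms-aggregation response."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [(bucket["key"], bucket["doc_count"]) for bucket in buckets]


# Hypothetical response with made-up counts, shaped like a terms-aggregation
# result ("aggregations" -> aggregation name -> "buckets").
sample_response = {
    "aggregations": {
        "frequent_tags": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": "fashion", "doc_count": 120},
                {"key": "ootd", "doc_count": 95},
            ],
        }
    }
}

print(json.dumps(top_tags_query(), indent=2))
print(extract_top_tags(sample_response))
```

Each bucket's `doc_count` is the number of documents containing that tag, and buckets come back ordered by that count, highest first.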
Mohammad Mazraeh
  • "Expected [START_OBJECT] under [term], but got a [VALUE_STRING] in [frequent_tags]" Got this error. – Akhil Mordia Jun 12 '17 at 10:08
  • `{ "took": 54, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 99912, "max_score": 0, "hits": [] }, "aggregations": { "frequent_tags": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [] } } }` – Akhil Mordia Jun 12 '17 at 10:15
  • **"terms": {"field": "post.tags"}** gives the answer I guess. – Akhil Mordia Jun 12 '17 at 10:25
  • It's a matter of your mapping. Can you provide your index mapping? Also, there is a related question here: https://stackoverflow.com/questions/33741416/elasticsearch-terms-aggregation-by-strings-in-an-array – Mohammad Mazraeh Jun 12 '17 at 10:26
  • Oh! Yes, I didn't notice the hierarchy. – Mohammad Mazraeh Jun 12 '17 at 10:28
  • **"terms": {"field": "post.tags.keyword"}** This finally solved it. – Akhil Mordia Jun 12 '17 at 11:40
  • For my problem I also had to add `.keyword` at the end. Should this be part of the "correct" answer maybe? – TheBay0r Jul 17 '19 at 10:27
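
Since the comments trace the problem back to the mapping, one possible alternative is to map the tags explicitly as a keyword field when creating a new index, so the aggregation can target `post.tags` directly without the `.keyword` suffix. The sketch below only builds the index-creation body; the helper name `tags_mapping` is hypothetical, and nesting the mapping under the `data` type is an assumption based on the `_type` shown in the question (Elasticsearch 7.x and later create indices without a type name). An existing field's type cannot be changed in place, so this would apply to a new index followed by a reindex.

```python
import json


def tags_mapping(doc_type=None):
    """Build an index-creation body that maps post.tags as a keyword field.

    doc_type: mapping type name for pre-7.x clusters (assumed "data" here);
    pass None for typeless mappings on 7.x and later.
    """
    properties = {
        "properties": {
            "post": {
                "properties": {
                    # keyword fields are not analyzed, so terms aggregations
                    # see each tag as a single exact value
                    "tags": {"type": "keyword"}
                }
            }
        }
    }
    mappings = {doc_type: properties} if doc_type else properties
    return {"mappings": mappings}


print(json.dumps(tags_mapping("data"), indent=2))
```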