I have an Elasticsearch index whose documents contain only post_id and created_at.
I'd like to group the documents by post_id.
If I use search() as follows, I get the document count for each post_id.

res = elastic.search(index='play_post',
                      body={
                          "size": 0,
                          "query": {
                              "range": {
                                  "created_at": {
                                      "gte": start_date,
                                      "lte": end_date
                                  }
                              }
                          },
                          "aggs": {
                              "group_by_post_id": {
                                  "terms": {
                                      "field": "post_id"
                                  }
                              }
                          }
                      },
                      request_timeout=300)

The result is as follows.

{u'hits': {u'hits': [], u'total': 2606639, u'max_score': 0.0}, u'_shards': {u'successful': 5, u'failed': 0, u'total': 5}, u'took': 318, u'aggregations': {u'group_by_post_id': {u'buckets': [{u'key': 29062, u'doc_count': 136}, {u'key': 2499828, u'doc_count': 122}, {u'key': 2422738, u'doc_count': 66}, {u'key': 174648, u'doc_count': 65}, {u'key': 1928122, u'doc_count': 65}, {u'key': 2012556, u'doc_count': 62}, {u'key': 377819, u'doc_count': 56}, {u'key': 2856270, u'doc_count': 55}, {u'key': 1417120, u'doc_count': 48}, {u'key': 238278, u'doc_count': 47}], u'sum_other_doc_count': 2605917, u'doc_count_error_upper_bound': 32}}, u'timed_out': False}
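
For reference, the per-post_id counts in this response sit under aggregations.group_by_post_id.buckets. Below is a minimal sketch of pulling them into a dict; the helper name buckets_to_counts is only illustrative, and sample_res is a trimmed copy of the response above (first three buckets only).

```python
def buckets_to_counts(res, agg_name="group_by_post_id"):
    """Map each bucket key (post_id) to its doc_count."""
    buckets = res["aggregations"][agg_name]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


# Trimmed copy of the response shown above (first three buckets only).
sample_res = {
    "aggregations": {
        "group_by_post_id": {
            "buckets": [
                {"key": 29062, "doc_count": 136},
                {"key": 2499828, "doc_count": 122},
                {"key": 2422738, "doc_count": 66},
            ]
        }
    }
}

counts = buckets_to_counts(sample_res)
print(counts)
```

Note that a terms aggregation returns only the top 10 buckets unless a larger "size" is set inside "terms", which is why sum_other_doc_count is so large in the full response.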

Because the index holds a large amount of data, I then tried to fetch the same result with elasticsearch.helpers.scan, as below.

res = elasticsearch.helpers.scan(elastic,
                                 index='play_post',
                                 scroll='2m',
                                 query={
                                     "size": 0,
                                     "query": {
                                         "range": {
                                             "created_at": {
                                                 "gte": start_date,
                                                 "lte": end_date
                                             }
                                         },
                                     },
                                     "aggs": {
                                         "group_by_post_id": {
                                             "terms": {
                                                 "field": "post_id"
                                             }
                                         }
                                     }
                                 },
                                 request_timeout=300)

However, the result contains only the individual documents, so I cannot get the count for each post_id:

{u'sort': [0], u'_type': u'play_post', u'_source': {u'post_id': 1281625, u'created_at': u'2018-04-14T19:29:11', u'user_id': 377765}, u'_score': None, u'_index': u'play_post', u'_id': u'd45d181c-0d2f-4bc9-aaa8-46fa5c41b748'}
{u'sort': [0], u'_type': u'play_post', u'_source': {u'post_id': 1632815, u'created_at': u'2018-04-15T13:09:56', u'user_id': 78467}, u'_score': None, u'_index': u'play_post', u'_id': u'cd279f13-42ee-4981-97c7-c18668a9b624'}
{u'sort': [0], u'_type': u'play_post', u'_source': {u'post_id': 1135965, u'created_at': u'2018-04-15T11:58:54', u'user_id': 318212}, u'_score': None, u'_index': u'play_post', u'_id': u'475f7199-4b20-4484-959a-873c38660180'}
...
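
The output looks like this because helpers.scan iterates over the hits of a scrolled search and yields one document at a time; aggregation results are not part of what it yields. One possible workaround, sketched here, is to count the post_id values on the client while iterating. The function name count_post_ids is hypothetical, and sample_hits contains made-up values in the same shape as the output above.

```python
from collections import Counter


def count_post_ids(hits):
    """Count how often each post_id occurs in an iterable of hits."""
    return Counter(hit["_source"]["post_id"] for hit in hits)


# Made-up hits in the same shape as the scan output (not real data).
sample_hits = [
    {"_source": {"post_id": 1281625, "created_at": "2018-04-14T19:29:11"}},
    {"_source": {"post_id": 1632815, "created_at": "2018-04-15T13:09:56"}},
    {"_source": {"post_id": 1281625, "created_at": "2018-04-15T11:58:54"}},
]

post_counts = count_post_ids(sample_hits)
print(post_counts.most_common(10))
```

With a real cluster, the generator returned by elasticsearch.helpers.scan could be passed in place of sample_hits. This transfers every matching document to the client, so it is likely to be slow for millions of documents.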

How can I get the count for every post_id when there is this much data?
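
One direction that may be worth trying, assuming the cluster runs a version of Elasticsearch that supports the composite aggregation (6.1 or later), is to page through all buckets on the server side instead of scrolling through documents. The sketch below is not a confirmed solution for this index. The names iter_post_id_counts, fake_search, and FAKE_COUNTS are hypothetical, and fake_search is only a stand-in that mimics the paging behaviour so the example runs without a cluster.

```python
def iter_post_id_counts(search, start_date, end_date, page_size=1000):
    """Yield (post_id, doc_count) for every bucket, one page at a time.

    `search` is any callable that takes a request body and returns the
    response dict, for example:
        lambda body: elastic.search(index='play_post', body=body)
    """
    after = None
    while True:
        composite = {
            "size": page_size,
            "sources": [{"post_id": {"terms": {"field": "post_id"}}}],
        }
        if after is not None:
            # Resume after the last bucket key of the previous page.
            composite["after"] = after
        body = {
            "size": 0,
            "query": {
                "range": {"created_at": {"gte": start_date, "lte": end_date}}
            },
            "aggs": {"group_by_post_id": {"composite": composite}},
        }
        buckets = search(body)["aggregations"]["group_by_post_id"]["buckets"]
        if not buckets:
            return
        for bucket in buckets:
            yield bucket["key"]["post_id"], bucket["doc_count"]
        after = buckets[-1]["key"]


# Stand-in for a real cluster: made-up counts, paged like a composite agg.
FAKE_COUNTS = {101: 3, 102: 1, 103: 2}


def fake_search(body):
    composite = body["aggs"]["group_by_post_id"]["composite"]
    after = composite.get("after", {}).get("post_id")
    keys = [k for k in sorted(FAKE_COUNTS) if after is None or k > after]
    page = keys[:composite["size"]]
    return {
        "aggregations": {
            "group_by_post_id": {
                "buckets": [
                    {"key": {"post_id": k}, "doc_count": FAKE_COUNTS[k]}
                    for k in page
                ]
            }
        }
    }


all_counts = dict(
    iter_post_id_counts(fake_search, "2018-04-14", "2018-04-15", page_size=2)
)
print(all_counts)
```

Each request returns at most page_size buckets, and the key of the last bucket is sent back as "after" to fetch the next page, so no single response has to hold every post_id.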

xKxAxKx
It has been quite a while, but I wondered if you ever solved it? I am here for a similar problem... – Memin Mar 20 '22 at 21:48

0 Answers