I'm new to elasticsearch domain. I'm learning and trying it out to check if it meets my needs.
Right now I'm learning aggregations in elasticsearch and I wrote the following python script to ingest some time-series data into elasticsearch.
Every 5 seconds I create a new message which will have:
- Timestamp (ISO8601 format)
- Counter
- A random number between 0 and 100
For every new day, I create a new index with logs_Y-m-D
as the index name.
I will index every message using the message Counter
as the _id
. The counter resets for every new index (every day).
import csv
import time
import random
from datetime import datetime
from elasticsearch import Elasticsearch
class ElasticSearchDB:
def __init__(self):
self.es = Elasticsearch()
def run(self):
print("Started: {}".format(datetime.now().isoformat()))
print("<Ctrl + c> for exit!")
with open("..\\out\\logs.csv", "w", newline='') as f:
writer = csv.writer(f)
counter = 0
try:
while True:
i_name = "logs_" + time.strftime("%Y-%m-%d")
if not self.es.indices.exists([i_name]):
self.es.indices.create(i_name, ignore=400)
print("New index created: {}".format(i_name))
counter = 0
message = {"counter": counter, "@timestamp": datetime.now().isoformat(), "value": random.randint(0, 100)}
# Write to file
writer.writerow(message.values())
# Write to elasticsearch index
self.es.index(index=i_name, doc_type="logs", id=counter, body=message)
# Waste some time
time.sleep(5)
counter += 1
except KeyboardInterrupt:
print("Stopped: {}".format(datetime.now().isoformat()))
test_es = ElasticSearchDB()
test_es.run()
I ran this script for 30 minutes. Next, using Sense, I query elasticsearch with following aggregation queries.
Query #1: Get all
Query #2: Aggregate logs from last 1 hour and generate stats for them. This shows right results.
Query #3: Aggregate logs from last 1 minute and generate stats for them. The number of docs aggregated is same as in from 1-hour aggregations, ideally, it should have aggregated only 12-13 logs.
Query #4: Aggregate logs from last 15 seconds and generate stats for them. The number of docs aggregated is same as in from 1-hour aggregations, ideally, it should have aggregated only 3-4 logs.
My Questions:
- Why is elasticsearch not able to understand 1 minute and 15 seconds range?
- I understand mappings but I don't know how to write one, so I've not written one, is that what is causing this problem?
Please help!
Query #1: Get all
GET /_search
Output:
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 314,
"max_score": 1,
"hits": [
{
"_index": "logs_2016-11-03",
"_type": "logs",
"_id": "19",
"_score": 1,
"_source": {
"counter": 19,
"value": 62,
"@timestamp": "2016-11-03T07:40:35.981395"
}
},
{
"_index": "logs_2016-11-03",
"_type": "logs",
"_id": "22",
"_score": 1,
"_source": {
"counter": 22,
"value": 95,
"@timestamp": "2016-11-03T07:40:51.066395"
}
},
{
"_index": "logs_2016-11-03",
"_type": "logs",
"_id": "25",
"_score": 1,
"_source": {
"counter": 25,
"value": 18,
"@timestamp": "2016-11-03T07:41:06.140395"
}
},
{
"_index": "logs_2016-11-03",
"_type": "logs",
"_id": "26",
"_score": 1,
"_source": {
"counter": 26,
"value": 58,
"@timestamp": "2016-11-03T07:41:11.164395"
}
},
{
"_index": "logs_2016-11-03",
"_type": "logs",
"_id": "29",
"_score": 1,
"_source": {
"counter": 29,
"value": 73,
"@timestamp": "2016-11-03T07:41:26.214395"
}
},
{
"_index": "logs_2016-11-03",
"_type": "logs",
"_id": "41",
"_score": 1,
"_source": {
"counter": 41,
"value": 59,
"@timestamp": "2016-11-03T07:42:26.517395"
}
},
{
"_index": "logs_2016-11-03",
"_type": "logs",
"_id": "14",
"_score": 1,
"_source": {
"counter": 14,
"value": 9,
"@timestamp": "2016-11-03T07:40:10.857395"
}
},
{
"_index": "logs_2016-11-03",
"_type": "logs",
"_id": "40",
"_score": 1,
"_source": {
"counter": 40,
"value": 9,
"@timestamp": "2016-11-03T07:42:21.498395"
}
},
{
"_index": "logs_2016-11-03",
"_type": "logs",
"_id": "24",
"_score": 1,
"_source": {
"counter": 24,
"value": 41,
"@timestamp": "2016-11-03T07:41:01.115395"
}
},
{
"_index": "logs_2016-11-03",
"_type": "logs",
"_id": "0",
"_score": 1,
"_source": {
"counter": 0,
"value": 79,
"@timestamp": "2016-11-03T07:39:00.302395"
}
}
]
}
}
Query #2: Get stats from last 1 hour.
GET /logs_2016-11-03/logs/_search?search_type=count
{
"aggs": {
"time_range": {
"filter": {
"range": {
"@timestamp": {
"from": "now-1h"
}
}
},
"aggs": {
"just_stats": {
"stats": {
"field": "value"
}
}
}
}
}
}
Output:
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 366,
"max_score": 0,
"hits": []
},
"aggregations": {
"time_range": {
"doc_count": 366,
"just_stats": {
"count": 366,
"min": 0,
"max": 100,
"avg": 53.17213114754098,
"sum": 19461
}
}
}
}
I get 366 entries, which is correct.
Query #3: Get stats from last 1 minute.
GET /logs_2016-11-03/logs/_search?search_type=count
{
"aggs": {
"time_range": {
"filter": {
"range": {
"@timestamp": {
"from": "now-1m"
}
}
},
"aggs": {
"just_stats": {
"stats": {
"field": "value"
}
}
}
}
}
}
Output:
{
"took": 15,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 407,
"max_score": 0,
"hits": []
},
"aggregations": {
"time_range": {
"doc_count": 407,
"just_stats": {
"count": 407,
"min": 0,
"max": 100,
"avg": 53.152334152334156,
"sum": 21633
}
}
}
}
This is wrong, it can't be 407 entries in last 1 minute, it should have been 12-13 logs only.
Query #4: Get stats from last 15 seconds.
GET /logs_2016-11-03/logs/_search?search_type=count
{
"aggs": {
"time_range": {
"filter": {
"range": {
"@timestamp": {
"from": "now-15s"
}
}
},
"aggs": {
"just_stats": {
"stats": {
"field": "value"
}
}
}
}
}
}
Output:
{
"took": 15,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 407,
"max_score": 0,
"hits": []
},
"aggregations": {
"time_range": {
"doc_count": 407,
"just_stats": {
"count": 407,
"min": 0,
"max": 100,
"avg": 53.152334152334156,
"sum": 21633
}
}
}
}
This is also wrong, it can't be 407 entries in last 15 seconds. It should have been only 3-4 logs only.