-1

I'll try to explain the problem as succinctly as possible. I'm trying to filter some values from a log file coming from Elastic. The log outputs this JSON exactly:

{'took': 2, 'timed_out': False, '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0}, 'hits': {'total': {'value': 2, 'relation': 'eq'}, 'max_score': None, 'hits': [{'_index': 'winlogbeat-dc-2022.10.17-000014', '_type': '_doc', '_id': 'vOCnfoQBeS2JF7giMG9q', '_score': None, '_source': {'agent': {'hostname': 'SRVDC1'}, '@timestamp': '2022-11-16T04:19:13.622Z'}, 'sort': [-9223372036854775808]}, {'_index': 'winlogbeat-dc-2022.10.17-000014', '_type': '_doc', '_id': 'veCnfoQBeS2JF7giMG9q', '_score': None, '_source': {'agent': {'hostname': 'SRVDC1'}, '@timestamp': '2022-11-16T04:19:13.630Z'}, 'sort': [-9223372036854775808]}]}}

Now, I want to filter out only the _index and @timestamp keys. If I assign this JSON to a variable, I can perfectly filter out the two keys by running:

index = (data['hits']['hits'][0]['_index'])
timestamp = (data['hits']['hits'][0]['_source']['@timestamp'])

Output:

winlogbeat-dc*
2022-11-16T04:19:13.622Z

However, if I try to do the same directly from the server call, I get:

Traceback (most recent call last):
  File "c:\Users\user\Desktop\PYTHON\tiny2.py", line 96, in <module>
    query()
  File "c:\Users\user\Desktop\PYTHON\tiny2.py", line 77, in query
    index = (final_data['hits']['hits'][0]['_index'])
TypeError: string indices must be integers

Now, I understand the it's asking for integer values instead of the strings I'm using, but if I use integers, then I get individual characters rather than a key/value pair.

What am I missing?

UPDATE: Below is the entire code, but it won't help much. It contains Elastic's DSL query language, and a call to the server, which obviously you won't be able to connect to. I tried your suggestions, but I either get the same error, or a new one:

    raise TypeError(f'the JSON object must be str, bytes or bytearray, '
TypeError: the JSON object must be str, bytes or bytearray, not ObjectApiResponse

Entire code as follows:

import os
import ast
import csv
import json
from elasticsearch import Elasticsearch
from datetime import datetime,timedelta
import datetime

ELASTIC_USERNAME = 'elastic'
ELASTIC_PASSWORD = "abc123"
PORT= str('9200')
HOST = str('10.20.20.131')
CERT = os.path.join(os.path.dirname(__file__),"cert.crt")

initial_time = datetime.datetime.now()
past_time = datetime.datetime.now() - (timedelta(minutes=15))

def query():
    try: #connection to Elastic server
        es = Elasticsearch(
            "https://10.20.20.131:9200",
            ca_certs = CERT,
            verify_certs=False,
            basic_auth = (ELASTIC_USERNAME, ELASTIC_PASSWORD)
        )
    except ConnectionRefusedError as error:
        print("[-] Connection error")
    else: #DSL Elastic query of Domain Controler logs
        query_res = es.search(
            index="winlogbeat-dc*",
            body={
                "size": 3,
                "sort": [
                    {
                        "timestamp": {
                            "order": "desc",
                            "unmapped_type": "boolean"
                        }
                    }
                ],
                "_source": [
                    "agent.hostname",
                    "@timestamp"
                ],
                "query": {
                    "bool": {
                    "must": [],
                    "filter": [
                        {
                        "range": {
                            "@timestamp": {
                            "format": "strict_date_optional_time",
                            "gte": f'{initial_time}',
                            "lte": f'{past_time}'
                            }
                        }
                        }
                    ],
                    "should": [],
                    "must_not": []
                    }
                }
                }
            )
    
    if query_res:
        parse_to_json =json.loads(query_res)
        final_data = json.dumps(str(parse_to_json))
   
        index = ast.literal_eval(final_data)['hits']['hits'][0]['_index']
        timestamp = ast.literal_eval(final_data)['hits']['hits'][0]['_source']['@timestamp']

        columns = ['Index','Last Updated']
        rows = [[f'{index}',f'{timestamp}']]

        with open("final_data.csv", 'w') as csv_file:
            write_to_csv = csv.writer(csv_file)
            write_to_csv.writerow(columns)
            write_to_csv.writerows(rows)
            print("CSV file created!")

    else:
        print("Log not found")
query()
PythonNoob
  • 93
  • 9

4 Answers4

0

Elasticsearch returns an ObjectApiResponse so you have to parse the _source field:

import json
final_data = json.loads(query_res["_source"])
index = final_data['hits']['hits'][0]['_index']

I'm not sure why you surround with parenthesis the indexing selection.

Facorazza
  • 317
  • 1
  • 15
  • I tried that but nothing still. I'm at the end of my rope here... Update the code if you'd like to take a look. – PythonNoob Dec 14 '22 at 16:36
  • Nah, non funziona ancora. The information contained in the body variable is the DSL query language that reads the DC log info read by elastic. Ma che casino... I'm going to try to convert the json into a dict and see what happens... – PythonNoob Dec 14 '22 at 17:06
  • @PythonNoob Yes you should be able to obtain all the fields that the ObjectApiResponse has with query_res.__dict__ – Facorazza Dec 14 '22 at 17:48
0

If you're really getting ' in your response, use this:

import ast
...
index = ast.literal_eval(final_data)['hits']['hits'][0]['_index']

Otherwise use this:

import json
...
index = json.loads(final_data)['hits']['hits'][0]['_index']
Nineteendo
  • 882
  • 3
  • 18
  • I tried your suggestion, which is really good by the way, but the problem persists. Nothing seems to work. Now, if this helps, the data type being outputted by the server is elastic_transport.ObjectApiResponse – PythonNoob Dec 14 '22 at 16:34
  • I updated the question if you'd like to take a look... – PythonNoob Dec 14 '22 at 16:37
  • elastic_transport.ConnectionTimeout: Connection timed out – Nineteendo Dec 14 '22 at 16:41
  • You won't be able to connect to it. It's an internal server. I was hoping someone could spot an error somewhere else. The connection works fine, as does the DSL query, which is another monster on its own. – PythonNoob Dec 14 '22 at 16:45
0

I struggle to make sense of this:

    query_res = es.search(...)
    if query_res:
        parse_to_json =json.loads(query_res)
        final_data = json.dumps(str(parse_to_json))
   
        index = ast.literal_eval(final_data)['hits']['hits'][0]['_index']
        timestamp = ast.literal_eval(final_data)['hits']['hits'][0]['_source']['@timestamp']

query_res is an instance of ObjectApiResponse, and you can get data from it like a dictionary right away. Instead you perform a sequence of converting object to string and back again, and then "stringify" it once more, with unpredictable results.

Just do it like they do in ES docs:

first_hit = query_res['hits']['hits'][0]
index = first_hit['_index']
timestamp = first_hit['_source']['@timestamp']
Nikolaj Š.
  • 1,457
  • 1
  • 10
  • 17
0

I was able to fix the problem by running a broad query first with

query_res['hits']['hits']

And then run a for loop with the specific time range I needed.

Here is the code:

for query_res in query_res['hits']['hits']:
        winlogbeat_dc_timestamp = query_res['_source']['@timestamp']

Then another issue arose. I needed to convert datetime format into a string:

    #Convert datetime to string
    pattern = '%Y-%m-%dT%H:%M:%S.%fZ'
    dt = datetime.strptime(winlogbeat_dc_timestamp, pattern)
    new_timestamp = str(dt + timedelta(hours=1))[:-3] + 'Z'

And finally format it to a more readable pattern:

    #Format timestamp to a more readable pattern
    formatted_time = (''.join(letter for letter in new_timestamp if not letter.isalpha()))[:19]
    formatted_time2 = formatted_time[:10] + ' / ' + formatted_time[10:]
    
Robin Sage
  • 969
  • 1
  • 8
  • 24