How can I read data from a list and index specific values into Elasticsearch, using python?

Question

I have used "paramiko" to connect from my PC to a devboard and execute a script. Then I save the results of this script in a list (output). I want to extract some values from the list and insert them into Elasticsearch. I have done it manually with the first result of the list, but how can I automate this for the rest of the values? Do I need "regex"? Please give me some clues.

Thank you

THIS IS PART OF THE CODE THAT CONNECTS TO THE DEVBOARD, EXECUTES A SCRIPT AND RETRIEVES A LIST (output)

def main():
    ssh = initialize_ssh()
    # '&&' runs the benchmark only if the cd succeeded
    stdin, stdout, stderr = ssh.exec_command('cd coral/tflite/python/examples/classification/Auto_benchmark && python3 auto_benchmark.py')
    output = stdout.readlines()  # list of str, each line still ends with '\n'
    print('\n'.join(output))
    ssh.close()
    return output  # hand the list back so it can be parsed and indexed

THE LIST LOOKS LIKE THIS:

labels: imagenet_labels.txt 

Model: efficientnet-edgetpu-S_quant_edgetpu.tflite 

Image: img0000.jpg 


----INFERENCE TIME----

Note: The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory.

Time: 6.2ms

Results: wall clock

Score: 0.25781

##################################### 

labels: imagenet_labels.txt 

Model: mobilenet_v1_1.0_224_quant_edgetpu.tflite 

Image: img0000.jpg 


----INFERENCE TIME----

Note: The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory.

Time: 2.8ms

Results: umbrella

Score: 0.22266

##################################### 
Temperature: 35C

THIS IS THE MAPPING THAT IS NEEDED TO INDEX DATA INTO ELASTICSEARCH

def initialize_mapping_classification(es):
    """
    Initialise the mappings
    """
    mapping_classification = {
        'properties': {
            '@timestamp': {'type': 'date'},
            'type': {'type': 'keyword'},
            'Model': {'type': 'text'},
            'Time': {'type': 'float'},
            'Results': {'type': 'text'},
            'Score': {'type': 'float'},
            'Temperature': {'type': 'float'}
        }
    }

    if not es.indices.exists(CORAL):
        es.indices.create(CORAL)
        es.indices.put_mapping(body=mapping_classification, doc_type=DOC_TYPE, index=CORAL)
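In line with step 5 of the accepted answer, here is a sketch of the same mapping for Elasticsearch 7.x that keeps `Model` searchable as text but adds an exact-match `keyword` sub-field. The sub-field name `raw` and the helper `create_index` are hypothetical, and the `body=` argument assumes the 7.x `elasticsearch-py` client (8.x clients prefer `mappings=`).

```python
# Hypothetical Elasticsearch 7.x mapping; 'string' no longer exists, so text/keyword are used
MAPPING_CLASSIFICATION = {
    "properties": {
        "@timestamp": {"type": "date"},
        "type": {"type": "keyword"},
        "Model": {
            "type": "text",
            # Multi-field: 'Model.raw' keeps the untokenized name for exact term
            # queries and aggregations, while 'Model' stays full-text searchable
            "fields": {"raw": {"type": "keyword"}},
        },
        "Time": {"type": "float"},
        "Results": {"type": "text"},
        "Score": {"type": "float"},
        "Temperature": {"type": "float"},
    }
}


def create_index(es, index):
    """Create `index` with the mapping above unless it already exists (7.x client assumed)."""
    if not es.indices.exists(index=index):
        es.indices.create(index=index, body={"mappings": MAPPING_CLASSIFICATION})
```

With this mapping, a `terms` aggregation on `Model.raw` groups results per model file name.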

THIS IS MY ATTEMPT. I HAVE DONE IT MANUALLY WITH THE FIRST RESULT OF THE LIST. I WANT TO AUTOMATE IT

if CLASSIFY == 1:
    doc = {
        '@timestamp': datetime.datetime.utcnow().strftime("%Y-%m-%dT%H:%M:%S"),
        'type': 'coralito',
        'Model': "efficientnet-edgetpu-S_quant_edgetpu.tflite",
        'Time': "6.2 ms",
        'Results': "wall clock",
        'Score': "0.25781",
        'Temperature': "35 C"
    }

    response = send_data_elasticsearch(CORAL, DOC_TYPE, doc, es)

    print(doc)

------------------------------EDIT 2---------------------------------------

So this is what my data looks like after using regex to extract the values of interest: [image not included]

This is what I get indexed: [image not included]

This is my code:

import elasticsearch  
from elasticsearch import Elasticsearch, helpers
import datetime
import re

data = ['labels: imagenet_labels.txt \n', '\n', 'Model: efficientnet-edgetpu-S_quant_edgetpu.tflite \n', '\n', 'Image: insect.jpg \n', '\n', '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n', 'Time(ms): 23.1\n', 'Time(ms): 5.7\n', '\n', '\n', 'Inference: corkscrew, bottle screw\n', 'Score: 0.03125 \n', '\n', 'TPU_temp(°C): 57.05\n', '##################################### \n', '\n', 'labels: imagenet_labels.txt \n', '\n', 'Model: efficientnet-edgetpu-M_quant_edgetpu.tflite \n', '\n', 'Image: insect.jpg \n', '\n', '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n', 'Time(ms): 29.3\n', 'Time(ms): 10.8\n', '\n', '\n', "Inference: dragonfly, darning needle, devil's darning needle, sewing needle, snake feeder, snake doctor, mosquito hawk, skeeter hawk\n", 'Score: 0.09375 \n', '\n', 'TPU_temp(°C): 56.8\n', '##################################### \n', '\n', 'labels: imagenet_labels.txt \n', '\n', 'Model: efficientnet-edgetpu-L_quant_edgetpu.tflite \n', '\n', 'Image: insect.jpg \n', '\n', '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n', 'Time(ms): 45.6\n', 'Time(ms): 31.0\n', '\n', '\n', 'Inference: pick, plectrum, plectron\n', 'Score: 0.09766 \n', '\n', 'TPU_temp(°C): 57.55\n', '##################################### \n', '\n', 'labels: imagenet_labels.txt \n', '\n', 'Model: inception_v3_299_quant_edgetpu.tflite \n', '\n', 'Image: insect.jpg \n', '\n', '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n', 'Time(ms): 68.8\n', 'Time(ms): 51.3\n', '\n', '\n', 'Inference: ringlet, ringlet butterfly\n', 'Score: 0.48047 \n', '\n', 'TPU_temp(°C): 57.3\n', '##################################### \n', '\n', 'labels: imagenet_labels.txt \n', '\n', 'Model: inception_v4_299_quant_edgetpu.tflite \n', '\n', 'Image: insect.jpg \n', '\n', 
'*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n', 'Time(ms): 121.8\n', 'Time(ms): 101.2\n', '\n', '\n', 'Inference: admiral\n', 'Score: 0.59375 \n', '\n', 'TPU_temp(°C): 57.05\n', '##################################### \n', '\n', 'labels: imagenet_labels.txt \n', '\n', 'Model: inception_v2_224_quant_edgetpu.tflite \n', '\n', 'Image: insect.jpg \n', '\n', '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n', 'Time(ms): 34.3\n', 'Time(ms): 16.6\n', '\n', '\n', 'Inference: lycaenid, lycaenid butterfly\n', 'Score: 0.41406 \n', '\n', 'TPU_temp(°C): 57.3\n', '##################################### \n', '\n', 'labels: imagenet_labels.txt \n', '\n', 'Model: mobilenet_v2_1.0_224_quant_edgetpu.tflite \n', '\n', 'Image: insect.jpg \n', '\n', '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n', 'Time(ms): 14.4\n', 'Time(ms): 3.3\n', '\n', '\n', 'Inference: leatherback turtle, leatherback, leathery turtle, Dermochelys coriacea\n', 'Score: 0.36328 \n', '\n', 'TPU_temp(°C): 57.3\n', '##################################### \n', '\n', 'labels: imagenet_labels.txt \n', '\n', 'Model: mobilenet_v1_1.0_224_quant_edgetpu.tflite \n', '\n', 'Image: insect.jpg \n', '\n', '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n', 'Time(ms): 14.5\n', 'Time(ms): 3.0\n', '\n', '\n', 'Inference: bow tie, bow-tie, bowtie\n', 'Score: 0.33984 \n', '\n', 'TPU_temp(°C): 57.3\n', '##################################### \n', '\n', 'labels: imagenet_labels.txt \n', '\n', 'Model: inception_v1_224_quant_edgetpu.tflite \n', '\n', 'Image: insect.jpg \n', '\n', '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n', 'Time(ms): 21.2\n', 'Time(ms): 3.6\n', '\n', '\n', 'Inference: pick, plectrum, plectron\n', 'Score: 0.17578 \n', '\n', 'TPU_temp(°C): 57.3\n', 
'##################################### \n', '\n']


# declare a client instance of the Python Elasticsearch library
client = Elasticsearch("http://localhost:9200")

#using regex 
regex = re.compile(r'(\w+)\((.+)\):\s(.*)|(\w+:)\s(.*)')
match_regex = list(filter(regex.match, data))
match = [line.rstrip('\n') for line in match_regex]
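Following the later comment's advice to split each matched line on `: ` and clean up the values, here is a minimal sketch that reuses the same regex. The helper name `to_pairs` and the short `sample` list are hypothetical; the sample lines are copied from `data`.

```python
import re

# Same pattern as above: either "Key(unit): value" or "Key: value"
regex = re.compile(r'(\w+)\((.+)\):\s(.*)|(\w+:)\s(.*)')


def to_pairs(lines):
    """Turn matched lines such as 'Score: 0.03125 \\n' into ('Score', '0.03125') tuples."""
    pairs = []
    for line in lines:
        if not regex.match(line):
            continue  # skips blank lines and the '*The first inference...*' note
        key, _, value = line.partition(': ')  # split on the first ': ' only
        # Drop a unit suffix such as '(ms)' or '(°C)' from the key
        key = key.split('(')[0].strip()
        pairs.append((key, value.strip()))  # strip trailing spaces and '\n'
    return pairs


sample = ['Model: mobilenet_v1_1.0_224_quant_edgetpu.tflite \n', '\n',
          'Time(ms): 14.5\n', 'Score: 0.33984 \n', 'TPU_temp(°C): 57.3\n']
print(to_pairs(sample))
```

The values are still strings at this point, and each record in `data` has two `Time(ms)` lines, so both appear in the output; deciding which one to keep and converting to `float` is a separate step.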


#using "bulk"
def yield_docs():
    """
    Yield a single bulk action whose source holds every matched line
    """
    
    doc_source = {
        "data": match
        
        }

    # use a yield generator so that the doc data isn't loaded into memory
    yield {
        "_index": "coralito",
        "_type": "coralote",
        "_source": doc_source
        }

try:
    # make the bulk call using 'actions' and get a response
    resp = helpers.bulk(
        client,
        yield_docs()
    )
    print ("\nhelpers.bulk() RESPONSE:", resp)
    print ("RESPONSE TYPE:", type(resp))
except Exception as err:
    print("\nhelpers.bulk() ERROR:", err)
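The generator above yields a single document whose `data` field holds every matched line. A sketch of the alternative, one bulk action per benchmark record, assuming the records have already been parsed into dicts. The `records` values are hypothetical, copied by hand from `data`; `_type` is omitted because mapping types are deprecated in Elasticsearch 7.x, and no cluster is contacted here.

```python
from typing import Dict, Iterator, List


def yield_record_actions(records: List[Dict], index: str = "coralito") -> Iterator[Dict]:
    """Yield one bulk action per benchmark record, i.e. one Elasticsearch document each."""
    for record in records:
        yield {
            "_index": index,
            # No "_type": mapping types are deprecated in Elasticsearch 7.x
            "_source": record,
        }


# Hypothetical, already-parsed records (values copied from the `data` list)
records = [
    {"Model": "efficientnet-edgetpu-S_quant_edgetpu.tflite", "Time": 5.7, "Score": 0.03125, "TPU_temp": 57.05},
    {"Model": "efficientnet-edgetpu-M_quant_edgetpu.tflite", "Time": 10.8, "Score": 0.09375, "TPU_temp": 56.8},
]

actions = list(yield_record_actions(records))
for action in actions:
    print(action["_source"]["Model"])

# Against a running cluster (not executed here):
# from elasticsearch import Elasticsearch, helpers
# helpers.bulk(Elasticsearch("http://localhost:9200"), yield_record_actions(records))
```

Because each record is its own `_source`, fields such as `Score` become top-level numeric fields that can be filtered and aggregated, instead of one array of strings.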

-----------------------------EDIT 3---------------------

score 1 · Accepted Answer · answered Jun 18 '20 at 22:40


1. Remove the line breaks.
2. Split the text by a common delimiter (----INFERENCE TIME---- would be a good start, I think).
3. Extract the keys and values using, for example, r'(\w+:)\s(.*)' or a positive lookbehind such as r'(?<=Note: ).*', etc.
4. Parse the numeric values (time, score, temperature, ...) -- you'll thank me later ;)
5. Extend the Model mapping with a keyword datatype -- otherwise the analyzer splits the name into tokens (e.g. on the hyphens) and you'll wonder why you can't search for exact matches or aggregate on it.
6. Prepare the objects that you'll want to sync.
7. Bulk upload to Elasticsearch.
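Steps 1 to 4 and 6 can be sketched as follows. One swap to note: the EDIT 2 output contains no "----INFERENCE TIME----" line, so this sketch assumes the `#####` line is the record delimiter. The names `parse_records` and `NUMERIC_KEYS` are hypothetical, and keeping the second `Time(ms)` value (the warm run) is an assumption based on the note that the first inference includes model loading.

```python
import re

# Assumption: records are separated by a line made only of '#' characters
SEPARATOR = re.compile(r'^#+[ \t]*$', re.MULTILINE)
# "Key: value" or "Key(unit): value"; the optional "(unit)" part is dropped
KEY_VALUE = re.compile(r'^(\w+)(?:\([^)]*\))?:[ \t]*(.*?)[ \t]*$', re.MULTILINE)
NUMBER = re.compile(r'-?\d+(?:\.\d+)?')
# Hypothetical choice of which keys hold numbers
NUMERIC_KEYS = {"Time", "Score", "TPU_temp", "Temperature"}


def parse_records(lines):
    """Turn the list returned by stdout.readlines() into one dict per benchmark record."""
    text = "".join(lines)                             # step 1: work on one string
    records = []
    for chunk in SEPARATOR.split(text):               # step 2: split on the delimiter
        record = {}
        for key, value in KEY_VALUE.findall(chunk):   # step 3: keys and values
            if key in NUMERIC_KEYS:                   # step 4: parse numerics
                found = NUMBER.search(value)          # also copes with "6.2ms" or "35C"
                value = float(found.group()) if found else None
            # A repeated key overwrites the earlier one, so the second
            # (warm) Time(ms) line wins over the slow first one
            record[key] = value
        if record:                                    # skip empty chunks
            records.append(record)
    return records                                    # step 6: objects ready for bulk


sample = ['labels: imagenet_labels.txt \n', '\n',
          'Model: efficientnet-edgetpu-S_quant_edgetpu.tflite \n', '\n',
          'Image: insect.jpg \n', '\n',
          '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n',
          'Time(ms): 23.1\n', 'Time(ms): 5.7\n', '\n', '\n',
          'Inference: corkscrew, bottle screw\n', 'Score: 0.03125 \n', '\n',
          'TPU_temp(°C): 57.05\n', '##################################### \n', '\n',
          'labels: imagenet_labels.txt \n', '\n',
          'Model: inception_v1_224_quant_edgetpu.tflite \n', '\n',
          'Image: insect.jpg \n', '\n', 'Time(ms): 21.2\n', 'Time(ms): 3.6\n', '\n',
          'Inference: pick, plectrum, plectron\n', 'Score: 0.17578 \n', '\n',
          'TPU_temp(°C): 57.3\n', '##################################### \n', '\n']

records = parse_records(sample)
for r in records:
    print(r["Model"], r["Time"], r["Score"])
```

Each resulting dict can then be used as the `_source` of one bulk action (step 7), so every benchmark run becomes a separate document.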


Joe - GMapsBook.com


Thank you. I have some questions: why do I have to split the text by a common delimiter (2.) if I have to extract keys and values anyway? Also, can you explain 4. to me? – Aizzaac Jun 19 '20 at 15:23

2) You need separate docs, and your .txt file is free-flowing text, so a delimiter is needed... 4) You may be interested in range queries on numeric fields such as `Temperature` and `Time` -- even your mapping says so -- but your example doc includes strings. So you wanna parse `6.2` out of `6.2 ms`. – Joe - GMapsBook.com Jun 19 '20 at 18:15
Ok. I am at number 6. What do you mean? – Aizzaac Jun 20 '20 at 22:16
Also, I have extracted the values using regex, put them in a dictionary and then sent it to Elasticsearch. But it is ONLY 1 value of the list. So I guess this is where number 7 comes in. I need some help there. I will post the code. – Aizzaac Jun 20 '20 at 22:18
What I do not understand is: do I need JSON to use that "Bulk"? – Aizzaac Jun 20 '20 at 22:42
Yes, you need an array of objects to pass to the bulk actions. – Joe - GMapsBook.com Jun 21 '20 at 10:16
I need more advice. I am putting my code and results. Please check! – Aizzaac Jun 23 '20 at 21:11
You're getting close. Now split the list values by `: ` so you have key-value pairs and then clean up the values -- remove whitespace and parse the numerics. – Joe - GMapsBook.com Jun 24 '20 at 08:57
What do you think of EDIT 3? In one of the images there is a yellow sign which says: "objects in arrays are not well supported". What does it mean? – Aizzaac Jun 25 '20 at 15:34

You can disregard that warning -- objects in arrays are standard practice. Take a look at `nested` fields too though -- https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html – Joe - GMapsBook.com Jun 26 '20 at 08:39
Do you know if Elasticsearch is compatible with ARM 64 platforms? Currently I am sending the data from the devboard to Elasticsearch through a PC (it does the indexing). But it would be interesting if the devboard could index the results directly to an Elasticsearch server. – Aizzaac Jun 30 '20 at 21:48
No idea what ARM 64 is, sorry. You need _some_ sort of a server to run ES... – Joe - GMapsBook.com Jul 01 '20 at 11:45
I am getting an error when adding the timestamp: https://stackoverflow.com/questions/62778983/compressor-detection-can-only-be-called-on-some-xcontent-bytes-or-compressed-xc – Aizzaac Jul 07 '20 at 15:44
