
I am pretty new to Elasticsearch, so please forgive me if I am asking a very simple question.

At my workplace we have a full ELK setup.

Due to the very large volume of data, we only store 14 days of it. My question is: how can I read the data into Python and later store my analysis in some NoSQL database?

Right now my primary goal is to read the raw data from the Elasticsearch cluster into Python as a DataFrame (or any other format).

I want to pull it for different time intervals, like 1 day, 1 week, 1 month, etc.

I have been struggling with this for the last week.

  • Possible duplicate of [ElasticSearch query to pandas dataframe](https://stackoverflow.com/questions/46471922/elasticsearch-query-to-pandas-dataframe) – Phil B May 11 '19 at 21:14

2 Answers


You can use the code below to achieve that:

# Create a DataFrame object
from pandasticsearch import DataFrame
df = DataFrame.from_es(url='http://localhost:9200', index='indexname')

To get the schema of your index:

 df.print_schema()

After that you can perform general DataFrame operations on the df.

If you want to parse the raw query result yourself, do the following:

from elasticsearch import Elasticsearch

# Connect to the cluster and run a match_all query; note that by default
# a search returns only the first 10 hits of the result set
es = Elasticsearch('http://localhost:9200')
result_dict = es.search(index="indexname", body={"query": {"match_all": {}}})

and then load everything into your final DataFrame:

from pandasticsearch import Select

# Flatten the raw search response (the hits and their _source fields) into pandas
pandas_df = Select.from_dict(result_dict).to_pandas()

I hope it helps.

ak3191
  • A plain search over ES will fail when the result set is larger than 10,000 documents. You need to use the Scroll API. – Amogh Mishra Aug 21 '18 at 18:13
  • Thanks for your input. If you can share a better way of doing this, please do; I am also learning. If you know a better way to read data directly from the server and dump it into a pandas DataFrame, let me know. – ak3191 Aug 21 '18 at 18:15
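As the comment above notes, a plain search cannot retrieve more than the index's max_result_window (10,000 hits by default), so larger extractions should go through the Scroll API. A minimal sketch using the elasticsearch.helpers.scan helper (which wraps scrolling), assuming the same local cluster and index name as in the answer above:

import pandas as pd
from elasticsearch import Elasticsearch
from elasticsearch.helpers import scan

es = Elasticsearch('http://localhost:9200')

# scan() issues scroll requests under the hood and yields every matching hit,
# so it is not limited to the first 10,000 results the way a plain search is
hits = scan(es, index="indexname", query={"query": {"match_all": {}}})

# Keep only the document body (_source) of each hit and build a DataFrame
pandas_df = pd.DataFrame(hit["_source"] for hit in hits)

Because scan() returns a generator, the documents are streamed rather than held in a single response, which keeps memory usage manageable for large indices.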

It depends on how you want to read the data from Elasticsearch: incrementally (i.e. reading the new data that arrives every day) or as one bulk read. For the latter you need to use the bulk helpers of Elasticsearch in Python, and for the former you can restrict yourself to a simple range query.

Schematic code for reading bulk data: https://gist.github.com/dpkshrma/04be6092eda6ae108bfc1ed820621130

How to use the bulk API of ES:

How to use Bulk API to store the keywords in ES by using Python

https://elasticsearch-py.readthedocs.io/en/master/helpers.html#elasticsearch.helpers.bulk
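For reference, the helpers.bulk function linked above is the usual way to write documents to Elasticsearch in batches from Python. A minimal sketch, assuming a hypothetical list of dictionaries called docs and a hypothetical target index named analysis-results:

from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

es = Elasticsearch('http://localhost:9200')

# Hypothetical documents; in practice these would be your own records
docs = [{"keyword": "error", "count": 42}, {"keyword": "timeout", "count": 7}]

# Each action tells bulk() which index the document should be written into
actions = ({"_index": "analysis-results", "_source": doc} for doc in docs)

# bulk() sends the actions in batches and returns (number indexed, errors)
success, errors = bulk(es, actions)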

How to use the range query for incremental reads:

https://martinapugliese.github.io/python-for-(some)-elasticsearch-queries/

How to have Range and Match query in one elastic search query using python?
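A minimal sketch of such an incremental read, assuming the documents carry a @timestamp field (the usual Logstash convention) and that you only want the last day:

from elasticsearch import Elasticsearch

es = Elasticsearch('http://localhost:9200')

# Range query over the last day; change "now-1d/d" to "now-7d/d" or "now-1M"
# for weekly or monthly windows
query = {
    "query": {
        "range": {
            "@timestamp": {
                "gte": "now-1d/d",
                "lt": "now"
            }
        }
    }
}

result_dict = es.search(index="indexname", body=query)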

Since you want to pull your data for different intervals, you will also need to perform date aggregations.

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-datehistogram-aggregation.html

How to perform multiple aggregation on an object in Elasticsearch using Python?
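A minimal sketch of a date histogram aggregation, again assuming a @timestamp field and bucketing by day (the interval can be changed to week or month):

from elasticsearch import Elasticsearch

es = Elasticsearch('http://localhost:9200')

# "size": 0 skips the raw hits and returns only the aggregation buckets;
# newer Elasticsearch versions use "calendar_interval" instead of "interval"
query = {
    "size": 0,
    "aggs": {
        "per_day": {
            "date_histogram": {
                "field": "@timestamp",
                "interval": "day"
            }
        }
    }
}

result_dict = es.search(index="indexname", body=query)
buckets = result_dict["aggregations"]["per_day"]["buckets"]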

Once you issue your Elasticsearch query, the data will be collected in a temporary variable; you can then use a Python client for your NoSQL database, such as PyMongo, to insert the Elasticsearch data into it.
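A minimal sketch of that last step with PyMongo, assuming a local MongoDB instance, a hypothetical database and collection name, and reusing the result_dict from one of the queries above:

from pymongo import MongoClient

client = MongoClient('mongodb://localhost:27017')
collection = client['log_analysis']['daily_hits']

# Insert the _source body of every hit returned by the search above
documents = [hit["_source"] for hit in result_dict["hits"]["hits"]]
if documents:
    collection.insert_many(documents)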

Amogh Mishra
  • Can you help with something that lets us pull all the data from the elastic cluster into a Python DataFrame... not from CSV but real-time server data, if you have any experience with that? –  Aug 21 '18 at 18:12