We have an Elasticsearch instance running on Linux in the Azure cloud. We are trying to programmatically produce a flat file or dump (the format is negotiable) of one of our Elasticsearch indexes once every 24 hours at a specified time. The file would then be delivered to a customer who does not have Elasticsearch. It would be about 15 GB in size and contain approximately 7 million documents.

We think we need to start with a query against our Elasticsearch instance that would actually retrieve the data. However, after going through the documentation, I don't see a query that accomplishes this.

Is anyone aware of such a query, or of a method to achieve this? Beyond the query itself, the large size of the file is a concern and would need to be taken into account in any solution.

EDIT: I've added some relevant information that was not obvious in the first version of this post, which may change the answers slightly.

Stpete111
This answer may help: https://stackoverflow.com/a/34922623/4604579 (elasticdump). There is also another tool called es2csv, described here: https://stackoverflow.com/a/51982535/4604579 – Val Jan 08 '19 at 16:57

1 Answer

Another possibility, apart from the tools Val mentioned, is to use the snapshot functionality.

A snapshot is a backup taken from a running Elasticsearch cluster. You can take a snapshot of individual indices or of the entire cluster and store it in a repository on a shared filesystem, and there are plugins that support remote repositories on S3, HDFS, Azure, Google Cloud Storage and others.

Later, this snapshot can be restored on the same cluster or on a fresh cluster (if you intend to use it as a backup or failover mechanism).
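As a rough illustration of the snapshot approach, here is a minimal Python sketch that builds the HTTP requests for taking a snapshot of a single index and for restoring it. It only constructs the requests and does not send them. The host `http://localhost:9200`, the repository name `my_backup`, the snapshot name `daily-2019-01-08`, and the index name `my_index` are all placeholders I made up, and the sketch assumes the repository has already been registered on the cluster.

```python
import json
import urllib.request


def build_snapshot_request(base_url, repository, snapshot, index):
    """Build (but do not send) the request that snapshots a single index."""
    url = (
        f"{base_url.rstrip('/')}/_snapshot/{repository}/{snapshot}"
        "?wait_for_completion=true"
    )
    body = json.dumps({
        "indices": index,               # limit the snapshot to this index
        "include_global_state": False,  # skip cluster-wide state
    }).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def build_restore_request(base_url, repository, snapshot, index):
    """Build (but do not send) the request that restores that index."""
    url = f"{base_url.rstrip('/')}/_snapshot/{repository}/{snapshot}/_restore"
    body = json.dumps({"indices": index}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    # All names below are hypothetical placeholders.
    req = build_snapshot_request(
        "http://localhost:9200", "my_backup", "daily-2019-01-08", "my_index"
    )
    print(req.get_method(), req.full_url)
    # To actually send it against a real cluster:
    # urllib.request.urlopen(req)
```

Keep in mind that the files a snapshot produces are in Elasticsearch's own storage format, so they are meant to be restored into a cluster rather than read directly.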

Mysterion
Hi Mysterion, thanks for this suggestion. Would we be able to convert this snapshot to a flat file or the like? This file is to be delivered to a customer on a daily basis, and they don't have any Elasticsearch instance, so they just need the data in a file that they can work with. I have actually edited my post to include this piece of information since it's relevant. – Stpete111 Jan 08 '19 at 18:50