
I have Elasticsearch 1.2.2 installed on a Debian server, with ~5.3M indexed documents. When I run myindex/_stats, I get the following info:

{
   "_shards": {
      "total": 10,
      "successful": 5,
      "failed": 0
   },
   "_all": {
      "primaries": {
         "docs": {
            "count": 5306837,
            "deleted": 100209
         },
         "store": {
            "size_in_bytes": 32003706527,
            "throttle_time_in_millis": 1657592
         },
  ....
}

which tells me the total size of my documents is ~32 GB.

However, the data folder inside the Elasticsearch directory takes up 72 GB on disk.
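(For reference, the ~32 GB figure is just the size_in_bytes value from _stats converted to decimal gigabytes; a quick shell check using the number above:)

```shell
# size_in_bytes as reported by myindex/_stats (primaries.store)
size_in_bytes=32003706527
# Convert to decimal GB (integer division)
gb=$(( size_in_bytes / 1000 / 1000 / 1000 ))
echo "${gb} GB"
```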

Following the Elasticsearch documentation, I've tried running

curl -XPOST 'http://localhost:9200/myindex/_optimize?only_expunge_deletes=true'

Running this command has

  • reduced the number of deleted docs from 300k to 100k (as returned by the _stats command above) but not to 0 as I would have expected
  • reduced the disk usage from 90 GB to 72 GB, but not to 32 GB, which is the actual size of my documents

(note: I also ran this command on all indices, i.e. curl -XPOST 'http://localhost:9200/_optimize?only_expunge_deletes=true', with no significant difference)

How do I reduce the data folder size to the actual size of my documents?

benoit
  • For es from 2.1, you can refer from https://stackoverflow.com/questions/20608417/elasticsearch-how-to-free-store-size-after-deleting-documents – Duc Chi Oct 18 '18 at 07:20

3 Answers


By default, Elasticsearch only merges away a segment if its delete percentage is over 10%. If you want to expunge all documents marked as deleted, set index.merge.policy.expunge_deletes_allowed to 0 in elasticsearch.yml, then run the optimize command:

curl -XPOST 'http://localhost:9200/myindex/_optimize?only_expunge_deletes=true'
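(This also explains why the question's optimize call left ~100k deleted docs behind: the threshold applies per segment, but even the index-wide ratio, computed from the question's _stats numbers as a rough proxy, is far below 10%:)

```shell
# Figures taken from the question's _stats output
docs_count=5306837
docs_deleted=100209
# Rough index-wide delete percentage (the real threshold is per segment)
pct=$(( 100 * docs_deleted / (docs_count + docs_deleted) ))
echo "${pct}% deleted"   # well under the 10% merge threshold
```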

You can have a look at this link for more details about merge policy.
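(As an alternative to editing elasticsearch.yml and restarting, a sketch of doing the same through the dynamic index-settings API, assuming an ES 1.x cluster on the default port where this merge-policy setting is updatable at runtime; the setting name is the one from this answer, not something I have verified against a live cluster:)

```shell
# Lower the expunge threshold for this index only, then force the expunge.
curl -XPUT 'http://localhost:9200/myindex/_settings' -d '{
  "index.merge.policy.expunge_deletes_allowed": 0
}'
curl -XPOST 'http://localhost:9200/myindex/_optimize?only_expunge_deletes=true'
```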

Zied Koubaa

I think the difference you see is due to index structures and document metadata, which is normal for any database. The size of the index depends on your mappings, so the Elasticsearch data folder will never be exactly the size of your raw documents.

The following links might help explain this better:

Using too much disk space

Elastic blog about storage requirements
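(To compare what Elasticsearch thinks the index occupies against what the filesystem reports, a sketch using the _cat API, which exists in ES 1.x; the data path below is an assumption, adjust it to your installation:)

```shell
# Store size as Elasticsearch reports it, per index
curl 'http://localhost:9200/_cat/indices/myindex?v&h=index,docs.count,docs.deleted,store.size'
# Actual on-disk usage (path is an assumption for a default Debian install)
du -sh /var/lib/elasticsearch
```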

CrnaStena

You should run the following:

curl -XPOST 'http://localhost:9200/myindex/_optimize?max_num_segments=1'

Perhaps you should run it more than once, because if there are too many segments it won't merge them all down in a single pass.
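(A sketch of automating the repeated runs, checking the segment count via the _segments endpoint between passes; the field name and the single-pass behavior are assumptions about ES 1.x, not verified against a live cluster:)

```shell
# Keep forcing merges until every shard reports a single search segment.
until curl -s 'http://localhost:9200/myindex/_segments' \
      | grep -q '"num_search_segments" : 1'; do
  curl -XPOST 'http://localhost:9200/myindex/_optimize?max_num_segments=1'
done
```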

Kalman
  • Thanks for your reply. I tried it, it took 45 min to run but unfortunately, it didn't reduce the disk usage – benoit Dec 04 '14 at 12:22