614

I have a small database in Elasticsearch and for testing purposes would like to pull all records back. I am attempting to use a URL of the form...

http://localhost:9200/foo/_search?pretty=true&q={'matchAll':{''}}

Can someone give me the URL you would use to accomplish this, please?

user4157124
John Livermore
  • ...where "foo" is the name of the index you want to show all records for. – jonatan Jan 27 '19 at 11:43
  • All the answers using only the `size` query parameter are not correct. Irrespective of the value of `size` in the query, ES will return at most `index.max_result_window` docs (which defaults to 10k) in the response. Refer to `scroll` and `search_after`. – narendra-choudhary May 28 '21 at 19:52
  • I was able to return all records with only this line: `curl -XGET 'localhost:9200/foo/_search'` – Mr. N Jul 22 '23 at 20:32

30 Answers

902

I think Lucene syntax is supported, so:

http://localhost:9200/foo/_search?pretty=true&q=*:*

size defaults to 10, so you may also need &size=BIGNUMBER to get more than 10 items (where BIGNUMBER is a number you believe is bigger than your dataset).

BUT, the Elasticsearch documentation suggests using the scan search type for large result sets.

EG:

curl -XGET 'localhost:9200/foo/_search?search_type=scan&scroll=10m&size=50' -d '
{
    "query" : {
        "match_all" : {}
    }
}'

and then keep requesting, as the documentation link above suggests.

EDIT: scan was deprecated in 2.1.0.

scan does not provide any benefits over a regular scroll request sorted by _doc. link to elastic docs (spotted by @christophe-roussy)
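
For completeness, here is roughly what the "keep requesting" loop looks like today in Python with the official elasticsearch-py client (my sketch, not part of the original answer: it uses a plain scroll instead of the removed scan type, and the index name "foo" is taken from the question):

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumes a local cluster

# The first request opens the scroll context and returns the first page
resp = es.search(index="foo", scroll="1m", size=100,
                 body={"query": {"match_all": {}}})
scroll_id = resp["_scroll_id"]

while resp["hits"]["hits"]:
    for hit in resp["hits"]["hits"]:
        print(hit["_id"])  # process each document here
    # Each scroll call returns the next page and refreshes the timeout
    resp = es.scroll(scroll_id=scroll_id, scroll="1m")
    scroll_id = resp["_scroll_id"]  # the ID may change between pages

es.clear_scroll(scroll_id=scroll_id)  # free the server-side scroll context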

Steve Casey
  • Thanks. This was the final I came up with that returns what I need for now... http://localhost:9200/foo/_search?size=50&pretty=true&q=*:* – John Livermore Jan 12 '12 at 09:41
  • Adding to @Steve's answer, you can find a list of parameters that elasticsearch understands in this link: http://www.elasticsearch.org/guide/reference/api/search/uri-request/ – Karthick Jul 31 '13 at 08:36
  • Is it possible to run a scan search with a query other than a match_all query? – Churro Sep 10 '13 at 20:30
  • @Churro you should post a question, not hide it in the comments. But short answer, yes. http://www.elasticsearch.org/guide/reference/api/search/query/ – Steve Casey Sep 11 '13 at 02:09
  • Thanks @Steve for your answer. I didn't think it was significant enough for a new question. It wasn't explicitly stated anywhere, so I figured I'd ask here just to verify. – Churro Sep 11 '13 at 15:32
  • You should really use the scan+scroll requests. If you do use size=BIGNUMBER, note that Lucene allocates memory for scores for that number, so don't make it exceedingly large. :) – Alex Brasetvik Nov 18 '13 at 19:33
  • Did you really mean to use `-d` with `-XGET`? – rakslice Jan 13 '15 at 19:42
  • I was unaware of the `?size=` query string parameter until your answer, @SteveCasey. Thank you **so** much for posting this. My use case just requires me to list _all_ the documents in a small index (generally <200 items), so appending `?size=1000` to the query made it fire right up. – Pierce Aug 15 '15 at 22:38
  • hey @SteveCasey I am struggling to find this answer. Could you please help me - http://stackoverflow.com/questions/34481152/sort-by-geo-distance-where-latitude-and-longitude-given-is-0-00-in-elasticsearch?noredirect=1#comment56708837_34481152 – Chopra Dec 28 '15 at 04:19
  • Scan was deprecated in 2.1.0: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-search-type.html#scan – Christophe Roussy May 03 '16 at 10:41
  • @SteveCasey Ideally ES should respond with something special: http://stackoverflow.com/questions/13884141/convention-for-http-response-header-to-notify-clients-of-deprecated-api, another interesting problem ... – Christophe Roussy May 27 '16 at 11:34
  • Seeing as scan is deprecated, should this answer be updated to use scroll? – Will Barnwell Feb 24 '17 at 16:35
  • Yes. Well, a scan is a type of scroll. The answer should not include the 'search_type=scan' parameter. You don't need it, and it is deprecated. – Harry Wood Oct 25 '18 at 14:12
  • Actually I've just noticed "search_type:scan" is not only deprecated. It was removed in Elasticsearch version 5.0: https://www.elastic.co/guide/en/elasticsearch/reference/5.0/breaking_50_search_changes.html#_literal_search_type_scan_literal_removed – Harry Wood Oct 25 '18 at 23:26
  • As scan was deprecated, I just changed it to search_type=query_then_fetch. That helped me a lot, thanks! – Sauer Jul 28 '23 at 06:50
183
http://127.0.0.1:9200/foo/_search/?size=1000&pretty=1
                                   ^

Note the size param, which increases the hits displayed from the default (10) to 1000.

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-from-size.html

random-forest-cat
  • One thing to keep in mind though (from the Elasticsearch docs): from + size can not be more than the index.max_result_window index setting, which defaults to 10,000. – user3078523 Feb 14 '18 at 11:47
  • This will return 1000, not all; user3078523 is right, this method has a limit of `max_result_window` – stelios Aug 08 '18 at 10:10
  • It has a maximum, and also (if you have many thousands of records to get) it's a rather noddy heavy approach to be going up towards that maximum. Instead you should use a "scroll" query. – Harry Wood Oct 25 '18 at 14:14
  • you should pass the `pretty` param as a boolean: `curl -XGET 'localhost:9200/logs/_search/?size=1000&pretty=true'` – Yar Aug 20 '20 at 05:06
  • this is the answer I'm looking for, the one without passing the request parameter `q`. Thank you! – asgs Aug 25 '21 at 15:26
48

Elasticsearch (ES) supports both a GET and a POST request for getting the data from the ES cluster index.

When we do a GET:

http://localhost:9200/[your index name]/_search?size=[no of records you want]&q=*:*

When we do a POST:

http://localhost:9200/[your_index_name]/_search
{
  "size": [your value],          // default 10
  "from": [your start index],    // default 0
  "query": {
    "match_all": {}
  }
}

I would suggest using a UI plugin with Elasticsearch: http://mobz.github.io/elasticsearch-head/ This will help you get a better feeling for the indices you create and also lets you test them.

Prerak Diwan
  • As another user mentioned: `from` + `size` can not be more than the `index.max_result_window` index setting, which defaults to 10,000 – stelios Aug 08 '18 at 10:11
  • This approach has a maximum, and also (if you have many thousands of records to get) it's a rather noddy heavy approach to be going up towards that maximum. Instead you should use a "scroll" query – Harry Wood Oct 25 '18 at 14:15
  • Oddly enough, the official docs show `curl -XGET ... -d '{...}'` which is an `un`official mixed style of request. Thank you for showing the correct GET and POST formats. – Jesse Chisholm Feb 28 '20 at 21:19
35

Note: This answer relates to an older version of Elasticsearch (0.90). Versions released since then have an updated syntax. Please refer to other answers, which may be more accurate for the version you are using.

The query below would return the NO_OF_RESULTS you would like to be returned.

curl -XGET 'localhost:9200/foo/_search?size=NO_OF_RESULTS' -d '
{
  "query" : {
    "match_all" : {}
  }
}'

Now, the question here is that you want all the records to be returned. So naturally, before writing a query, you won't know the value of NO_OF_RESULTS.

How do we know how many records exist in your index? Simply type the query below:

curl -XGET 'localhost:9200/foo/_search'

This would give you a result that looks like the one below

{
  "hits" : {
    "total" : 2357,
    "hits" : [
      {
        ..................
The result total tells you how many records are available in your index. So, that's a nice way to know the value of NO_OF_RESULTS.
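
If you want to do that two-step dance programmatically, here is a rough equivalent with the Python client (my sketch; it assumes a pre-7.x cluster where hits.total is a plain number, while on 7.x+ you would read hits.total.value):

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumes a local cluster

# Ask for zero hits, just to read the total
total = es.search(index='foo', size=0)['hits']['total']

# Then fetch exactly that many records
# (fails if total exceeds index.max_result_window, 10,000 by default)
result = es.search(index='foo', size=total,
                   body={'query': {'match_all': {}}})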

curl -XGET 'localhost:9200/_search'

Search all types in all indices

curl -XGET 'localhost:9200/foo/_search'

Search all types in the foo index

curl -XGET 'localhost:9200/foo1,foo2/_search'

Search all types in the foo1 and foo2 indices

curl -XGET 'localhost:9200/f*/_search'

Search all types in any indices beginning with f

curl -XGET 'localhost:9200/_all/type1,type2/_search'

Search types type1 and type2 in all indices

vjpandian
26

This is the best solution I found using the Python client:

from elasticsearch import Elasticsearch

es = Elasticsearch()  # assumes a cluster reachable on localhost:9200

# Initialize the scroll
page = es.search(
    index='yourIndex',
    doc_type='yourType',
    scroll='2m',
    search_type='scan',
    size=1000,
    body={
        # Your query's body
    })
sid = page['_scroll_id']
scroll_size = page['hits']['total']

# Start scrolling
while scroll_size > 0:
    print("Scrolling...")
    page = es.scroll(scroll_id=sid, scroll='2m')
    # Update the scroll ID
    sid = page['_scroll_id']
    # Get the number of results that we returned in the last scroll
    scroll_size = len(page['hits']['hits'])
    print("scroll size: " + str(scroll_size))
    # Do something with the obtained page

https://gist.github.com/drorata/146ce50807d16fd4a6aa

Using the Java client:

import static org.elasticsearch.index.query.QueryBuilders.*;

QueryBuilder qb = termQuery("multi", "test");

SearchResponse scrollResp = client.prepareSearch("test")
        .addSort(FieldSortBuilder.DOC_FIELD_NAME, SortOrder.ASC)
        .setScroll(new TimeValue(60000))
        .setQuery(qb)
        .setSize(100).execute().actionGet(); //100 hits per shard will be returned for each scroll
//Scroll until no hits are returned
do {
    for (SearchHit hit : scrollResp.getHits().getHits()) {
        //Handle the hit...
    }

    scrollResp = client.prepareSearchScroll(scrollResp.getScrollId()).setScroll(new TimeValue(60000)).execute().actionGet();
} while(scrollResp.getHits().getHits().length != 0); // Zero hits mark the end of the scroll and the while loop.

https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/java-search-scrolling.html

Mark Rotteveel
Holm
  • Thanks Mark, that was exactly what I was looking for! In my case (ELK 6.2.1, python 3), the search_type argument was not valid and the document_type isn't needed any more since ELK 6.0 – Christoph Schranz Mar 12 '18 at 13:40
  • Perfect solution! Thanks. I was using `elasticsearch_dsl==5.4.0` and it works without `search_type = 'scan',`. – Usman Maqbool May 10 '18 at 15:55
  • ES 6.3. This example makes my Elasticsearch service to crash, trying to scroll 110k documents with `size=10000`, at somewhere between 5th-7th iterations. with `status=127`, `main ERROR Null object returned for RollingFile in Appenders`, `main ERROR Unable to locate appender "rolling" for logger config "root"` No logs in `/var/log/elasticsearch/elasticsearch.log` – stelios Aug 08 '18 at 18:03
  • For the record, the Python client implements a `scan` helper that does the scroll under the hood (since version 5.x.x at least) – MCMZL Aug 15 '18 at 08:05
  • `search_type = 'scan'` is deprecated. Similar code will work without that, although there are some interesting differences which are well buried in the old documentation. https://www.elastic.co/guide/en/elasticsearch/reference/1.4/search-request-scroll.html#scroll-scan In particular, when migrating to not use search_type=scan, that first 'search' query will come with the first batch of results to process. – Harry Wood Oct 25 '18 at 14:20
23

If it's a small dataset (e.g. 1K records), you can simply specify size:

curl localhost:9200/foo_index/_search?size=1000

The match all query isn't needed, as it's implicit.

If you have a medium-sized dataset, like 1M records, you may not have enough memory to load it, so you need a scroll.

A scroll is like a cursor in a DB. In Elasticsearch, it remembers where you left off and keeps the same view of the index (i.e. prevents the searcher from going away with a refresh, prevents segments from merging).

API-wise, you have to add a scroll parameter to the first request:

curl 'localhost:9200/foo_index/_search?size=100&scroll=1m&pretty'

You get back the first page and a scroll ID:

{
  "_scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAADEWbmJlSmxjb2hSU0tMZk12aEx2c0EzUQ==",
  "took" : 0,
...

Remember that both the scroll ID you get back and the timeout are valid for the next page. A common mistake here is to specify a very large timeout (the value of scroll) that would cover processing the whole dataset (e.g. 1M records) instead of one page (e.g. 100 records).

To get the next page, fill in the last scroll ID and a timeout that should last until fetching the following page:

curl -XPOST -H 'Content-Type: application/json' 'localhost:9200/_search/scroll' -d '{
  "scroll": "1m",
  "scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAADAWbmJlSmxjb2hSU0tMZk12aEx2c0EzUQ=="
}'

If you have a lot to export (e.g. 1B documents), you'll want to parallelise. This can be done via sliced scroll. Say you want to export on 10 threads. The first thread would issue a request like this:

curl -XPOST -H 'Content-Type: application/json' 'localhost:9200/test/_search?scroll=1m&size=100' -d '{
  "slice": {
    "id": 0, 
    "max": 10 
  }
}'

You get back the first page and a scroll ID, exactly like a normal scroll request. You'd consume it exactly like a regular scroll, except that you get 1/10th of the data.

Other threads would do the same, except that id would be 1, 2, 3...
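
Here is a sketch of that fan-out in Python (the thread pool and names are illustrative assumptions, not part of the answer; the index "test" matches the curl example above):

from concurrent.futures import ThreadPoolExecutor
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumes a local cluster
MAX_SLICES = 10  # one slice per export thread

def dump_slice(slice_id):
    # Each worker opens its own sliced scroll and sees 1/10th of the data
    resp = es.search(index='test', scroll='1m', size=100,
                     body={'slice': {'id': slice_id, 'max': MAX_SLICES},
                           'query': {'match_all': {}}})
    while resp['hits']['hits']:
        for hit in resp['hits']['hits']:
            pass  # write hit['_source'] to your export target
        resp = es.scroll(scroll_id=resp['_scroll_id'], scroll='1m')

with ThreadPoolExecutor(max_workers=MAX_SLICES) as pool:
    list(pool.map(dump_slice, range(MAX_SLICES)))  # list() surfaces errors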

Radu Gheorghe
  • Thanks, this is what I needed to understand (size); it helped me troubleshoot my empty (`[ ]`) returns. – Kalnode Jan 31 '21 at 15:55
20

If you want to pull many thousands of records then... a few people gave the right answer of using 'scroll'. (Note: some people also suggested using "search_type=scan". This was deprecated, and removed in v5.0. You don't need it.)

Start with a 'search' query, but specifying a 'scroll' parameter (here I'm using a 1 minute timeout):

curl -XGET 'http://ip1:9200/myindex/_search?scroll=1m' -d '
{
    "query": {
            "match_all" : {}
    }
}
'

That includes your first 'batch' of hits. But we are not done here. The output of the above curl command would be something like this:

{"_scroll_id":"c2Nhbjs1OzUyNjE6NU4tU3BrWi1UWkNIWVNBZW43bXV3Zzs1Mzc3OkhUQ0g3VGllU2FhemJVNlM5d2t0alE7NTI2Mjo1Ti1TcGtaLVRaQ0hZU0FlbjdtdXdnOzUzNzg6SFRDSDdUaWVTYWF6YlU2Uzl3a3RqUTs1MjYzOjVOLVNwa1otVFpDSFlTQWVuN211d2c7MTt0b3RhbF9oaXRzOjIyNjAxMzU3Ow==","took":109,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":22601357,"max_score":0.0,"hits":[]}}

It's important to keep the _scroll_id handy, as next you should run the following command:

    curl -XGET  'localhost:9200/_search/scroll'  -d'
    {
        "scroll" : "1m", 
        "scroll_id" : "c2Nhbjs2OzM0NDg1ODpzRlBLc0FXNlNyNm5JWUc1" 
    }
    '

However, passing the scroll_id around is not something designed to be done manually. Your best bet is to write code to do it, e.g. in Java:

    private TransportClient client = null;
    private Settings settings = ImmutableSettings.settingsBuilder()
                  .put(CLUSTER_NAME,"cluster-test").build();
    private SearchResponse scrollResp  = null;

    this.client = new TransportClient(settings);
    this.client.addTransportAddress(new InetSocketTransportAddress("ip", port));

    QueryBuilder queryBuilder = QueryBuilders.matchAllQuery();
    scrollResp = client.prepareSearch(index).setSearchType(SearchType.SCAN)
                 .setScroll(new TimeValue(60000))                            
                 .setQuery(queryBuilder)
                 .setSize(100).execute().actionGet();

    scrollResp = client.prepareSearchScroll(scrollResp.getScrollId())
                .setScroll(new TimeValue(timeVal))
                .execute()
                .actionGet();

Now loop on the last command, using SearchResponse to extract the data.

Suzana
Somum
18

Elasticsearch will get significantly slower if you just add some big number as size; one method to get all documents is to use scan and scroll IDs.

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html

In Elasticsearch v7.2, you do it like this:

POST /foo/_search?scroll=1m
{
    "size": 100,
    "query": {
        "match_all": {}
    }
}

The results from this would contain a _scroll_id, which you have to query to get the next chunk of 100.

POST /_search/scroll 
{
    "scroll" : "1m", 
    "scroll_id" : "<YOUR SCROLL ID>" 
}
WoodyDRN
  • This answer needs more updates. `search_type=scan` is now deprecated. So you should remove that, but then the behaviour has changed a little. The first batch of data comes back from the initial search call. The link you provide does show the correct way to do it. – Harry Wood Oct 25 '18 at 14:26
  • My comment was really to note that you can't just add any number as size, as it would be quite a lot slower. So I removed the code example and people can follow the link to get correct code. – WoodyDRN Oct 25 '18 at 14:30
  • @WoodyDRN It is better to have the code in your answer (even if it gets old) so it is still available when the link dies. – Trisped Jul 23 '19 at 22:13
12

Use server:9200/_stats also to get statistics about all your aliases, like size and number of elements per alias; that's very useful and provides helpful information.
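
For example, with the Python client you can read a couple of the useful numbers out of the _stats response like this (a sketch; the fields shown are the per-index primaries stats):

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumes a local cluster

stats = es.indices.stats()
for name, index_stats in stats['indices'].items():
    docs = index_stats['primaries']['docs']['count']           # document count
    size = index_stats['primaries']['store']['size_in_bytes']  # store size
    print(name, docs, size)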

Oussama L.
  • But, from what I remember, ES only allows getting 16000 data per request. So if the data is above 16000, this solution is not enough. – Aminah Nuraini Apr 23 '16 at 22:48
7

You actually don't need to pass a body to match_all; it can be done with a GET request to the following URL. This is the simplest form.

http://localhost:9200/foo/_search

Kraken
6

The best way to adjust the size is by using size=number as a query parameter in the URL:

curl -XGET "http://localhost:9200/logstash-*/_search?size=50&pretty"

Note: the maximum value which can be defined for this size is 10,000. For any value above ten thousand, it expects you to use the scroll function, which minimises any chance of impact on performance.

Luca
akshay misra
6

You can use the _count API to get the value for the size parameter:

http://localhost:9200/foo/_count?q=<your query>

Returns {count:X, ...}. Extract value 'X' and then do the actual query:

http://localhost:9200/foo/_search?q=<your query>&size=X
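
The same two calls with the Python client, if that is more convenient (a sketch; "foo" is the question's index, and mind the race condition noted in the comment below if documents are written concurrently):

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumes a local cluster

x = es.count(index='foo')['count']       # the _count call: {"count": X, ...}
result = es.search(index='foo', size=x)  # the actual query with size=X
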
Daniel
  • Setting the size to X like this might have a surprising concurrency glitch: consider what happens if a record is added in between doing the count and setting the size on your next query... but also if you have many thousands of records to get, then it's the wrong approach. Instead you should use a "scroll" query. – Harry Wood Oct 25 '18 at 14:31
5

Simple! You can use the size and from parameters!

http://localhost:9200/[your index name]/_search?size=1000&from=0

Then you change from gradually until you get all of the data.
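
A sketch of that loop in Python (names are illustrative; as the comments below point out, from + size is capped by index.max_result_window, 10,000 by default, so this only works for smallish indices):

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumes a local cluster

page_size = 1000
start = 0
while True:
    hits = es.search(index='foo', size=page_size, from_=start)['hits']['hits']
    if not hits:
        break
    for hit in hits:
        pass  # process each record
    start += page_size  # "change the from gradually"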

Aminah Nuraini
  • never use this method if the data contains many documents... Each time you go to "the next page" Elastic will be slower and slower! Use SearchAfter instead – Joshlo Jul 20 '17 at 13:25
  • Also, this solution will not work if the overall data size is above 10 000. The option size=1000&from=10001 would fail. – iclman Jul 10 '18 at 14:51
  • Indeed fails. Parameters `from` + `size` can't be more than the index.max_result_window index setting, which defaults to 10,000 – stelios Aug 08 '18 at 10:13
  • If the data contains many thousands of documents, the correct answer is to use a 'scroll' query. – Harry Wood Oct 25 '18 at 14:22
  • With the `from` and `size`-approach you will run into the Deep Pagination problem. Use the scroll API to make a dump of all documents. – Daniel Schneiter Feb 29 '20 at 12:10
5

From Kibana DevTools it's:

GET my_index_name/_search
{
  "query": {
    "match_all": {}
  }
}
belostoky
4

http://localhost:9200/foo/_search/?size=1000&pretty=1

You will need to specify the size query parameter, as the default is 10.

Edwin O.
4

The size param increases the hits displayed from the default (10) to 500:

http://localhost:9200/[indexName]/_search?pretty=true&size=500&q=*:*

Change `from` step by step to get all the data:

http://localhost:9200/[indexName]/_search?size=500&from=0
Prasanna Jathan
4

A simple solution using the python package elasticsearch-dsl:

from elasticsearch_dsl import Search
from elasticsearch_dsl import connections

connections.create_connection(hosts=['localhost'])

s = Search(index="foo")
response = s.scan()

count = 0
for hit in response:
    # print(hit.to_dict())  # be careful, it will print out every hit in your index
    count += 1

print(count)

See also https://elasticsearch-dsl.readthedocs.io/en/latest/api.html#elasticsearch_dsl.Search.scan .

asmaier
4

Using the Kibana console and `address` as the index to search, the following can be contributed. The query asks the index to return only 4 fields of each document, and you can also add size to indicate how many documents you want returned. As of ES 7.6 you should use _source rather than filter; it will respond faster.

GET /address/_search
{
  "_source": ["streetaddress", "city", "state", "postcode"],
  "size": 100,
  "query": {
    "match_all": {}
  }
}
Gregory Neely
3

For Elasticsearch 6.x

Request: GET /foo/_search?pretty=true

Response: In hits -> total, it gives the count of the docs

    {
      "took": 1,
      "timed_out": false,
      "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
      },
      "hits": {
        "total": 1001,
        "max_score": 1,
        "hits": [
          {
Sunder R
Anurag
2
curl -X GET 'localhost:9200/foo/_search?q=*&pretty' 
Stephen Kennedy
2

By default Elasticsearch returns 10 records, so size should be provided explicitly.

Add size to the request to get the desired number of records.

http://{host}:9200/{index_name}/_search?pretty=true&size=(number of records)

Note: max page size can not be more than the index.max_result_window index setting, which defaults to 10,000.
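
If you really do need a single page larger than that, the setting can be raised per index. A sketch with the Python client (the value 20000 and the index name are just examples; weigh the heap cost before raising it):

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumes a local cluster

# raise the from+size ceiling for one index (example value)
es.indices.put_settings(
    index='my_index',
    body={'index': {'max_result_window': 20000}},
)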

1

To return all records from all indices you can do:

curl -XGET 'http://35.195.120.21:9200/_all/_search?size=50&pretty'

Output:

{
  "took" : 866,
  "timed_out" : false,
  "_shards" : {
    "total" : 25,
    "successful" : 25,
    "failed" : 0
  },
  "hits" : {
    "total" : 512034694,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "grafana-dash",
      "_type" : "dashboard",
      "_id" : "test",
      "_score" : 1.0,
       ...
exceltior
1

The maximum number of results Elasticsearch will return by providing size is 10,000:

curl -XGET 'localhost:9200/index/type/_search?scroll=1m' -d '
{
  "size": 10000,
  "query" : {
    "match_all" : {}
  }
}'

After that, you have to use the Scroll API to get the results: take the _scroll_id value from the response and put it in scroll_id:

curl -XGET 'localhost:9200/_search/scroll' -d '
{
  "scroll" : "1m",
  "scroll_id" : ""
}'
RAHUL JAIN
1

If someone is still looking for all the data to be retrieved from Elasticsearch like me for some use cases, here is what I did. Moreover, all the data means all the indexes and all the document types. I'm using Elasticsearch 6.3:

curl -X GET "localhost:9200/_search?pretty=true" -H 'Content-Type: application/json' -d'
{
    "query": {
        "match_all": {}
    }
}
'

Elasticsearch reference

Santosh Kumar Arjunan
1

The official documentation provides the answer to this question! You can find it here.

{
  "query": { "match_all": {} },
  "size": 1
}

You simply replace size (1) with the number of results you want to see!

christouandr7
  • The author of the question was asking for 'all' results, not a pre-defined amount of results. While it is helpful to post a link to the docs, the docs do not describe how to achieve that, neither does your answer. – Maarten00 Apr 26 '19 at 05:53
  • With the from and size-approach you will run into the Deep Pagination problem. Use the scroll API to make a dump of all documents. – Daniel Schneiter Feb 29 '20 at 12:11
1

This is the query to accomplish what you want (I suggest using Kibana, as it helps to understand queries better):

GET my_index_name/my_type_name/_search
{
  "query": {
    "match_all": {}
  },
  "size": 20,
  "from": 3
}

To get all records you have to use the "match_all" query.

size is the number of records you want to fetch (a kind of limit). By default, ES will only return 10 records.

from is like skip; it skips the first 3 records.

If you want to fetch exactly all the records, just use the value from the "total" field from the result once you hit this query from Kibana, and then use it with "size".

niranjan_harpale
  • The limitation of this query is that size + from must be lower or equal to "index.max_result_window". For large number of documents (by default 10000+) this query is not applicable. – KarelHusa Nov 07 '19 at 13:26
0
curl -XGET '{{IP/localhost}}:9200/{{Index name}}/{{type}}/_search?scroll=10m&pretty' -d '{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      }
    }
  }
}'
aditya
  • While this code snippet may solve the question, [including an explanation](http://meta.stackexchange.com/questions/114762/explaining-entirely-code-based-answers) really helps to improve the quality of your post. Remember that you are answering the question for readers in the future, and those people might not know the reasons for your code suggestion. – Stamos Mar 27 '18 at 20:36
0

None except @Akira Sendoh has answered how to actually get ALL docs. But even that solution crashes my ES 6.3 service without logs. The only thing that worked for me using the low-level elasticsearch-py library was the scan helper that uses the scroll() API:

from elasticsearch.helpers import scan

doc_generator = scan(
    es_obj,
    query={"query": {"match_all": {}}},
    index="my-index",
)

# use the generator to iterate; don't build a list or you will run out of RAM
for doc in doc_generator:
    pass  # use each doc somehow

However, the cleaner way nowadays seems to be through the elasticsearch-dsl library, which offers more abstract, cleaner calls, e.g.: http://elasticsearch-dsl.readthedocs.io/en/latest/search_dsl.html#hits

stelios
0

Using Elasticsearch 7.5.1:

http://${HOST}:9200/${INDEX}/_search?pretty=true&q=*:*&scroll=10m&size=5000

You can also specify the size of your result set with &size=${number}.

In case you don't know your index:

http://${HOST}:9200/_cat/indices?v
Tiago Medici
-5

You can use size=0; this will return you all the documents. Example:

curl -XGET 'localhost:9200/index/type/_search' -d '
{
  "size": 0,
  "query" : {
    "match_all" : {}
  }
}'
premkumar