2

I'm new to Elastic search and I'm trying to run a basic query in R. Because I need an API key I have not been able to use any of the available libraries for Elasticsearch in R.

I can retrieve all of the documents in the elastic search index but I don't seem to be able to run custom queries. I think it must be because my GET request isn't properly formatted. Here is what I have so far:

json_query <- jsonlite::toJSON('{
    "query": {
        "match" : {
            "LastName": "Baggins"
        }
    }
}
')

I've tried to add the my_query as a body= parameter but it just doesn't run the query (and retrieves the 10000 documents instead). I've ended up trying to paste it to the url parameter:

get_scroll_id <-  httr::GET(url =paste("'https://Myserver:9200/indexOfInterest/_search?scroll=1m&size=10000'",my_query),
                            encoding='json',
                            add_headers(.headers = c("Authorization" = "ApiKey ****", "Content-Type" = "application/json")),
                            config=httr::config(ssl_verifypeer = FALSE,ssl_verifyhost = FALSE))

scroll_data <- fromJSON(content(get_scroll_id, as="text"))

This gives me the error:

Error in curl::curl_fetch_memory(url, handle = handle) : 
  Protocol "" not supported or disabled in libcurl

I have also tried to add the query as the query parameter as follows:

get_scroll_id <-  httr::GET(url ='https://Myserver:9200/indexOfInterest/_search?scroll=1m&size=10000',
                            query= json_query,
                            encoding='json',
                            add_headers(.headers = c("Authorization" = "ApiKey *****==", "Content-Type" = "application/json")),
                            verbose(),
                            config=httr::config(ssl_verifypeer = FALSE,ssl_verifyhost = FALSE))

This gives me the output:

GET https://Myserver:9200/indexOfInterest/_search?{
    "query": {
        "match" : {
            "LastName" : "Baggins"
        }
    }
}

Options:
* ssl_verifypeer: FALSE
* ssl_verifyhost: FALSE
* debugfunction: function (type, msg) 
{
    switch(type + 1, text = if (info) prefix_message("*  ", msg), headerIn = prefix_message("<- ", msg), headerOut = prefix_message("-> ", msg), dataIn = if (data_in) prefix_message("<<  ", msg, TRUE), dataOut = if (data_out) prefix_message(">> ", msg, TRUE), sslDataIn = if (ssl && data_in) prefix_message("*< ", msg, TRUE), sslDataOut = if (ssl && data_out) prefix_message("*> ", msg, TRUE))
}
* verbose: TRUE
Headers:
* Authorization: ApiKey *****==
* Content-Type: application/json

Looking at the Elasticsearch documentation the curl is as follows:

 curl 'localhost:9200/get-together/event/_search?pretty&scroll=1m' -d ' {
 "query": {
"match" : {
 "LastName" : "Baggins"
 }
 }
}'

How can I create the correct command for Elasticsearch?

halfer
  • 19,824
  • 17
  • 99
  • 186
Sebastian Zeki
  • 6,690
  • 11
  • 60
  • 125
  • Using `paste()` to build the URL doesn't seem correct. Just put the base URL in there and use `query=my_query` as a separate parameter (not the `body=` parameter) – MrFlick Mar 27 '21 at 20:02
  • I have now done @MrFLick this but I get the as follows: Error in curl::curl_fetch_memory(url, handle = handle) : URL using bad/illegal format or missing URL. I have updated the answer – Sebastian Zeki Mar 28 '21 at 07:51
  • Where exactly is `myquery` defined? Is that the same as `json_query`? So the curl command does show the `json_query` thing going to the body so you don't want to paste it to the URL. – MrFlick Mar 28 '21 at 16:10
  • json_query and myquery are the same. Sorry typo. I have corrected it. As per the second part of the question above I have removed the paste and it is only stated in the query parameter but the error still stands. – Sebastian Zeki Mar 28 '21 at 18:36
  • So what error exactly do you get with `httr::GET(url ="https://Myserver:9200/indexOfInterest/_search?scroll=1m&size=10000", encoding='json', add_headers(.headers = c("Authorization" = "ApiKey ****", "Content-Type" = "application/json")), body=my_query)`? – MrFlick Mar 28 '21 at 20:21
  • I don’t get an error if the parameter body=My_query is run. The problem is that the query isn’t run and it just returns 10000 documents from the index. – Sebastian Zeki Mar 31 '21 at 07:15
  • Have you been able to format a curl command to return the expected output? – the-mad-statter May 18 '21 at 15:59
  • How do I format the curl? I tried curl("https://Myserver:9200/indexOfInterest/_search?scroll=1m" -d ' { "query": { "match" : { "LastName" : "Baggins" } } }") but not sure how to deal with all the single and double quotes needed to pass the query – Sebastian Zeki May 18 '21 at 16:46
  • I am suggesting to perform a curl command from your system command line (outside of R). So something like your very last code example. This will help narrow the issue down to R vs not R. – the-mad-statter May 18 '21 at 18:25

3 Answers3

3

I think the problem here is, that the httr package simply doesn't support the body parameter, because it isn't common to use a body in a GET request (Check out this SO answer about HTTP GET with request body).

But you could also use a POST request here, that works for me. Try the following and see if it helps:

library(httr)
library(rjson)

my_query <- rjson::toJSON(
'{
   "query": {
     "match": {
       "LastName": "Baggins"
     }
   }
 }
'
)

response <- httr::POST(
  url = "https://Myserver:9200/indexOfInterest/_search",
  httr::add_headers(
    .headers = c(
      "Authorization" = "ApiKey ****", 
      "Content-Type" = "application/json"
    )
  ), 
  body = fromJSON(my_query)
)


data <- fromJSON(content(response, as="text"))

EDIT:

If you really insist on doing a GET request, try following using curl. I couldn't test the Authorization part, but the rest ist working:

library(curl)
library(jsonlite)

my_query <- toJSON(
'{
   "query": {
     "match": {
       "LastName": "Baggins"
     }
   }
 }
'
)

h <- new_handle(verbose = TRUE)
handle_setheaders(h,
   "Authorization" = "ApiKey ****", 
   "Content-Type" = "application/json"
)
handle_setopt(handle = h, postfields=fromJSON(my_query), customrequest="GET")

c <- curl_fetch_memory(url = "https://Myserver:9200/indexOfInterest/_search", handle=h)

prettify(rawToChar(c$content))

The trick here is to use the postfields param to pass the body. But that would trigger the curl library to do a POST request. So by using setting customrequest="GET" we explicitly tell him to use a GET request.

Chules
  • 426
  • 2
  • 8
1

You might also try the elastic library.

conn <- elastic::connect(host = "Myserver", 
                        path = "", 
                        user = "<username>",
                        pwd = "<password>",
                        port = 9200, 
                        transport_schema  = "https")
# conn$ping()

body <-'{
    "query": {
        "match" : {
            "LastName": "Baggins"
        }
    }
}
'
out <- elastic::Search(conn, index="indexOfInterest", body = body, size = 10000)

Then if you want to scroll to get more than 10000 entries (which is the max allowed by elastic for a single query).

# Scrolling
res <- elastic::Search(conn_cloud, index = 'indexOfInterest', time_scroll="5m",body = body, size = 10000)
out <- res$hits$hits
hits = 1
while(hits != 0){
  res <- elastic::scroll(conn, res$`_scroll_id`, time_scroll="5m")
  hits <- length(res$hits$hits)
  if(hits > 0)
    out <- c(out, res$hits$hits)
}
elastic::scroll_clear(conn_cloud, res$`_scroll_id`)

Note that Elastic does not recommend using scrolling and I've gotten slightly different results using it.

Jeff Parker
  • 1,809
  • 1
  • 18
  • 28
0

Perhaps the output of jsonlite::toJSON() is ignored because it will enclose your json in []s. What happens if you use rjson::toJSON() instead?

my_query <- rjson::toJSON(
'{
    "query": {
        "match" : {
            "LastName": "Baggins"
        }
    }
}'
)

httr::GET(
  url = "https://Myserver:9200/indexOfInterest/_search",
  query = list(scroll = "1m", size = "10000"), 
  encoding = 'json', 
  httr::add_headers(
    .headers = c(
      "Authorization" = "ApiKey ****", 
      "Content-Type" = "application/json"
      )
  ), 
  body = my_query
)
the-mad-statter
  • 5,650
  • 1
  • 10
  • 20