I am looking at this great answer: https://stackoverflow.com/a/58211397/3502164.
The beginning of the solution includes:
library(httr)
library(xml2)
gr <- GET("https://nzffdms.niwa.co.nz/search")
doc <- read_html(content(gr, "text"))
xml_attr(xml_find_all(doc, ".//input[@name='search[_csrf_token]']"), "value")
The output is constant across multiple requests:
"59243d3a2....61f8f73136118f9"
My default approach so far would have been:
doc <- read_html("https://nzffdms.niwa.co.nz/search")
xml_attr(xml_find_all(doc, ".//input[@name='search[_csrf_token]']"), "value")
That result differs from the output above and changes across multiple requests.
Question:
What is the difference between:
read_html(url)
read_html(content(GET(url), "text"))
Why do they result in different values, and why does only the "GET" solution return the CSV in the linked question?
(I hope it's OK to structure this as roughly three sub-questions.)
What I tried:
Going down the rabbit hole of function calls:
read_html
(ms <- methods("read_html"))
getAnywhere(ms[1])
xml2:::read_html
xml2:::read_html.default
#xml2:::read_html.response
read_xml
(ms <- methods("read_xml"))
getAnywhere(ms[1])
But that resulted in this question: Find the used method for R wrapper functions
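To make the dispatch question concrete, here is a minimal base-R sketch. It does not use xml2 at all: fetch_doc and its methods are made-up names that only mimic the shape of a generic with a default and a response method, and resp_input is a stand-in object, not a real httr response. It illustrates that S3 dispatch depends only on the class of the argument, so a URL string and a string of page text would both reach the same method.

```r
# Toy generic mimicking the shape of read_html(); all names here are hypothetical.
fetch_doc <- function(x, ...) UseMethod("fetch_doc")
fetch_doc.default  <- function(x, ...) "default method"
fetch_doc.response <- function(x, ...) "response method"

url_input  <- "https://nzffdms.niwa.co.nz/search"     # a URL, class "character"
text_input <- "<html><body>page text</body></html>"   # page text, also class "character"
resp_input <- structure(list(), class = "response")   # stand-in for a response object

fetch_doc(url_input)   # "default method"
fetch_doc(text_input)  # "default method"
fetch_doc(resp_input)  # "response method"
```

If the real generic behaves the same way, both of my calls would reach the same read_html method, and any difference would have to come from how the page is fetched rather than from how it is parsed.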
Thoughts:
I don't see the GET request sending any headers or cookies that could explain the different responses.
From my understanding, both read_html and read_html(content(GET(.), "text")) will return XML/HTML.
OK, here I am not sure if it makes sense to check, but because I ran out of ideas, I checked whether there is some kind of caching going on.
Code:
with_verbose(GET("https://nzffdms.niwa.co.nz/search"))
....
<- Expires: Thu, 19 Nov 1981 08:52:00 GMT
<- Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
--> It does not look to me like caching is the explanation.
Looking at help("GET") gives an interesting section concerning a "conditional GET":
The semantics of the GET method change to a "conditional GET" if the request message includes an If-Modified-Since, If-Unmodified-Since, If-Match, If-None-Match, or If-Range header field. A conditional GET method requests that the entity be transferred only under the circumstances described by the conditional header field(s). The conditional GET method is intended to reduce unnecessary network usage by allowing cached entities to be refreshed without requiring multiple requests or transferring data already held by the client.
But as far as I can see with with_verbose(), none of If-Modified-Since, If-Unmodified-Since, If-Match, If-None-Match, or If-Range are set.
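The header and cookie checks above could also be scripted instead of read off the verbose log. This is only a sketch: inspect_get is a hypothetical helper, not part of httr, and I am assuming that resp$request$headers lists only the headers httr set explicitly, so anything curl adds by itself would still need with_verbose() to see.

```r
# Hypothetical helper (not part of httr): summarise what a GET request
# sent and which cookies came back with the response.
inspect_get <- function(url) {
  resp <- httr::GET(url)
  conditional <- c("If-Modified-Since", "If-Unmodified-Since",
                   "If-Match", "If-None-Match", "If-Range")
  sent <- names(resp$request$headers)
  list(
    # Headers httr attached to the request (assumption: curl-level headers are not listed)
    request_headers  = resp$request$headers,
    # Any conditional-GET headers among them, compared case-insensitively
    conditional_sent = intersect(tolower(conditional), tolower(sent)),
    # Cookies associated with the response, as a data frame
    cookies          = httr::cookies(resp)
  )
}

# Usage (needs network access):
# str(inspect_get("https://nzffdms.niwa.co.nz/search"))
```

Calling it twice in the same R session and comparing the cookies element might show whether something is carried over between GET requests.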