I have a df of unique ids x urls.

library(httr)

for (i in (1:nrow(df))) {
  resp <- httr::GET(df$url[i])
  httpcode[i] <- status_code(resp)
  httpstatus[i] <- http_status(resp)$reason
}

I want to (a) find the status_code for every url, (b) find the http_status for every url, and (c) spit them out into new columns in the same df.

Problems: 1. In the code above, when I replace i with an actual index number (e.g. i = 1), the code works. When I run it in a for loop, I get the following error:

Error in curl::curl_fetch_memory(url, handle = handle) : 
  Couldn't resolve host name
  2. How do I turn httpcode and httpstatus from standalone objects into new columns in the same df? Thanks
Susie
  • Oh great, thanks, this resolved the first issue (but I also had to remove the i index from httpcode[i] and httpstatus[i]). Any idea how I can return the results for each url into two new columns in the same df? – Susie Aug 02 '17 at 23:16

2 Answers

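library(httr)

out_df <- data.frame()
for (i in as.character(df$url)) {
  print(i)
  resp <- GET(i)
  # Build a named one-row data frame so rbind() keeps column names and types
  row <- data.frame(url = i,
                    httpcode = status_code(resp),
                    httpstatus = http_status(resp)$reason,
                    stringsAsFactors = FALSE)
  out_df <- rbind(out_df, row)
}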

df <- merge(df, out_df, by = 'url', all.x = TRUE)
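
If the goal is only to add two columns to the same df, the merge can be skipped by filling pre-allocated columns in place. The following is a minimal sketch, not a definitive implementation: the example URLs and the 10-second timeout are assumptions for illustration, and the tryCatch() wrapper is added so that a failing request (such as the "Couldn't resolve host name" error from the question) leaves NA in that row instead of stopping the loop.

```r
library(httr)

# Example input; these URLs are placeholders, not taken from the question.
df <- data.frame(id = c(1, 2),
                 url = c("https://www.r-project.org", "http://nonexistent.invalid"),
                 stringsAsFactors = FALSE)

# Pre-allocate the new columns so every row has a slot, even if a request fails.
df$httpcode <- NA_integer_
df$httpstatus <- NA_character_

for (i in seq_len(nrow(df))) {
  # Any error raised while requesting or decoding the status becomes NULL.
  res <- tryCatch({
    resp <- GET(df$url[i], timeout(10))
    list(code = status_code(resp), reason = http_status(resp)$reason)
  }, error = function(e) NULL)

  # Only overwrite the NA placeholders when the request succeeded.
  if (!is.null(res)) {
    df$httpcode[i] <- res$code
    df$httpstatus[i] <- res$reason
  }
}

print(df)
```

Rows whose request failed keep NA in both new columns, which makes the faulty URLs easy to find afterwards with is.na(df$httpcode).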
AidanGawronski
  • print(i) is not necessary but will help you identify the faulty urls. – AidanGawronski Aug 02 '17 at 23:19
  • Thanks Aidan. I added a picture of the result in answer above, and now I found that I have NAs because of this error: 6: In `[<-.factor`(`*tmp*`, ri, value = "www.[website name hidden].com") : invalid factor level, NA generated --- Any idea how to fix this? – Susie Aug 02 '17 at 23:32
  • df$url <- as.character(df$url) – AidanGawronski Aug 03 '17 at 01:29
  • More importantly make a reproducible example: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – AidanGawronski Aug 03 '17 at 01:30
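
The "invalid factor level, NA generated" warning from the comments above can be reproduced without any network access. This is a small sketch with made-up values (the host names are placeholders): a factor only accepts values that are already among its levels, which is why converting the column with as.character() first avoids the NAs.

```r
# A factor only accepts values that are already among its levels.
f <- factor(c("a.com", "b.com"))
f[2] <- "c.com"   # warning: invalid factor level, NA generated
print(is.na(f[2]))

# After converting to character, any string can be assigned.
ch <- as.character(factor(c("a.com", "b.com")))
ch[2] <- "c.com"
print(ch)
```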

Here is a slightly different approach.

First, define a function that returns the status code and the status message for a URL. Then use map_df, from the purrr package, to build a dataframe with the urls, status codes and messages. I use the HEAD() function from the httr package, since the status information arrives with the response headers and the body is not needed.

library(httr)   # HEAD() and http_status()
library(purrr)  # map_df()

## Example dataframe with a column for id and urls
urls_df <- data.frame(id = c(1, 2), 
                  urls = c("https://www.google.gr", "https://www.google.es"), 
                  stringsAsFactors = FALSE)

# function to get the status code and status message for one url
status_fun <- function(my_url) {
  http_head <- HEAD(my_url)
  status_code_only <- http_head$status_code
  message <- http_status(http_head)$message
  # stringsAsFactors = FALSE keeps url and message as character columns
  data.frame(url = my_url, status_code = status_code_only, message = message,
             stringsAsFactors = FALSE)
}

# create a dataframe with the urls, status code and status message
df.new <- map_df(urls_df$urls, status_fun)

# merge the new dataframe with the original (the key is 'urls' in urls_df and 'url' in df.new)
df.final <- merge(urls_df, df.new, by.x = 'urls', by.y = 'url', all.x = TRUE)
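
One detail about the merge step: merge() matches on column names, so when the key is called urls in one dataframe and url in the other, by.x and by.y are needed. The sketch below uses made-up data (the example.com style hosts and the 200 status are placeholders, not real results) to show how all.x = TRUE keeps every original row and fills NA where no status was obtained.

```r
# Original dataframe: the key column is named 'urls'.
urls_df <- data.frame(id = c(1, 2),
                      urls = c("https://a.example", "https://b.example"),
                      stringsAsFactors = FALSE)

# Made-up status results for only one of the urls; the key column is named 'url'.
status_df <- data.frame(url = "https://a.example",
                        status_code = 200L,
                        stringsAsFactors = FALSE)

# by.x / by.y name the key in each input; all.x = TRUE keeps unmatched rows of urls_df.
merged <- merge(urls_df, status_df, by.x = "urls", by.y = "url", all.x = TRUE)
print(merged)
```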

Hope that helps!

yiah