How to wait in R for a complete JSON before using it

Question

EDIT

I have a Shiny app where I need to make successive requests to an API. Every request needs the preceding one to be completed to prevent the not all data was parsed (0 chars were parsed out of a total of 160 chars error.

I use requests like fromJSON(content(GET('https://...')), as = "text")) with the httr package or curl_fetch_memory(fromJSON('https://...')) with the curl package. Unfortunately, when running successive JSON requests, I'm almost always getting the error that not all data was parsed. I'm obliged to pause the system with Sys.sleep(quite_a_long_time) to prevent the error. But this is of course not optimal.

How could I make sure the data is ready before going on ? I know there is some callback functions that can be implemented somewhere, but I'm not sure how to implement that.

EDIT: I'm using a GET request inside a function that is called in a lapply(). Maybe the problem arises because of successive new GET request as in a for loop ? Is there a need to make different sessions ?

Here is what I tried:

pgkbData <- reactive({
    
    # current dci filtered 
    #drugs_tmp <- dciCypInput()$x$data$Molécule # dciCypInput is made of a datatable containing drugs name
    #drugs_tmp <- dci_all_en %>% # dci_all_en is made of a dataframe containing french-english translated drug names
      filter(fr %in% drugs_tmp) %>% 
      pull(en)
    drugs_tmp <- c('acenocoumarol', 'tacrolimus') # for reproducible example
    drugs <- stri_trans_general(str = drugs_tmp, id = "Latin-ASCII") # removes accents
    drugs <- gsub(" ", "%20", drugs) # Convert white spaces to %20 to make it compatible with URL
    genes <- genoCypInput()$x$data$Enzyme

    if (length(drugs) == 0 | length(genes) == 0) {
      alert('...')
    }

    drug_id_json <- lapply(drugs, fn_pgkb_drug_id) # defined in separate .R file
    drug_ids <- sapply(drug_id_json, function(item) item$data[[1]]$id) # extract the ids

    gene_id_json <- lapply(genes, fn_pgkb_gene_id) # find genes IDs
    gene_ids <- sapply(gene_id_json, function(item) item$data[[1]]$id) # extract the ids

    drugs_genes <- expand_grid(drugs, genes)  # all drug-gene combinations
    drugsId_genesId <- expand_grid(drug_ids, gene_ids) # all drug-gene ids combinations
    final_df <- cbind(drugs_genes, drugsId_genesId)
    total <- nrow(final_df)
    
    for (source in c('cpic', 'dpwg')) { # look for CPIC and DPWG guidelines
      guidelines_json <- lapply(1:nrow(drugs_genes), # synthax to apply with multiple arguments
                                function(i) fn_pgkb_guideline(drug_id = drugsId_genesId$drug_ids[i],
                                                              drug = drugs_genes$drugs[i],
                                                              gene_id = drugsId_genesId$gene_ids[i],
                                                              gene = drugs_genes$genes[i],
                                                              source = source,
                                                              total = total,
                                                              i = i)) 
      guidelines_status <- sapply(guidelines_json,
                                  function(item) item$status) # e.g., 'fail', 'success',...
      guidelines_summary <- sapply(guidelines_json,
                                   function(item) item$data[[1]]$summaryMarkdown$html)
      guidelines_summary[sapply(guidelines_summary, function(x) length(x) == 0L)] <- NA # Replace NULL values with NA to keep a 6-element vector (otherwise NULL are dropped with unlist())

      # urls <- sapply(guidelines_json, function(item) item$data[[1]]$crossReferences[[1]]$resourceId)
      urls <- sapply(guidelines_json, function(item) {
        if (length(item$data[[1]]$crossReferences) > 0) {
          return(item$data[[1]]$crossReferences[[1]]$resourceId)
        } else {
          return(NA)
        }
      })
      
      # guidelines_url <- lapply(urls, function(url) HTML(paste0("<a href='", url, "' target='_blank'>Voir recommendation</a>")))
      guidelines_url <- lapply(urls, function(url) {
        if (!is.na(url)) {
          link_text <- "Voir recommandation"
          link_html <- HTML(paste0("<a href='", url, "' target='_blank'>", link_text, "</a>"))
          return(link_html)
        }
        else {
          return("")
        }
      })
      
      final_df <- final_df %>% 
        cbind(status = guidelines_status,
              source = unlist(guidelines_summary),
              lien = unlist(guidelines_url))
    }
    names(final_df)[5] = "status_CPIC"
    names(final_df)[6] = "CPIC"
    names(final_df)[7] = "lien (CPIC)"
    names(final_df)[8] = "status_DPWG"
    names(final_df)[9] = "DPWG"
    names(final_df)[10] = "lien (DPWG)"
    final_df <- final_df %>% 
      filter(status_CPIC == 'success' | status_DPWG == 'success')
    datatable(final_df,
              #extensions = c("Buttons"), 
              rownames = FALSE,
              filter = 'top', 
              selection = "none",
              editable = FALSE,
              escape = FALSE,
              options = list(
                #   search = list(regex = TRUE, caseInsensitive = TRUE),
                pageLength = 20,
                dom = 'lftipr'
                #   buttons = list(list(extend = "copy", text = '<span class="fa-solid fa-copy"></span> Copier', title = ''))
                #
              )
    )
  })

custom methods in separate .R file:

pgkb_get_prefix <- 'https://api.pharmgkb.org/v1/data/'

pgkb_drugId_url <- function(chemical_name) {
  url <- paste0(pgkb_get_prefix, 'chemical?name=', chemical_name)
  return(url)
}

fn_pgkb_drug_id <- function(chemical_name) {
  pgkb_drug_id <- content(GET(paste0(pgkb_get_prefix,
                                     'chemical?name=',
                                     chemical_name)))
  short_notification(duration = 0.8, message = paste0('ID (', chemical_name, ') (1/3)'))
  Sys.sleep(0.8)
  return(pgkb_drug_id)
}

fn_pgkb_gene_id <- function(hgng_gene) {
  pgkb_gene_id <- content(GET(paste0(pgkb_get_prefix,
                                     'gene?symbol=',
                                     hgng_gene)))
  short_notification(duration = 0.8, message = paste0('ID (', hgng_gene, ') (2/3)'))
  Sys.sleep(0.8)
  return(pgkb_gene_id)
}

fn_pgkb_guideline <- function(drug_id, drug, gene_id, gene, source, total, i) {
  result <- content(GET(paste0(pgkb_get_prefix,
                               'guidelineAnnotation?source=',
                               source,
                               '&relatedChemicals.accessionId=',
                               drug_id,
                               '&relatedGenes.accessionId=',
                               gene_id)))
  
  Sys.sleep(1)
  status <- result$status
  short_notification(duration = 1, message = paste0(drug, ' / ', gene, ' (', source, ') '))
  short_notification(duration = 1, message = paste0('Status: ', status, ' (', i, '/', total, ')'))
  # if (status == "fail") {
  #   error <- result$data$errors[[1]]$message
  #   print(paste('error =', error))
  # }

  return(result)
}

I read on the API documentation that only 2 requests per second are allowed (this is probably the reason why I have sometimes no data back - if more than 2 requests reach the API within one second - and why I will have to pause the system). The idea would be to pause for 0.5 seconds once data from the previous request is available to make sure I use Sys.sleep() for the minimal time that would be necessary.

if you could share some of your code, that would be great! :-) — Mark, Aug 17 '23 at 08:33
Please make this question *reproducible*. This includes sample code you've attempted (including listing non-base R packages, and any errors/warnings received), sample *unambiguous* data (e.g., `data.frame(x=...,y=...)` or the output from `dput(head(x))` into a [code block]) and/or inputs/controls for using the shiny app, and intended output given that input. Refs: https://stackoverflow.com/q/5963269, [mcve], and https://stackoverflow.com/tags/r/info. — r2evans, Aug 17 '23 at 12:44
Thank you all. Sorry for not getting back earlier. I tried to provide a minimal reproducible example, hoping I did not forget something. I noticed that requests are limited to 2 per second, meaning I will have to pause the system for 0.5 seconds after I receive data back I suppose ? But knowing when data is received would allow me to know when pausing for 0.5, am I right ? Thanks — Trichophyton, Aug 28 '23 at 15:00

score 1 · Answer 1 · answered Aug 18 '23 at 00:28

Maybe with curl_multi_fetch:

library(curl)

pool <- new_pool()

# callback
cb <- function(req){cat("done:", req$url, ": HTTP:", req$status, "\n")}

# multi fetch
curl_fetch_multi('https://www.google.com', done = cb, pool = pool)
curl_fetch_multi('https://cloud.r-project.org', done = cb, pool = pool)
curl_fetch_multi('https://hb.cran.dev/blabla', done = cb, pool = pool)

out <- multi_run(pool = pool)
# done: https://cloud.r-project.org/ : HTTP: 200 
# done: https://www.google.com/ : HTTP: 200 
# done: https://hb.cran.dev/blabla : HTTP: 404

How to wait in R for a complete JSON before using it

1 Answers1