2

I have written a function to retrieve data via API. The output format is JSON. https://jsoneditoronline.org/?id=ac0ec7ececae49ca92599ff912458a84

With every query a variable (path) should change. This variable is located in a dataframe (product_folders_summarised) in the column (product_folder).

library(tidyverse)
library(httr)
library(jsonlite)
library(data.table)

func_visibility <- function(product_folder) {
  api_url <- "https://api.where-the-data-comes-from.com/example"
  api_key <- "_API_KEY_"
  format <- "json"
  request <-
    fromJSON(
      paste0(api_url, "?api_key=",api_key,"&format=",format,"&path=",product_folder),
      simplifyVector = TRUE,
      simplifyDataFrame = TRUE,
      flatten = TRUE
    )
  request <- lapply(request, function(x) {
    x[sapply(x, is.null)] <- NA
    unlist(x)
  })
  request <- as.data.frame(t(request$answer))
  request <- select(request, -sichtbarkeitsindex.path, -sichtbarkeitsindex.date)
  return(request) 
}

product_folders_summarised <- product_folders_summarised %>%
  dplyr::mutate(visibility_value = func_visibility(product_folder))

The dataframe is structured as follows:

|product_folder|value_1|value_2|
|https://www.example.de/folder/|this|that|
|https://www.example.de/anotherfolder/|...|...|

I expect that from the dataframe (product_folders_summarised) the value is taken from the column (product_folder), passed to the function and visibility_value is added as column.

Instead I get the error message

Error: lexical error: invalid char in json text.
                                       https://api.https://api.where-the-data-comes-from.com/example.
                     (right here) ------^

I have now adjusted my function as suggested by r2evans.

func_visibility <- function(path) {
  api_url <- "https://api.where-the-data-comes-from.com/example"
  api_key <- "_API_KEY_"
  format <- "json"
  request <- paste0(api_url,"?api_key=",api_key,"&format=",format,"&path=",path)
  request <- lapply(request, jsonlite::fromJSON)
  request <- lapply(request, function(x) {
    x[sapply(x, is.null)] <- NA
    as.data.frame(t(x))  
    unlist(x)
  })
  return(request)
}
product_folders_summarised_short <- product_folders_summarised_short %>%
  dplyr::mutate(sichtbarkeitsindex_value = func_visibility(product_folder))

The data is now retrieved from the API. The data is written into the new last column of the dataframe:

c(method = "domain.sichtbarkeitsindex", answer.sichtbarkeitsindex.path = "https://www.example.de/folder/", answer.sichtbarkeitsindex.date = "2019-09-02T00:00:00+02:00", answer.sichtbarkeitsindex.value = "0", credits.used = "1")

In my first attempt (see first codeblock), I converted the data into a dataframe.

request <- as.data.frame(t(request$answer)),
request <- select(request, -sichtbarkeitsindex.path, -sichtbarkeitsindex.date),

Applied to a single URL, this worked. Now I integrated

`as.data.frame(t(x))`,

but I only get the result that the data from the API is stored as a character vector.

Do you think it is easier to write the data as a character vector in the last column of the dataframe, in order to assign the vectors to a new dataframe after passing the first function with another function?

Olaf Pleines
  • 41
  • 1
  • 7
  • 1
    Can you confirm what version of the `jsonlite` package you are using with `sessionInfo()`? Seems to be a duplicate of this question: https://stackoverflow.com/questions/49173967/trouble-using-jsonlites-fromjson-with-url-in-r Probably just using a "bad" version of jsonlite. – MrFlick Aug 29 '19 at 15:32
  • `jsonlite::fromJSON` does not like vectors as input, it is silently combining all of the vector with `paste(txt, collapse = "\n")` (in `jsonlite:::parse_string`), which is causing a vector of individually-valid json strings to become invalid. Consider using `jsonlite::stream_in(textConnection(...), simplifyDataFrame=FALSE` instead, though it returns a `list`. Or iterate over each element using `sapply`/`lapply` and `jsonlite::fromJSON`. – r2evans Aug 29 '19 at 16:00
  • I am using the version jsonlite_1.6 There is no more recent version available :-) – Olaf Pleines Aug 29 '19 at 18:47

2 Answers2

4

If you trace down into how jsonlite::fromJSON deals with its arguments, you'll see that it calls:

  • jsonlite::fromJSON
    • jsonlite:::parse_and_simplify
      • jsonlite:::parseJSON
        • jsonlite:::parse_string

(note that most are internal, not exported) ... and that last function is

function (txt, bigint_as_char) 
{
    if (length(txt) > 1) {
        txt <- paste(txt, collapse = "\n")
    }
    .Call(R_parse, txt, bigint_as_char)
}

Which means your vector of json strings is being collapsed into length 1 using \n (which seems odd to me ...). Because of this, a vector of valid JSON is effectively turning into ndjson (newline-delimited json), which fromJSON does not do.

Two options:

  1. Do the json parsing inside your lapply.

    jsonvec <- c('{"a":1}', '{"b":2}')
    lapply(jsonvec, jsonlite::fromJSON)
    # [[1]]
    # [[1]]$a
    # [1] 1
    # [[2]]
    # [[2]]$b
    # [1] 2
    
  2. Use jsonlite::stream_in (which does do ndjson) and disable simplification:

    jsonvec <- c('{"a":1}', '{"b":2}')
    jsonlite::stream_in(textConnection(jsonvec), simplifyDataFrame = FALSE)
    #  Imported 2 records. Simplifying...
    # [[1]]
    # [[1]]$a
    # [1] 1
    # [[2]]
    # [[2]]$b
    # [1] 2
    
  3. Use Vectorize to turn a non-vector-friendly function into a vector-friendly one.

    jsonvec <- c('{"a":1}', '{"b":2}')
    Vectorize(jsonlite::fromJSON, USE.NAMES=FALSE)(jsonvec)
    # $a
    # [1] 1
    # $b
    # [1] 2
    

    This could allow you to just replace any instance of fromJSON in your code with Vectorize(fromJSON), noting that it returns a function that you then use on your vector.

Given that you are relying on fromJSON to download the data, I suggest the first or third solutions.

r2evans
  • 141,215
  • 6
  • 77
  • 149
  • Excuse me, I don't understand the approach. My problem seems to be that I cannot pass the string contained in the dataframe (product_folders_summarised) in the column (product_folder) to my function via `dplyr::mutate`. If I insert a single variable from the column (product_folder) into my function, e.g. . `func_visibility("https://www.example.com")` I get the desired result. – Olaf Pleines Aug 29 '19 at 19:02
  • 1
    Yes, because `fromJSON` *does not take a vector*. When you apply it to one cell's worth (URL or json text), then it works fine. The moment you give it a *vector*, it will fail. You need to either (1) `lapply(frame_column, fromJSON)` or (2) use `stream_in` (which only works if the json data has already been loaded). – r2evans Aug 29 '19 at 19:20
  • Thank you for your effort. I don't get it to work. I will take the XML from the API and do without JSON. – Olaf Pleines Aug 29 '19 at 20:32
  • Can you replace `fromJSON(` (in your code above) with `Vectorize(fromJSON)(`? (Notice the parens, that's intentional. `Vectorize` returns a `function`.) (See my edit.) – r2evans Aug 29 '19 at 20:37
  • Thank you for your patience. I just won't let go of getting to the bottom of this. – Olaf Pleines Aug 30 '19 at 15:00
  • I have tried to use your two options, but I have only partially come to a better result. – Olaf Pleines Aug 30 '19 at 15:04
0

I have now taken a different approach. The solution is to use rowwise() %>%. Otherwise the whole column will be used as vector.

library(tidyverse)
library(httr)
library(jsonlite)

func_sichtbarkeit <- function(pfad, output) {
  httr::GET(
    url = "https://api.where-the-data-comes-from.com/example",
    query = list(
      api_key = "_API_KEY_",
      format = "json",
      country ="de",
      date = "2019-09-08",
      daily = "true",
      path = pfad
    )
  ) -> res
  httr::warn_for_status(res) # Prüfen auf Status 200 und bei Bedarf warnen
  out <- httr::content(res, as = "text", encoding = "UTF-8") # Abrufen der Daten als text und URF-8 encodiert
  out <- jsonlite::fromJSON(out) # JSON parsen
  answer_raw <- out$answer$sichtbarkeitsindex # Zutreffenden Datensatz aus dem JSON auswählen
  visibility_raw <- answer_raw[[1]][["value"]] # Reduktion auf SI Wert
  return(visibility_raw)
}

all_search_2019_09_08 <- all_search %>%
  rowwise() %>%
  dplyr::mutate(visibility_value_2019_09_08 = func_sichtbarkeit(address))
Olaf Pleines
  • 41
  • 1
  • 7