I'm pulling Reddit data from a particular subreddit with the jsonlite package, and I'm running into a parsing error. Using the old.reddit.com URL, reading the subreddit's landing page works, but the error appears when I try to read the second page or any page after it. Here's the original code:
```r
library(jsonlite)
page <- "https://old.reddit.com/r/Landlord/?count=25&after=t3_yl00x9" #second page
jsonlite::fromJSON(page)
```
Here's the resulting error message:
```
Error in parse_con(txt, bigint_as_char) :
  lexical error: invalid char in json text.
                                       <!doctype html><html xmlns="htt
                     (right here) ------^
```
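The `<!doctype html>` fragment in the message is the first text the parser saw, which suggests the URL returned an HTML page rather than JSON. A minimal sketch for catching this before parsing uses `jsonlite::validate()`, which returns `TRUE`/`FALSE` and attaches the parser message as an `err` attribute on failure. The helper `parse_if_json()` and both sample strings are hypothetical, made up for illustration.

```r
library(jsonlite)

# Hypothetical sample strings: the first mimics the start of an HTML page,
# the second is a tiny JSON object shaped like a Reddit listing.
html_txt <- '<!doctype html><html xmlns="http://www.w3.org/1999/xhtml"></html>'
json_txt <- '{"kind": "Listing", "data": {"after": null}}'

# Hypothetical helper: parse text only if it validates as JSON; otherwise
# stop with a message that points at the likely cause.
parse_if_json <- function(txt) {
  ok <- validate(txt)
  if (!ok) {
    stop("Response is not JSON (did the URL return an HTML page?): ",
         attr(ok, "err"))
  }
  fromJSON(txt)
}

html_ok <- validate(html_txt)  # expected to be FALSE
json_ok <- validate(json_txt)  # expected to be TRUE

listing <- parse_if_json(json_txt)
print(listing$kind)
```

Running `parse_if_json(html_txt)` should raise the custom error instead of the lexical one, which makes the HTML-versus-JSON mix-up easier to spot.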
Following another post from several years ago (link here), I've tried a few other solutions, but the problem persists. Here's the code I've tried:
```r
library(ndjson)
library(curl)
jsonlite::fromJSON(page)
jsonlite::stream_in(url(page))
ndjson::stream_in(page)
jsonlite::stream_in(curl(page))
```
Finally, here's my session information for reference:
```
R version 4.2.1 (2022-06-23)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 11.2.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
```
I'm not very familiar with JSON or unstructured text data yet, so this may well be a simple error on my part. Any thoughts? Thanks in advance.
Update:
As neilfws noted, the URL I was requesting returned HTML, not JSON: I had forgotten to include '.json' in the path. Here's the corrected code that worked for me:
```r
#string elements
base <- "https://old.reddit.com/r/Landlord/"
json <- ".json"
add <- "?count=25&after=t3_yn1aoo"
#concatenate strings
page <- paste0(base,json,add)
jsonlite::fromJSON(page)
```
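To read several pages in a row, the `after` token does not have to be pasted in by hand. A minimal sketch, assuming each parsed listing has the usual Reddit shape where `data$after` holds the next page's token (or is `NULL` on the last page) and `data$children$data` holds the posts as a data frame. The helpers `reddit_page_url()` and `fetch_pages()`, and the `delay` parameter, are hypothetical names for illustration; `fetch` defaults to `jsonlite::fromJSON` but can be swapped out for testing.

```r
library(jsonlite)

# Hypothetical helper: build the JSON listing URL for one subreddit page.
# `after` is the token of the last post on the previous page (e.g. "t3_yn1aoo");
# NULL requests the first page.
reddit_page_url <- function(subreddit, after = NULL, count = 25) {
  base <- paste0("https://old.reddit.com/r/", subreddit, "/.json")
  if (is.null(after)) {
    return(base)
  }
  paste0(base, "?count=", count, "&after=", after)
}

# Hypothetical pager: fetch up to `n_pages` pages, following the `after`
# token returned in each listing. Returns a list of post data frames.
fetch_pages <- function(subreddit, n_pages = 3, fetch = fromJSON, delay = 1) {
  pages <- list()
  after <- NULL
  for (i in seq_len(n_pages)) {
    # `count` is the number of posts already seen (25 per earlier page)
    res <- fetch(reddit_page_url(subreddit, after, count = 25 * (i - 1)))
    pages[[i]] <- res$data$children$data  # posts on this page
    after <- res$data$after
    if (is.null(after)) break             # no further pages
    Sys.sleep(delay)                      # pause between requests
  }
  pages
}

# Usage (requires network access):
# pages <- fetch_pages("Landlord", n_pages = 3)
# titles <- unlist(lapply(pages, function(p) p$title))
```

Building the URL in one function keeps the '.json' suffix from being forgotten again, and reading `after` from each response avoids copying tokens between runs.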