I am having trouble with JSON input in R: I can't access individual elements cleanly, and I can't strip off the titles (labels) that appear in the output. Below are the content, the behavior, and where I am getting stuck.
I pull in IMDb movie data in JSON format from the OMDb API. At this point, the content is a single JSON record with a very flat structure that looks like this:
source URL: http://www.omdbapi.com/?t=lord+of+the+rings&y=&type=movie&plot=short&r=json&tomatoes=true
JSON content this produces:
{"Title":"The Lord of the Rings: The Fellowship of the Ring","Year":"2001","Rated":"PG-13","Released":"19 Dec 2001","Runtime":"178 min","Genre":"Adventure, Drama, Fantasy","Director":"Peter Jackson","Writer":"J.R.R. Tolkien (novel), Fran Walsh (screenplay), Philippa Boyens (screenplay), Peter Jackson (screenplay)","Actors":"Alan Howard, Noel Appleby, Sean Astin, Sala Baker","Plot":"A meek Hobbit from the Shire and eight companions set out on a journey to destroy the powerful One Ring and save Middle Earth from the Dark Lord Sauron.","Language":"English, Sindarin","Country":"New Zealand, USA","Awards":"Won 4 Oscars. Another 110 wins & 122 nominations.","Poster":"https://images-na.ssl-images-amazon.com/images/M/MV5BNmFmZDdkODMtNzUyMy00NzhhLWFjZmEtMGMzYjNhMDA1NTBkXkEyXkFqcGdeQXVyNDUyOTg3Njg@._V1_SX300.jpg","Metascore":"92","imdbRating":"8.8","imdbVotes":"1,292,127","imdbID":"tt0120737","Type":"movie","tomatoMeter":"91","tomatoImage":"certified","tomatoRating":"8.2","tomatoReviews":"225","tomatoFresh":"205","tomatoRotten":"20","tomatoConsensus":"Full of eye-popping special effects, and featuring a pitch-perfect cast, The Lord of the Rings: The Fellowship of the Ring brings J.R.R. Tolkien's classic to vivid life.","tomatoUserMeter":"95","tomatoUserRating":"4.1","tomatoUserReviews":"1353223","tomatoURL":"http://www.rottentomatoes.com/m/the_lord_of_the_rings_the_fellowship_of_the_ring/","DVD":"06 Aug 2002","BoxOffice":"$314,000,000.00","Production":"New Line Cinema","Website":"http://www.lordoftherings.net/film/trilogy/thefellowship.html","Response":"True"}
In the code that follows, I originally tried raw$Title, but this threw the error "$ operator is invalid for atomic vectors". So I changed the code to use indexing for the elements I want, as shown here:
library(RCurl)
library(RJSONIO)

movieURL <- "http://www.omdbapi.com/?t=lord+of+the+rings&y=&type=movie&plot=short&r=json&tomatoes=true"

fromurl <- function(finalurl) {
  web <- getURL(finalurl)       # fetch the raw JSON text
  rawContent <- fromJSON(web)   # parse it with RJSONIO
  # see lecture 2 for temperature forecast example (fromurl {})
  # research when raw$Title failed: https://stat.ethz.ch/pipermail/r-help/2008-November/179050.html
  # more research: http://stackoverflow.com/questions/21567793/problems-reading-json-file-in-r
  # pick out the three fields I want
  movie_name <- rawContent['Title']
  movie_plot <- rawContent['Plot']
  movie_awards <- rawContent['Awards']
  result <- list(Title = movie_name, Plot = movie_plot, Awards = movie_awards)
  names(result) <- c('Title', 'Plot', 'Awards')
  return(result)
}

# tests on output
# class(out1)
# typeof(out1)
out1 <- fromurl(movieURL)
out1
The output from this currently looks like this:
$Title
Title
"The Lord of the Rings: The Fellowship of the Ring"
$Plot
Plot
"A meek Hobbit from the Shire and eight companions set out on a journey to destroy the powerful One Ring and save Middle Earth from the Dark Lord Sauron."
$Awards
Awards
"Won 4 Oscars. Another 110 wins & 122 nominations."
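The printed output is consistent with RJSONIO's fromJSON() having simplified this flat, all-string object into a named character vector rather than a list. A minimal base-R sketch, using a hand-built vector as a stand-in for the real API response (an assumption for illustration, so no network access is needed), shows why `$` fails and how `[` and `[[` differ:

```r
# Stand-in for what fromJSON() appears to return here: a named character vector
rawContent <- c(Title  = "The Lord of the Rings: The Fellowship of the Ring",
                Awards = "Won 4 Oscars. Another 110 wins & 122 nominations.")

# `$` is not defined for atomic vectors, so this raises an error;
# tryCatch() captures the message instead of stopping
dollar_result <- tryCatch(rawContent$Title,
                          error = function(e) conditionMessage(e))
print(dollar_result)  # "$ operator is invalid for atomic vectors"

# Single brackets return a sub-vector, so the element keeps its name
single <- rawContent["Title"]
print(names(single))  # "Title"

# Double brackets extract the bare value and drop the name
double <- rawContent[["Title"]]
print(names(double))  # NULL
```

The name that `[` carries along is exactly the extra "Title" line printed above each value.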
Note that my final out1 list now has elements with $ headings that I can access individually (as in: out1$Title). But each element also carries a label from the JSON ("Title", "Plot", or "Awards") that is attached to the content rather than separate from it. Asking for out1$Title[1] returns both the "Title" label line and the content line that tells you which movie it is.
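If out1 has already been built this way, the inner labels can be dropped after the fact. This is a sketch on a hand-built stand-in for out1 (two fields only, an assumption for brevity): `[1]` keeps the element's name, while `[[1]]` or unname() returns the bare string.

```r
# Stand-in for out1 as printed above: a list whose elements are
# one-element named character vectors
out1 <- list(
  Title  = c(Title  = "The Lord of the Rings: The Fellowship of the Ring"),
  Awards = c(Awards = "Won 4 Oscars. Another 110 wins & 122 nominations.")
)

# Option 1: pull a single bare value with [[ ]]
title <- out1$Title[[1]]

# Option 2: strip the inner names from every element at once;
# lapply() keeps the outer list names (Title, Awards)
out1_clean <- lapply(out1, unname)
print(out1_clean$Title)
```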
How can I fix my code so that the original JSON labels are treated as keys (element names) rather than printed along with the content, or so that these extra labels are dropped altogether?
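One possible fix is to index with `[[ ]]` instead of `[ ]` when pulling fields out of the parsed record. The helper below is a hypothetical name (extract_fields is not part of RJSONIO or any other package); it is a sketch that assumes the parsed record is either a named character vector (as RJSONIO appears to return here) or a named list (as jsonlite returns), since `[[` behaves the same on both.

```r
# Hypothetical helper: return the requested fields as a list of bare values
extract_fields <- function(rawContent, fields = c("Title", "Plot", "Awards")) {
  # setNames() makes lapply() return a list named by field;
  # [[ ]] drops the per-value name that [ ] would keep
  lapply(setNames(fields, fields), function(f) rawContent[[f]])
}

# Stand-in for the parsed response (a subset of the fields shown above)
rawContent <- c(
  Title  = "The Lord of the Rings: The Fellowship of the Ring",
  Plot   = "A meek Hobbit from the Shire and eight companions set out on a journey to destroy the powerful One Ring and save Middle Earth from the Dark Lord Sauron.",
  Awards = "Won 4 Oscars. Another 110 wins & 122 nominations.",
  Year   = "2001"
)

result <- extract_fields(rawContent)
str(result)
```

Inside fromurl(), the three single-bracket lines and the names() call could then collapse to `return(extract_fields(fromJSON(web)))`. Note that `[[` stops with "subscript out of bounds" if a field is missing from the response, which may be preferable to silently getting NA.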
I would like to fix this before I use this code to build something bigger, where I request multiple records and combine them into a final data frame. I probably missed something simple, but I am new to this and am just not seeing it.
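Once each record is reduced to bare values, stacking several records into a data frame is straightforward in base R. This sketch assumes each record is a named list with the same fields; the two records are hand-built stand-ins rather than real API responses.

```r
# Stand-ins for several records, each already reduced to bare values
records <- list(
  list(Title = "The Lord of the Rings: The Fellowship of the Ring", Year = "2001"),
  list(Title = "The Lord of the Rings: The Two Towers", Year = "2002")
)

# Turn each record into a one-row data frame, then stack the rows
movies <- do.call(rbind, lapply(records, function(r) {
  as.data.frame(r, stringsAsFactors = FALSE)
}))

# In the JSON shown above every value is a string, so convert
# numeric columns explicitly
movies$Year <- as.integer(movies$Year)
print(movies)
```

If the leftover names from `[ ]` indexing were still attached, they would travel into the columns, so stripping them first keeps the data frame clean.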
Updates:
The following comment offers a viable workaround for the code shown in this post:
I would recommend jsonlite, rather than RJSONIO as I find it easier to work with:
rawContent <- jsonlite::fromJSON(movieURL)
– SymbolixAU
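For comparison, here is a minimal sketch of the jsonlite route. It parses a shortened literal copy of the JSON above instead of fetching the URL, so it runs offline; it assumes the jsonlite package is installed. jsonlite's fromJSON() turns a JSON object into a named list, so `$` works and each value comes back without an extra label.

```r
library(jsonlite)

# Shortened copy of the OMDb response shown above
json_text <- '{"Title":"The Lord of the Rings: The Fellowship of the Ring","Year":"2001","Awards":"Won 4 Oscars. Another 110 wins & 122 nominations."}'

rawContent <- fromJSON(json_text)

# A JSON object becomes a named list, so `$` access works directly
movie_name <- rawContent$Title
print(movie_name)
```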
However, as an academic exercise, if anyone can see what went wrong in my RJSONIO attempt and how to fix it, I would still like to know for the learning value.