R using RJSONIO - strange trouble parsing simple flat json structure on IMDB

Question

I am having trouble with JSON input that behaves strangely: I cannot access individual elements properly, and I cannot strip off the labels that appear in the output. Below are the content, the behavior, and where I am getting stuck.

I pull in IMDB movie data in JSON format from the OMDb API. At this point, the content is a single JSON record with a very flat structure that looks like this:

source URL: http://www.omdbapi.com/?t=lord+of+the+rings&y=&type=movie&plot=short&r=json&tomatoes=true

JSON content this produces:

{"Title":"The Lord of the Rings: The Fellowship of the Ring","Year":"2001","Rated":"PG-13","Released":"19 Dec 2001","Runtime":"178 min","Genre":"Adventure, Drama, Fantasy","Director":"Peter Jackson","Writer":"J.R.R. Tolkien (novel), Fran Walsh (screenplay), Philippa Boyens (screenplay), Peter Jackson (screenplay)","Actors":"Alan Howard, Noel Appleby, Sean Astin, Sala Baker","Plot":"A meek Hobbit from the Shire and eight companions set out on a journey to destroy the powerful One Ring and save Middle Earth from the Dark Lord Sauron.","Language":"English, Sindarin","Country":"New Zealand, USA","Awards":"Won 4 Oscars. Another 110 wins & 122 nominations.","Poster":"https://images-na.ssl-images-amazon.com/images/M/MV5BNmFmZDdkODMtNzUyMy00NzhhLWFjZmEtMGMzYjNhMDA1NTBkXkEyXkFqcGdeQXVyNDUyOTg3Njg@._V1_SX300.jpg","Metascore":"92","imdbRating":"8.8","imdbVotes":"1,292,127","imdbID":"tt0120737","Type":"movie","tomatoMeter":"91","tomatoImage":"certified","tomatoRating":"8.2","tomatoReviews":"225","tomatoFresh":"205","tomatoRotten":"20","tomatoConsensus":"Full of eye-popping special effects, and featuring a pitch-perfect cast, The Lord of the Rings: The Fellowship of the Ring brings J.R.R. Tolkien's classic to vivid life.","tomatoUserMeter":"95","tomatoUserRating":"4.1","tomatoUserReviews":"1353223","tomatoURL":"http://www.rottentomatoes.com/m/the_lord_of_the_rings_the_fellowship_of_the_ring/","DVD":"06 Aug 2002","BoxOffice":"$314,000,000.00","Production":"New Line Cinema","Website":"http://www.lordoftherings.net/film/trilogy/thefellowship.html","Response":"True"}

In the code that follows, I originally tried raw$Title, but this threw an error saying that "$" is invalid for atomic vectors. So I changed the code to use indexing for the elements I want, as shown here:

library(RCurl)
library(RJSONIO)

movieURL <- "http://www.omdbapi.com/?t=lord+of+the+rings&y=&type=movie&plot=short&r=json&tomatoes=true"

fromurl <- function(finalurl) {
  web        <- getURL(finalurl)
  rawContent <- fromJSON(web)

  # see lecture 2 for temperature forecast example (fromurl {})
  # research when raw$Title failed: https://stat.ethz.ch/pipermail/r-help/2008-November/179050.html
  # more research: http://stackoverflow.com/questions/21567793/problems-reading-json-file-in-r

  movie_name <- rawContent['Title']
  movie_plot <- rawContent['Plot']
  movie_awards <- rawContent['Awards']

  result <- list(Title = movie_name, Plot = movie_plot, Awards = movie_awards)
  names(result) <- c('Title', 'Plot', 'Awards')
  return(result)
}

# tests on output
# class(out1)
# typeof(out1)

out1 <- fromurl(movieURL)
out1

The output from this currently looks like this:

$Title
                                              Title 
"The Lord of the Rings: The Fellowship of the Ring" 

$Plot
                                              Plot 
"A meek Hobbit from the Shire and eight companions set out on a journey to destroy the powerful One Ring and save Middle Earth from the Dark Lord Sauron." 

$Awards
                                             Awards 
"Won 4 Oscars. Another 110 wins & 122 nominations."

Note that my final out1 list now has elements with $ headings that I can access individually (as in out1$Title). But the output also carries the labels "Title", "Plot", and "Awards" from the JSON, and they are not separate from the content: asking for out1$Title[1] returns both the "Title" label line and the content line that names the movie.
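The labels come from how the elements are indexed rather than from the JSON itself. Below is a minimal base-R sketch. It assumes, as the `$` error above suggests, that RJSONIO's `fromJSON()` returned a named character vector here. The vector is a hand-built stand-in for that result, so no network call or package is needed:

```r
# Hand-built stand-in for the parsed record: a named character vector,
# which is an atomic vector, not a list
rawContent <- c(Title  = "The Lord of the Rings: The Fellowship of the Ring",
                Awards = "Won 4 Oscars. Another 110 wins & 122 nominations.")

# `$` is not allowed on atomic vectors, so this raises an error;
# the error message is captured instead of stopping the script
dollar_msg <- tryCatch(rawContent$Title,
                       error = function(e) conditionMessage(e))
print(dollar_msg)

# Single brackets return a sub-vector, so the name "Title" comes along
single <- rawContent["Title"]
print(names(single))

# Double brackets return the bare element, with the name dropped
double <- rawContent[["Title"]]
print(is.null(names(double)))

# unname() strips the label after the fact and gives the same string
print(identical(unname(single), double))
```

This is why out1$Title[1] still prints the label: each list element is itself a length-one named vector, and `[1]` keeps its name.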

How can I fix my code so that the original JSON labels are treated properly as keys or sub-element names, or else dropped entirely?
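One possible fix that keeps RJSONIO is to extract each field with `[[` instead of `[`. This sketch makes the same assumption about the shape of `fromJSON()`'s result. The helper name `extract_movie()` and its `fields` argument are hypothetical. The fetch step still needs RCurl and RJSONIO installed, and the offline check at the end uses a hand-built stand-in record instead of a live request:

```r
# Hypothetical helper: pull selected fields out of a parsed record.
# `[[` returns each value without its name, and it behaves the same whether
# the record is a named character vector or a named list.
extract_movie <- function(rawContent, fields = c("Title", "Plot", "Awards")) {
  result <- lapply(fields, function(f) rawContent[[f]])
  names(result) <- fields   # list-level names, used for out1$Title etc.
  result
}

# The question's fromurl(), reduced to fetch + parse + extract.
# Packages are referenced with :: so they are only needed when it is called.
fromurl <- function(finalurl) {
  web        <- RCurl::getURL(finalurl)
  rawContent <- RJSONIO::fromJSON(web)
  extract_movie(rawContent)
}

# Offline check with a stand-in for the parsed record
parsed <- c(
  Title  = "The Lord of the Rings: The Fellowship of the Ring",
  Plot   = "A meek Hobbit from the Shire and eight companions set out on a journey to destroy the powerful One Ring and save Middle Earth from the Dark Lord Sauron.",
  Awards = "Won 4 Oscars. Another 110 wins & 122 nominations."
)
out1 <- extract_movie(parsed)
print(out1$Title)
```

Only the list-level names remain, so out1$Title holds just the movie title string.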

I would like to fix this before I use this code to build something bigger, where I ask for multiple records and create my final data frame. I have probably missed something simple, but I am new to this and just not seeing it.
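To collect several records into one data frame, one common base-R pattern is to turn each record into a one-row data frame and stack the results with `rbind`. The records below are hypothetical placeholders and are assumed to already be named lists of plain, unnamed strings:

```r
# Hypothetical placeholder records, each a named list of unnamed strings
records <- list(
  list(Title = "Movie A", Plot = "Plot A", Awards = "Awards A"),
  list(Title = "Movie B", Plot = "Plot B", Awards = "Awards B")
)

# Convert each record to a one-row data frame, then stack the rows;
# rbind() on data frames matches the columns by name
rows   <- lapply(records, as.data.frame, stringsAsFactors = FALSE)
movies <- do.call(rbind, rows)
print(movies)
```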

Updates:

This comment offers a viable workaround that avoids fixing the code shown in this post:

I would recommend jsonlite, rather than RJSONIO as I find it easier to work with: rawContent <- jsonlite::fromJSON(movieURL) – SymbolixAU 21 hours ago
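Here is a concrete version of that suggestion. `jsonlite::fromJSON()` turns a JSON object into a named list, so `$` works and the values carry no extra labels. This minimal sketch assumes jsonlite is installed. It parses a shortened copy of the record as a string instead of calling the URL, although `fromJSON()` also accepts a URL, as in the comment:

```r
library(jsonlite)

# Shortened copy of the OMDb record from the question, held as a string
json <- '{"Title":"The Lord of the Rings: The Fellowship of the Ring","Year":"2001","Awards":"Won 4 Oscars. Another 110 wins & 122 nominations."}'

rawContent <- fromJSON(json)

print(class(rawContent))   # a JSON object becomes a named list
print(rawContent$Title)    # `$` works and returns the bare string
```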

However, as an academic exercise, if anyone can see what went wrong in my attempt to use RJSONIO and how to fix it, I would still like to know, purely for the learning value.

I would recommend `jsonlite`, rather than `RJSONIO` as I find it easier to work with: `rawContent <- jsonlite::fromJSON(movieURL)` — SymbolixAU, Feb 26 '17 at 21:35
@halfer liked this post enough to take the time to edit it. You gave me a useful answer. But because no one upvoted the question or posted their comment as an answer, its score of 0 is being combined with my other zero-score questions, and I may soon lose the ability to ask questions. According to a site warning I received, this scenario (people liking my questions enough to interact with them, but not voting or posting real answers) may soon lock me out of contributing. Were you aware the system worked this way? — TMWP, Mar 14 '17 at 12:55

