
I've been trying to subset a character vector for the last hour or so. In my (floundering) attempt to get this working, I ran into an interesting characteristic of R. After JSON parsing, my data looks like this:

[[1]]
[[1]]$business_id
[1] "rncjoVoEFUJGCUoC1JgnUA"

[[1]]$full_address
[1] "8466 W Peoria Ave\nSte 6\nPeoria, AZ 85345"

[[1]]$open
[1] TRUE

[[1]]$categories
[1] "Accountants"           "Professional Services" "Tax Services"         
[4] "Financial Services"   

[[1]]$city
[1] "Peoria"

[[1]]$review_count
[1] 3

[[1]]$name
[1] "Peoria Income Tax Service"

[[1]]$neighborhoods
list()

[[1]]$longitude
[1] -112.2416

[[1]]$state
[1] "AZ"

[[1]]$stars
[1] 5

[[1]]$latitude
[1] 33.58187

[[1]]$type
[1] "business"

Here's the code I'm using:

#!/usr/bin/Rscript

require(graphics)
require(RJSONIO)

parsed_data <- lapply(readLines("yelp_phoenix_academic_dataset/yelp_academic_dataset_business.json"), fromJSON)

#parsed_data[,c("categories")]
print(parsed_data[1])

While trying to drop everything but the categories column, I ran into this interesting behaviour:

print(parsed_data[1])
print(parsed_data[1][1])
print(parsed_data[1][1][1][1][1][1])

All produce the same output (the one posted above). Why is that?

AlexLordThorsen
  • `[` returns a list of elements, and in this case a list of one element, so the first element of that is the same as what you had. Try `[[`: `print(parsed_data[[1]])` instead. – Matthew Lundberg May 24 '13 at 02:43
  • @MatthewLundberg I'm getting Error: unexpected '[[' in "[[". What do the double brackets do? – AlexLordThorsen May 24 '13 at 02:50

2 Answers


This is the difference between [ and [[. It is hard to search for these online, but ?'[' will bring up the help.

When indexing a list with [, a list is returned:

list(a=1:10, b=11:20)[1]
## $a
##  [1]  1  2  3  4  5  6  7  8  9 10

This is a list of one element, so repeating the operation again results in the same value:

list(a=1:10, b=11:20)[1][1]
## $a
##  [1]  1  2  3  4  5  6  7  8  9 10

[[ returns the element itself, not a list containing the element. It also selects only a single element (whereas [ accepts a vector of indices and can return several elements):

list(a=1:10, b=11:20)[[1]]
##  [1]  1  2  3  4  5  6  7  8  9 10

Unlike [, this operation is not idempotent on lists, because each application unwraps another level:

list(a=1:10, b=11:20)[[1]][[1]]
## [1] 1
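
To make the contrast above easy to check at the console, here is a minimal sketch that uses the same example list. The variable names `x` and `one` are only for illustration and are not part of the original answer.

```r
x <- list(a = 1:10, b = 11:20)

# `[` always returns a list, so applying it again to a
# one-element list gives back the same one-element list.
one <- x[1]
print(identical(one, one[1]))       # same object each time
print(identical(x[1], x[1][1][1]))  # chaining `[` changes nothing

# `[[` unwraps one level each time it is applied.
print(class(x[1]))    # a list holding the vector
print(class(x[[1]]))  # the integer vector itself
print(x[[1]][[1]])    # the first number inside that vector

# `[` can select several elements at once and still returns a list.
print(length(x[c(1, 2)]))
```

This is why `parsed_data[1][1][1]` in the question keeps printing the same thing: every step hands back a list of length one.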
Matthew Lundberg

Your JSON data is currently stored in a list, rather than a vector, so the indexing is different.

As Matthew has pointed out, there is a difference between using [] and using [[]] to access an element. For a discussion of this, I will refer you to this Stack Overflow thread:

In R, what is the difference between the [] and [[]] notations for accessing the elements of a list?

Looking at the printout, your data is stored as a nested list:

parsed_data[[1]]

will give you a list containing each of the fields (columns) of the first record. To access the categories field you can use any of the following:

parsed_data[[1]][["categories"]]
parsed_data[[1]][[4]]
parsed_data[[1]]$categories

This will give you a character vector of category names, as you'd expect:

## [1] "Accountants"           "Professional Services" "Tax Services"
## [4] "Financial Services"  
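
Since the goal in the question was to keep only the categories for every business, the same `[[` access can be applied across the whole list. The following is a rough sketch, not the only way to do it. The two-record `parsed_data` below is a hand-built stand-in for the real parsed file, and the second record ("Example Cafe") is made up purely for illustration.

```r
# Hypothetical stand-in for the parsed JSON: a list of records,
# each record itself a named list (only two fields shown here).
parsed_data <- list(
  list(name = "Peoria Income Tax Service",
       categories = c("Accountants", "Professional Services",
                      "Tax Services", "Financial Services")),
  list(name = "Example Cafe",  # made-up record
       categories = c("Coffee & Tea", "Food"))
)

# Pull the categories vector out of every record with `[[`.
categories <- lapply(parsed_data, function(business) business[["categories"]])
print(categories)

# Flatten into one character vector if per-business grouping is not needed.
all_categories <- unlist(categories)
print(all_categories)
```

The result of `lapply` is still a list (one character vector per business), so `categories[[1]]` is needed to get at the first business's vector.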

Note that when accessing by index (either named or numeric) you still have to use the double bracket notation: [[]]. If you use [] instead, it will give you a list instead of a vector:

parsed_data[[1]]["categories"]
## $categories
## [1] "Accountants"           "Professional Services" "Tax Services"
## [4] "Financial Services"
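
Single brackets are still useful when you want to keep several fields of a record together. A small sketch, assuming a record shaped like the one in the question (the `record` and `kept` names are illustrative only, and just three of the fields are reproduced):

```r
# One record, reduced to three of the fields shown in the question.
record <- list(
  name = "Peoria Income Tax Service",
  city = "Peoria",
  categories = c("Accountants", "Professional Services",
                 "Tax Services", "Financial Services")
)

# `[` with a vector of names keeps those fields and returns a (named) list.
kept <- record[c("name", "categories")]
print(names(kept))
print(is.list(kept))

# `[[` with a single name returns the stored value itself.
print(is.character(record[["categories"]]))

# str() gives a compact view of the nesting, which helps when
# deciding between `[` and `[[`.
str(kept)
```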
Scott Ritchie