2

I’m using the R programming language (and R Studio) having trouble organizing some data that I’m pulling via API so that it’s writeable to a table. I’m using the StubHub API to get a JSON response that contains all ticket listings for a particular event. I can successfully make the call to StubHub, I get the successful response. Here’s the code I am using to grab the response:

# get the content part of the response
msgContent = content(response)

# format to JSON object
jsonContent = jsonlite::fromJSON(toJSON(msgContent),flatten=TRUE,simplifyVector=TRUE)

This JSON object has a node called “listing” and that’s what I’m most interested in, so I set a variable to that part of the object:

friListings = jsonContent $listing

Checking the class of “friListings” I see I have a data.frame:

> class(friListings)
[1] "data.frame"

When I click on this variable in R Studio — View(friListings) — it opens in a new tab and looks pretty and nicely formatted. There are 21 variables (columns) and 609 observations (row). I see null values for certain cells, which is expected.

I would like to write this data.frame out as a table in a file on my computer. When I try to do that, I get this error.

> write.table(friListings,file="data",row.names=FALSE)
Error in if (inherits(X[[j]], "data.frame") && ncol(xj) > 1L) X[[j]] <- as.matrix(X[[j]]) : 
  missing value where TRUE/FALSE needed

Looking at other postings, it appears this is happening because my data.frame is actually not “flat” and is a list of lists with different classes and nesting. I validate this by str() on each of the columns in friListings….

> str(friListings[1])
'data.frame':   609 obs. of  1 variable:
 $ listingId:List of 609
  ..$ : int 1138579989
  ..$ : int 1138969061
  ..$ : int 1138958138
(this is just the first couple of lines, there are hundreds)

Another example:

> str(friListings[6])
'data.frame':   609 obs. of  1 variable:
$ sellerSectionName:List of 609
..$ : chr "Upper 354 - No View"
..$ : chr "Club 303 - Obstructed/No View"
..$ : chr "Middle 254 - Obstructed/No View"
(this is just the first couple of lines, there are hundreds)

Here is the head of friListings that I am attempting to share using dput from the reproducible example post:

> dput(head(friListings,4))
structure(list(listingId = list(1138579989L, 1138969061L, 1138958138L, 
1139003985L), sectionId = list(1552295L, 1552172L, 1552220L, 
1552289L), row = list("16", "6", "22", "26"), quantity = list(
1L, 2L, 4L, 1L), sellerSectionName = list("Upper 354 - No View", 
"Club 303 - Obstructed/No View", "Middle 254 - Obstructed/No View", 
"353"), sectionName = list("Upper 354 - Obstructed/No View", 
"Club 303 - Obstructed/No View", "Middle 254 - Obstructed/No View", 
"Upper 353 - Obstructed/No View"), seatNumbers = list("21", 
"7,8", "13,14,15,16", "General Admission"), zoneId = list(
232917L, 232909L, 232914L, 232917L), zoneName = list("Upper", 
"Club", "Middle", "Upper"), listingAttributeList = list(structure(c(204L, 
201L), .Dim = c(2L, 1L)), structure(c(4369L, 5370L), .Dim = c(2L, 
1L)), structure(c(4369L, 5989L), .Dim = c(2L, 1L)), structure(c(204L, 
4369L), .Dim = c(2L, 1L))), listingAttributeCategoryList = list(
structure(1L, .Dim = c(1L, 1L)), structure(1L, .Dim = c(1L, 
1L)), structure(1L, .Dim = c(1L, 1L)), structure(1L, .Dim = c(1L, 
1L))), deliveryTypeList = list(structure(5L, .Dim = c(1L, 
1L)), structure(5L, .Dim = c(1L, 1L)), structure(5L, .Dim = c(1L, 
1L)), structure(5L, .Dim = c(1L, 1L))), dirtyTicketInd = list(
FALSE, FALSE, FALSE, FALSE), splitOption = list("0", "0", 
"1", "1"), ticketSplit = list("1", "2", "2", "1"), splitVector = list(
structure(1L, .Dim = c(1L, 1L)), structure(2L, .Dim = c(1L, 
1L)), structure(c(2L, 4L), .Dim = c(2L, 1L)), structure(1L, .Dim = c(1L, 
1L))), sellerOwnInd = list(0L, 0L, 0L, 0L), currentPrice.amount = list(
468.99, 475L, 475L, 550.45), currentPrice.currency = list(
"USD", "USD", "USD", "USD"), faceValue.amount = list(NULL, 
NULL, NULL, NULL), faceValue.currency = list(NULL, NULL, 
NULL, NULL)), .Names = c("listingId", "sectionId", "row", 
"quantity", "sellerSectionName", "sectionName", "seatNumbers", 
"zoneId", "zoneName", "listingAttributeList", "listingAttributeCategoryList", 
"deliveryTypeList", "dirtyTicketInd", "splitOption", "ticketSplit", 
"splitVector", "sellerOwnInd", "currentPrice.amount", "currentPrice.currency", 
"faceValue.amount", "faceValue.currency"), row.names = c(NA, 
4L), class = "data.frame")

I tried to get around this by going through each column in friListings, unlisting that node, saving to a vector and then doing a cbind to stitch them all together. But, when I do that, I get vectors of different lengths because of the nulls. I took this approach one step further and tried to class each column to force NAs to preserve the nulls, but that’s not working. And, regardless, there’s gotta be a better approach than this. Here's some output to illustrate what happens when I attempt this approach.

# Take the column zoneId and casting it as numeric to force NA
friListings$zoneId<-lapply(friListings$zoneId, as.numeric)

# check the length
> length(friListings$zoneId)
[1] 609

# unlist and check the length... and I lost 11 items
> zoneid <- unlist(friListings$zoneId, use.names=FALSE)
> length(zoneid)
[1] 598

# here's the tail of the column... (because I happen to know that's where the empty values that are being dropped are)
> tail(friListings$zoneId)
[[1]]
numeric(0)

[[2]]
numeric(0)

[[3]]
numeric(0)

[[4]]
numeric(0)

[[5]]
numeric(0)

[[6]]
numeric(0)

I know people work with JSON and R all the time (I'm obviously not one of those people!), so maybe I’m missing something obvious. But I’ve spent 5 hours trying different ways to clean this data and searching the internet for answers. I read the JSON package documentation, too.

I really just want to "flatten" this object so that it’s pretty and structured in the same way the R Studio renders it when I do View(friListings). I'm already passing "flatten=TRUE" in my "fromJSON" call above and it doesn't seem to be doing what I expect. Same with the "simplifyVector=TRUE" (which is TRUE by default according to the docs, but added it for clarity).

Thanks for any insight or guidance you may be able to offer!!!

Community
  • 1
  • 1
Brianne
  • 33
  • 4
  • 1
    It would help to have a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) here with a sample input file. It doesn't have to be your complete input JSON, but it should be similar enough such that solving the problem for the test case will work on your real data. – MrFlick Mar 03 '15 at 21:09
  • Flattening that kind of data to CSV is often a pain in R. I'd go with the replacing `NULL` by `NA`-unlisting-approach. How did you do that or why isn't it working? – lukeA Mar 03 '15 at 22:32
  • Thank you for your help! For the reproducible example, I was really struggling with how to provide that, until I read the link you provided! I think I can use dput to provide the head of the JSON object. I will update my original post with that. – Brianne Mar 04 '15 at 17:39
  • Regarding my attempt to replace NULL with NA, I will update my original post with some output to illustrate more clearly why it's not working. – Brianne Mar 04 '15 at 17:52

1 Answers1

1

You might want to try and adapt this approach:

f <- function(x)
  if(is.list(x)) {
    unlist(lapply(x, f))
  } else {
    x[which(is.null(x))] <- NA
    paste(x, collapse = ",")
  }
df <- as.data.frame(do.call(cbind, lapply(friListings, f)))
write.table(df, tf <- tempfile(fileext = "csv"))
df <- read.table(tf)
str(df)
# 'data.frame':  4 obs. of  21 variables:
# $ listingId                   : int  1138579989 1138969061 1138958138 1139003985
# $ sectionId                   : int  1552295 1552172 1552220 1552289
# $ row                         : int  16 6 22 26
# $ quantity                    : int  1 2 4 1
# $ sellerSectionName           : Factor w/ 4 levels "353","Club 303 - Obstructed/No View",..: 4 2 3 1
# $ sectionName                 : Factor w/ 4 levels "Club 303 - Obstructed/No View",..: 4 1 2 3
# $ seatNumbers                 : Factor w/ 4 levels "13,14,15,16",..: 2 3 1 4
# $ zoneId                      : int  232917 232909 232914 232917
# $ zoneName                    : Factor w/ 3 levels "Club","Middle",..: 3 1 2 3
# $ listingAttributeList        : Factor w/ 4 levels "204,201","204,4369",..: 1 3 4 2
# $ listingAttributeCategoryList: int  1 1 1 1
# $ deliveryTypeList            : int  5 5 5 5
# $ dirtyTicketInd              : logi  FALSE FALSE FALSE FALSE
# $ splitOption                 : int  0 0 1 1
# $ ticketSplit                 : int  1 2 2 1
# $ splitVector                 : Factor w/ 3 levels "1","2","2,4": 1 2 3 1
# $ sellerOwnInd                : int  0 0 0 0
# $ currentPrice.amount         : num  469 475 475 550
# $ currentPrice.currency       : Factor w/ 1 level "USD": 1 1 1 1
# $ faceValue.amount            : logi  NA NA NA NA
# $ faceValue.currency          : logi  NA NA NA NA
lukeA
  • 53,097
  • 5
  • 97
  • 100
  • This worked perfectly! Thank you! I want to try and understand your function f. Is this function calling itself recursively? If x is a list, then it does an lapply on itself? I understand the "else" statement pretty clearly, I think. – Brianne Mar 04 '15 at 20:27
  • Thank you again! So much. I was going to give up. :) – Brianne Mar 04 '15 at 20:38