
I am trying to turn a nested list structure into a dataframe. The list looks similar to the following (it is serialized data from parsed JSON read in using the httr package).

  myList <- list(object1 = list(w=1, x=list(y=0.1, z="cat")), object2 = list(w=NULL, x=list(z="dog")))

EDIT: my original example data was too simple. The actual data are ragged, meaning that not all variables exist for every object, and some of the list elements are NULL. I edited the data to reflect this.

unlist(myList) does a great job of recursively flattening the list, and I can then use lapply to flatten all the objects nicely.

  flatList <- lapply(myList, FUN= function(object) {return(as.data.frame(rbind(unlist(object))))}) 

And finally, I can button it up using plyr::rbind.fill

  myDF <- do.call(plyr::rbind.fill, flatList)
  str(myDF)

  #'data.frame':    2 obs. of  3 variables:
  #$ w  : Factor w/ 2 levels "1","2": 1 2
  #$ x.y: Factor w/ 2 levels "0.1","0.2": 1 2
  #$ x.z: Factor w/ 2 levels "cat","dog": 1 2

The problem is that w and x.y are now being interpreted as character vectors, which by default get parsed as factors in the dataframe. I believe that unlist() is the culprit, but I can't figure out another way to recursively flatten the list structure. A workaround would be to post-process the dataframe, and assign data types then. What is the best way to determine if a vector is a valid numeric or integer vector?

Andrew Barr
  • You could create "flatList" using `lapply(myList, as.data.frame)` since there is a method `as.data.frame.list`; `unlist` turns your "list" into an atomic vector, so everything is coerced to "character" – alexis_laz Jun 09 '14 at 21:28
  • YES! I like this the best. `plyr::rbind.fill(lapply(myList, as.data.frame))` is pretty elegant! If you make an answer I will accept it. – Andrew Barr Jun 09 '14 at 21:31
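
The coercion mentioned in the comment above can be seen on a single object. This is a minimal sketch using a cut-down copy of the question's first object (the variable names `obj`, `flat`, and `numeric_only` are just for illustration): as soon as one leaf is a string, `unlist()` has to return a character vector.

```r
# One object from the question's list: two numeric leaves and one string leaf.
obj <- list(w = 1, x = list(y = 0.1, z = "cat"))

# unlist() returns an atomic vector, so all leaves must share one type.
flat <- unlist(obj)
print(flat)          # every value is now a string
print(class(flat))   # "character"

# With only numeric leaves, no coercion to character takes place.
numeric_only <- unlist(list(w = 1, x = list(y = 0.1)))
print(class(numeric_only))   # "numeric"
```

The nested names are joined with a dot (`x.y`, `x.z`), which is where the column names in the question's data frame come from.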

6 Answers


As discussed here, checking if as.numeric returns NA values is a simple approach to checking if a character string contains numeric data. Now you can do something like:

myDF2 <- lapply(myDF, function(col) {
  if (suppressWarnings(all(!is.na(as.numeric(as.character(col)))))) {
    as.numeric(as.character(col))
  } else {
    col
  }
})
str(myDF2)
# List of 3
#  $ w  : num [1:2] 1 2
#  $ x.y: num [1:2] 0.1 0.2
#  $ x.z: Factor w/ 2 levels "cat","dog": 1 2
josliber
  • Note: I ended up using this solution. The only thing I added was to turn it back into a dataframe using `as.data.frame(myDF2)` – Andrew Barr Jun 11 '14 at 13:47
  • make that long logical expression a tad simpler by writing `!suppressWarnings(any(is.na(as.numeric(as.character(col)))))` – Robert Kubinec Apr 04 '17 at 18:14
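
The two comments above can be combined into one short sketch. The data frame below is hypothetical example data shaped like the output in the question (numbers stored as factors), and `convert_col` is an illustrative helper name, not part of the original answer. Note that `all(!is.na(x))` is equivalent to `!any(is.na(x))`, so the `any` form needs the leading negation.

```r
# Hypothetical example data: numeric-looking values stored as factors.
myDF <- data.frame(w = c("1", "2"),
                   x.y = c("0.1", "0.2"),
                   x.z = c("cat", "dog"),
                   stringsAsFactors = TRUE)

# Convert a column to numeric only if no value fails to parse.
convert_col <- function(col) {
  as_num <- suppressWarnings(as.numeric(as.character(col)))
  if (!any(is.na(as_num))) as_num else col
}

# lapply() returns a plain list, so wrap it back into a data frame.
myDF2 <- as.data.frame(lapply(myDF, convert_col))
str(myDF2)
```

Like the original condition, this treats a column that already contains `NA` as non-numeric and leaves it unchanged.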

When NAs are included, @josliber's original function doesn't work (though it answered the question well for the sample data). @Amy M's function should work but requires loading the Hmisc package.

What about something like this:

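can.be.numeric <- function(x) {
    stopifnot(is.atomic(x) || is.list(x)) # check if x is a vector
    # as.numeric() on a factor returns its integer codes, so compare the labels instead
    if (is.factor(x)) x <- as.character(x)
    numNAs <- sum(is.na(x))
    numNAs_new <- suppressWarnings(sum(is.na(as.numeric(x))))
    return(numNAs_new == numNAs)
}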

It counts NAs in input vector x and NAs in the output of as.numeric(x) and returns TRUE if the vector can be "safely" converted to numeric (i.e. without adding any additional NA values).
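
To make the NA-counting rule concrete, here is a small sketch that calls the function on a few vectors. The definition is repeated so the snippet runs on its own; the `is.factor()` step is an added assumption for factor input (without it, `as.numeric()` would return the factor's integer codes and every factor would look numeric).

```r
can.be.numeric <- function(x) {
  stopifnot(is.atomic(x) || is.list(x))
  # Assumption: compare factor labels, not the underlying integer codes.
  if (is.factor(x)) x <- as.character(x)
  numNAs <- sum(is.na(x))
  numNAs_new <- suppressWarnings(sum(is.na(as.numeric(x))))
  numNAs_new == numNAs
}

# An existing NA does not block conversion: one NA before, one NA after.
with_na <- can.be.numeric(c("1", "2.5", NA))

# "abc" becomes a new NA, so conversion would lose information.
mixed <- can.be.numeric(c("1", "abc"))

# Factors are judged by their labels.
factor_numbers <- can.be.numeric(factor(c("0.1", "0.2")))
factor_words <- can.be.numeric(factor(c("cat", "dog")))

print(c(with_na, mixed, factor_numbers, factor_words))
```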

UPDATE: Request to show how to use the function. You want to call this function on each column and only convert columns that can be numeric.

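myDF2 <- lapply(myDF, function(col) {
  if (can.be.numeric(col)) {
    # as.character() guards against factor columns, whose as.numeric() is the integer codes
    as.numeric(as.character(col))
  } else {
    col
  }
})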
str(as.data.frame(myDF2))
# 'data.frame': 2 obs. of  3 variables:
#  $ w  : num  1 NA
#  $ x.y: num  0.1 NA
#  $ x.z: chr  "cat" "dog"
Stefan Avey
  • Is it possible to show how this would be used to identify and convert numeric columns on a data.frame? It looks like exactly what I need, but I'm not sure how to apply it – Warren Spencer Mar 18 '22 at 14:10
  • Updated to show example usage using the same code as the accepted answer but replacing with the function. – Stefan Avey Mar 22 '22 at 14:23

You can use plyr::ldply:

plyr::ldply(myList, .fun = function(x) data.frame(x))

      .id w x.y x.z
1 object1 1 0.1 cat
2 object2 2 0.2 dog
agstudy

I don't see any advantage of plyr::ldply over regular base R methods:

 do.call(rbind, lapply(myList, data.frame) )
#-------------

        w x.y x.z
object1 1 0.1 cat
object2 2 0.2 dog

The trouble arose from a misguided attempt to "flatten" the data without consideration for its intrinsic structure.

IRTFM
  • Turns out my example data were too simple. The actual data are ragged, meaning not all variables exist for each object. And some of the list elements are NULL, which produces errors in `data.frame()`. I edited the question to contain better example data. – Andrew Barr Jun 09 '14 at 21:45
  • @AndrewBarr : In this case, see something like `lapply(myList, function(x) as.data.frame(unlist(x, FALSE)))` – alexis_laz Jun 09 '14 at 22:00
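
A minimal sketch of the suggestion in the last comment, applied to the ragged list from the question. The point is that `unlist(x, FALSE)` (that is, `recursive = FALSE`) flattens only one level and returns a list, so each leaf keeps its own type, and the `NULL` element simply drops out.

```r
# The ragged example data from the question.
myList <- list(object1 = list(w = 1, x = list(y = 0.1, z = "cat")),
               object2 = list(w = NULL, x = list(z = "dog")))

# Flatten one level per object, then build a one-row data frame from the result.
flatList <- lapply(myList, function(x) as.data.frame(unlist(x, FALSE)))

str(flatList$object1)   # w and x.y stay numeric
str(flatList$object2)   # only x.z is present
```

The per-object data frames have different columns, so they still need something like `plyr::rbind.fill`, as in the question, to be combined into one data frame.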

If you just want to convert all-numeric vectors that have been erroneously classed as character when they were read in, you can also use the function all.is.numeric from the Hmisc package:

myDF2 <- lapply(myDF, Hmisc::all.is.numeric, what = "vector", extras = NA)

Choosing what = "vector" will convert the vector to numeric if it only contains numbers. NAs or other types of missing values will prevent conversion unless they are specified in the extras argument as above.

Note however that if applied to a whole data.frame containing Date or POSIXct vectors, these will also be converted to numeric. To prevent this you can wrap it in a function as below:

catchNumeric <- function(dtcol) {
  # Only character columns are candidates; Date, POSIXct, etc. pass through unchanged
  if (is.character(dtcol)) {
    Hmisc::all.is.numeric(dtcol, what = "vector", extras = NA)
  } else {
    dtcol
  }
}

Then apply to your data.frame:

myDF2 <- lapply(myDF, catchNumeric)
Amy M

If you have a list or a vector with strings and you want to convert only the numbers to numeric, a possible solution is:

catchNumeric <- function(mylist) {
  newlist <- suppressWarnings(as.numeric(mylist))
  mylist <- as.list(mylist)
  mylist[!is.na(newlist)] <- newlist[!is.na(newlist)]
  mylist
}

> catchNumeric(c("123", "c12", "abc", "123.12"))
[[1]]
[1] 123

[[2]]
[1] "c12"

[[3]]
[1] "abc"

[[4]]
[1] 123.12

> catchNumeric(list("123", "c12", "abc", "123.12"))
[[1]]
[1] 123

[[2]]
[1] "c12"

[[3]]
[1] "abc"

[[4]]
[1] 123.12
Adriano Rivolli