
I have 5 lists that need to be the same length, because they will be combined into a dataframe. One of them may not be the same length as the other 4, so what I currently have is an if statement that checks its length against the length of one of the other lists and then...

1) I create a temporary list using rep( NA, length ) where length is the extra elements I need to add to extend the list

2) I use the concat function c() to combine the list that needs extending with the list with the NAs.

x <- as.numeric( list )

if( length( list ) < length( main ))
{
    temp <- rep( NA, length( main ) - length( list ))
    list <- c( list, temp )
}

List 1 - NA NA

List 2 - 32 53 45

Merged List - 32 53 45 NA NA

The problem with this is that I then get a ton of NAs introduced by coercion after the dataframe is created.

Is there a better way of handling this? I assume it has to do with the fact that the main list is numeric. I tried doing the same with 0 instead of NA but that failed for some reason. What I use to extend the length does not matter, as long as it is not a number other than 0.
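
As a hedged illustration of where this warning can come from: in R it is raised when `as.numeric()` is applied to character strings that cannot be parsed as numbers, not by padding a numeric vector with NA. The vector `vals` below is a made-up example and is an assumption, since the real data is not shown in the question.

```r
# Hypothetical input: one string is not a valid number
vals <- c("32", "53", "abc")

# Capture the warning text instead of letting it print
msg <- tryCatch(as.numeric(vals), warning = function(w) conditionMessage(w))
print(msg)

# The conversion itself still returns a numeric vector, with NA for "abc"
nums <- suppressWarnings(as.numeric(vals))
print(nums)

# Padding an already numeric vector with NA raises no warning
padded <- c(c(32, 53, 45), rep(NA, 2))
print(padded)
```

If the warning appears only for the padded cases, it may be worth checking whether those particular inputs hold non-numeric strings before the conversion.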

cpd1
  • If you're just trying to extend the length, look at `?"length<-"`. It pads atomic vectors with NA and lists with NULL. Example: `x <- 1:3; length(x) <- 5; x` – Rich Scriven Jan 17 '15 at 02:42
  • Please post some code and give us an example of the desired result, what you tried, and what's not working. – Ista Jan 17 '15 at 02:45
  • @RichardScriven - I'm trying that now as well. Takes a bit of time to process but should know soon what happens – cpd1 Jan 17 '15 at 02:56
  • @RichardScriven - Still the same thing. I still get a dataframe and I think it's complete. I just don't know for sure if I should ignore the warning – cpd1 Jan 17 '15 at 02:57
  • @AndyD I'm afraid that code doesn't really help. Please read http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example, it will give you a better idea of how you can ask your question in a way that makes it easier for people to help you. – Ista Jan 17 '15 at 03:02
  • @AndyD Is it critical to replace by NA, or you can replace by NULL as well? – Marat Talipov Jan 17 '15 at 03:20
  • @Ista - I believe I've given enough including what I expect. I cannot post all of the code nor do I really see why any more is necessary. – cpd1 Jan 17 '15 at 03:23
  • @MaratTalipov - Hi Marat. I really just want to extend the length, regardless of what is used (NA, 0, or NULL). This is one of 5 lists I will combine into a dataframe using rbindlist after the processing is done. Because the length is different with this one list, the merge into the dataframe fails. – cpd1 Jan 17 '15 at 03:24
  • @AndyD this piece of information definitely deserves to be mentioned in the question! – Marat Talipov Jan 17 '15 at 03:30
  • I have made changes to what I was asking as I realized the first line of code was incorrect. I also had mentioned what I want. I never said my code was the proper way of handling it. I only added it after Ista asked for it. Regardless, I do get a dataframe. I just get warnings for every instance where the if statement is True. About 50 out of 200,000 – cpd1 Jan 17 '15 at 03:37
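
The `length<-` suggestion from the comments above can be sketched as follows. The numbers come from the question's example; the variable names `x` and `l` are illustrative only.

```r
# Atomic vector: extending the length pads with NA
x <- c(32, 53, 45)
length(x) <- 5
print(x)

# List: extending the length pads with NULL elements
l <- list(1, 2)
length(l) <- 4
print(length(l))
print(is.null(l[[3]]))
```

This replaces the explicit `rep()` and `c()` steps with a single assignment, and it leaves a vector unchanged when it is already the target length.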

2 Answers


I will assume that you start with several lists like these:

n=as.list(1:2)
a=as.list(letters[1:3])
A=as.list(LETTERS[1:4])

First, I'd suggest combining them into a list of lists:

z <- list(n,a,A)

so you can find the length of the longest sub-list:

max.length <- max(sapply(z,length))

and use length<- to fill the missing elements of the shorter sub-lists with NULL values:

# z2 <-  lapply(z,function(k) {length(k) <- max.length; return(k)}) # Original version
# z2 <- lapply(z, "length<-", max.length) # More elegant way

z2 <- lapply(lapply(z, unlist), "length<-", max.length) # Even better because it makes sure that the resulting data frame will consist of atomic vectors

The resulting list can be easily transformed into a data.frame:

df <- as.data.frame(do.call(rbind,z2))
Marat Talipov
  • You can even avoid an anonymous function with `z2 <- lapply(z, "length<-", max.length)` and it returns the adjusted list. I like that trick – Rich Scriven Jan 17 '15 at 03:43
  • One more thought. Those columns aren't atomic vectors and might cause problems later. They're lists. So to get atomics with NAs you can do `lapply(lapply(z, unlist), "length<-", max.length)` – Rich Scriven Jan 17 '15 at 03:55
  • @RichardScriven that's a nice trick! I thought that, besides the syntactic sugar, length(z) <- max.length is exactly the same thing as `length<-`(z,max.length). I never realized that the former *modifies* the list, whereas the latter returns a modified copy of the list. Good to know! – Marat Talipov Jan 17 '15 at 03:57

Another option uses stringi (with "z" from @Marat Talipov's post), if you want to get the result as shown in "df":

library(stringi)
as.data.frame(stri_list2matrix(lapply(z, as.character), byrow=TRUE))
#  V1 V2   V3   V4
#1  1  2 <NA> <NA>
#2  a  b    c <NA>
#3  A  B    C    D

NOTE: Now, the columns are all "factors", or "characters" if we specify stringsAsFactors=FALSE. As @Richard Scriven mentioned in the comments, it would make more sense to have the "rows" as "columns". The above method works well when the lists are all 'numeric' or all 'character'.

akrun
  • I think someone should tell OP that columns would be better than rows since now they've mixed numerics and characters in the columns. I was going to but I've badgered everyone enough already :) – Rich Scriven Jan 17 '15 at 04:24
  • I'm not sure what you guys mean by columns instead of rows. I'm a bit of novice in R if it's not already clear :) – cpd1 Jan 18 '15 at 13:42
  • @akrun - So your line will make sure the lengths are all the same? I'm just a little confused how it ends up with that, since I don't see any length comparisons etc. – cpd1 Jan 18 '15 at 13:43
  • @AndyD It was based on the example posted by Marat Talipov. If you look at column `V1`, there are numbers (`1`) and characters (`a`, `A`), which essentially makes the column character/factor. If I don't specify `byrow=TRUE`, the result will be the transpose, and the columns can then be changed to their respective classes. – akrun Jan 18 '15 at 13:45
  • @AndyD As I mentioned in the post, this method is useful when all the list elements are `numeric` as the output will be a matrix and matrix can hold only a single class. So, if there is any character element, the whole matrix will be transformed to character class. Regarding the lengths, yes, it will make sure the lengths will be the same – akrun Jan 18 '15 at 13:47
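
The "columns instead of rows" idea from the comments can be sketched in base R as follows, reusing the example lists from Marat Talipov's answer. This is one possible reading of that suggestion, not code from either answer; the names given to the elements of `z` and the object `cols` are added here for illustration.

```r
# Example lists from the first answer
n <- as.list(1:2)
a <- as.list(letters[1:3])
A <- as.list(LETTERS[1:4])

# Name the sub-lists so they become column names
z <- list(n = n, a = a, A = A)

# Pad each flattened sub-list to the longest length with NA
max.length <- max(sapply(z, length))
cols <- lapply(lapply(z, unlist), `length<-`, max.length)

# Each padded vector becomes one column and keeps its own type
df <- as.data.frame(cols, stringsAsFactors = FALSE)
str(df)
```

Because each original list ends up as its own column, the numeric list stays numeric instead of being converted to character along with the others.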