32

Data frame columns must all have the same number of rows, but is there any way to create a data frame whose columns have unequal lengths? I'm not interested in saving them as separate elements of a list, because I often have to email this information to people as a CSV file, and that is easiest with a data frame.

x = c(rep("one",2))
y = c(rep("two",10))
z = c(rep("three",5))
cbind(x,y,z)

In the above code, the cbind() function just recycles the shorter columns so that they all have 10 elements in each column. How can I alter it so that the lengths are 2, 10, and 5?

I've done this in the past by doing the following, but it's inefficient.

df = data.frame(one = c(rep("one", 2), rep("", 8)),
                two = c(rep("two", 10)),
                three = c(rep("three", 5), rep("", 5)))
zx8754
ATMathew
  • 1
    This issue has [arisen](http://stackoverflow.com/questions/5531471/combining-unequal-columns-in-r) [before](http://stackoverflow.com/questions/3365885/combining-vectors-of-unequal-length-into-a-data-frame). The latter is probably not quite a duplicate, but the former is pretty close. – joran Aug 25 '11 at 20:12
  • 1
    yes. in particular, my answer is nearly identical to two answers given in the former. @Owen's "subversive" answer is novel, and clever (if dangerous). – Ben Bolker Aug 25 '11 at 20:21
  • 2
    This question is like asking how do I store an integer that represents 2/3. – hadley Aug 26 '11 at 11:49
  • You could also use `dput` to store data in an ascii (R-only) format. – Owen Sep 12 '11 at 08:17

6 Answers

31

Sorry this isn't exactly what you asked, but I think there may be another way to get what you want.

First, if the vectors are different lengths, the data isn't really tabular, is it? How about just saving it to different CSV files? You might also try ASCII formats that allow storing multiple objects (JSON, XML).

If you feel the data really is tabular, you could pad with NAs:

> x = 1:5
> y = 1:12
> max.len = max(length(x), length(y))
> x = c(x, rep(NA, max.len - length(x)))
> y = c(y, rep(NA, max.len - length(y)))
> x
 [1]  1  2  3  4  5 NA NA NA NA NA NA NA
> y
 [1]  1  2  3  4  5  6  7  8  9 10 11 12
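Since the goal in the question is a CSV file, here is a minimal sketch of how the padded vectors might be written out. It assumes blank cells are what the recipients want to see; `out_file` is just a hypothetical temporary path, and the `na = ""` argument of `write.csv()` is what turns the NA padding into empty fields.

```r
x <- 1:5
y <- 1:12

# Pad both vectors with NA up to the length of the longer one
max.len <- max(length(x), length(y))
df <- data.frame(
  x = c(x, rep(NA, max.len - length(x))),
  y = c(y, rep(NA, max.len - length(y)))
)

# Hypothetical output location; replace with a real file name
out_file <- tempfile(fileext = ".csv")

# na = "" writes NA as an empty field; row.names = FALSE drops the row index
write.csv(df, out_file, row.names = FALSE, na = "")

csv_lines <- readLines(out_file)
```

Rows 6 to 12 of the file then have an empty first field, so a spreadsheet shows the short column simply ending after five values.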

If you absolutely must make a data.frame with unequal columns you could subvert the check, at your own peril:

> x = 1:5
> y = 1:12
> df = list(x=x, y=y)
> attributes(df) = list(names = names(df),
    row.names=1:max(length(x), length(y)), class='data.frame')
> df
      x  y
1     1  1
2     2  2
3     3  3
4     4  4
5     5  5
6  <NA>  6
7  <NA>  7
 [ reached getOption("max.print") -- omitted 5 rows ]]
Warning message:
In format.data.frame(x, digits = digits, na.encode = FALSE) :
  corrupt data frame: columns will be truncated or padded with NAs
Owen
13

Another approach to the padding:

na.pad <- function(x,len){
    x[1:len]
}

makePaddedDataFrame <- function(l,...){
    maxlen <- max(sapply(l,length))
    data.frame(lapply(l,na.pad,len=maxlen),...)
}

x = c(rep("one",2))
y = c(rep("two",10))
z = c(rep("three",5))

makePaddedDataFrame(list(x=x,y=y,z=z))

The na.pad() function exploits the fact that R will automatically pad a vector with NAs if you try to index non-existent elements.

makePaddedDataFrame() just finds the longest vector and pads the others to the same length.
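To make the indexing trick concrete, here is a small sketch of what `x[1:len]` does on its own. The variable names `padded` and `truncated` are only for illustration. Note the caveat in the second half: the same indexing silently truncates when `len` is smaller than the vector, which is why the helper above always passes the maximum length.

```r
# Indexing past the end of a vector returns NA for the missing positions
x <- c("one", "one")
padded <- x[1:5]      # "one" "one" NA NA NA

# Caveat: if len is shorter than the vector, the same indexing truncates it
y <- rep("two", 10)
truncated <- y[1:5]   # only the first five elements remain
```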

goodside
Peter M
7

To amplify @goodside's answer, you can do something like

L <- list(x=x,y=y,z=z)  # name the elements so the columns get readable names
cfun <- function(L) {
  pad.na <- function(x,len) {
   c(x,rep(NA,len-length(x)))
  }
  maxlen <- max(sapply(L,length))
  do.call(data.frame,lapply(L,pad.na,len=maxlen))
}
cfun(L)
Arthur Yip
Ben Bolker
4

What you need is to pad each vector with NAs at the end so that it matches the length of the longest vector. You can do:

l <- tibble::lst(x, y, z)
data.frame(lapply(l, `length<-`, max(lengths(l))))

      x   y     z
1   one two three
2   one two three
3  <NA> two three
4  <NA> two three
5  <NA> two three
6  <NA> two  <NA>
7  <NA> two  <NA>
8  <NA> two  <NA>
9  <NA> two  <NA>
10 <NA> two  <NA>
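For readers without tibble installed, the same idea can be sketched in base R, assuming the vectors `x`, `y` and `z` from the question. The replacement function `length<-` extends a vector with NA when the new length is larger, and a plain named `list()` stands in for `tibble::lst()` here; `padded_x` is only an illustrative name.

```r
x <- rep("one", 2)
y <- rep("two", 10)
z <- rep("three", 5)

# `length<-` called as an ordinary function: extends x with NA to length 4
padded_x <- `length<-`(x, 4)   # "one" "one" NA NA

# Base R equivalent of tibble::lst(x, y, z): name the elements by hand
l <- list(x = x, y = y, z = z)

# Stretch every element to the longest length, then build the data frame
df <- data.frame(lapply(l, `length<-`, max(lengths(l))))
```

`lengths()` needs R 3.2.0 or later; on older versions `sapply(l, length)` does the same job.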
Maël
0

We can create a data frame containing columns of unequal lengths by padding the shorter columns with the empty string "".

The code below first finds the maximum column length of a list object, l. Next, the shorter columns are padded with "" so that every element of the list has the same number of entries. The list is then converted to a data frame.

# The list column names (assumes l is a named list, e.g. l <- list(x = x, y = y, z = z))
cols <- names(l)

# The maximum column length
max_len <- 0
for (col in cols){
    if (length(l[[col]]) > max_len)
        max_len <- length(l[[col]])
}

# Each column is padded
for (col in cols){
    l[[col]] <- c(l[[col]], rep("", max_len - length(l[[col]])))
}

# The list is converted to data frame
df <- as.data.frame(l)
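One caveat worth illustrating, under the assumption that some columns are numeric rather than character: padding with "" converts the whole column to character, whereas padding with NA keeps its type. The names `nums`, `padded_chr` and `padded_na` are hypothetical.

```r
nums <- 1:3

# Padding with "" coerces the numbers to strings
padded_chr <- c(nums, rep("", 2))   # "1" "2" "3" "" ""

# Padding with NA keeps the column as integer
padded_na <- c(nums, rep(NA, 2))    # 1 2 3 NA NA
```

For the all-character vectors in the question this makes no difference, but for mixed data the NA padding from the other answers is probably the safer default.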
Nadir Latif
-3

Similar problem:

coin <- c("Head", "Tail")
toss <- sample(coin, 50, replace=TRUE)

categorize <- function(x,len){
  count_heads <- 0
  count_tails <- 0
  tails <- as.character()
  heads <- as.character()
  for(i in 1:len){
    if(x[i] == "Head"){
      heads <- c(heads,x[i])
      count_heads <- count_heads + 1
    }else {
      tails <- c(tails,x[i])
      count_tails <- count_tails + 1
    }
  }
  if(count_heads > count_tails){
    head <- heads
    tail <- c(tails, rep(NA, (count_heads-count_tails)))
  } else {
    head <- c(heads, rep(NA,(count_tails-count_heads)))
    tail <- tails
  }
  data.frame(cbind("Heads"=head, "Tails"=tail))
}

categorize(toss,50)

Output: in the run shown here, the toss produced 31 heads and 19 tails (the counts vary from run to run because no seed is set). The shorter Tails column is then padded with NA so that both columns have the same length and can form a data frame.

ttb
  • 2
    Growing things in a loop is a bad idea in R; the usual reference is www.burns-stat.com/documents/books/the-r-inferno/ You can just do `heads = sum(x == "Head")`, right? Really, I guess `rbinom` would make more sense than `sample` in any case. – Frank Sep 19 '17 at 20:00
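Following the comment above, here is one way the loop might be replaced by vectorised subsetting. This is only a sketch: `categorize_vec` is a hypothetical name, and like the original it treats every value that is not "Head" as a tail. The seed is set only to make the example reproducible.

```r
categorize_vec <- function(x) {
  # Split the tosses without growing vectors inside a loop
  heads <- x[x == "Head"]
  tails <- x[x != "Head"]

  # Pad the shorter vector with NA by extending its length
  n <- max(length(heads), length(tails))
  length(heads) <- n
  length(tails) <- n

  data.frame(Heads = heads, Tails = tails)
}

set.seed(1)
toss <- sample(c("Head", "Tail"), 50, replace = TRUE)
res <- categorize_vec(toss)
```

The result has as many rows as the more frequent outcome, with NA filling the remainder of the other column.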