
Suppose I have a data frame like the one below, where the first row shows the column names (row names are not shown):

   A   B   C   D   E   F   G   H   I
   a   b   c   a   a   b   c   c   c
   1   2   3   4   5   6   7   8   9

How can I create a new data frame like this:

   a  b  c
   1  2  3
   4  6  7
   5 NA  8
   NA NA 9

Note the NAs, which fill the empty cells.

UPDATE

If d.frame is the data frame in question:

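new.df <- data.frame()
firstrow <- unlist(d.frame[1, ])   # group labels (a, b, c, ...)
names <- unique(firstrow)
for (n in names) {
   # cbind.fill is not base R (plyr provides rbind.fill, not cbind.fill); it comes from an add-on package
   new.df <- cbind.fill(new.df, unlist(d.frame[2, which(firstrow == n)]))
}
colnames(new.df) <- names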

I think that works, but it isn't efficient and it relies on a third-party package. Any suggestions?

user3562276

3 Answers


Here is another solution, based on the function cbind.fill from the answers to "cbind a df with an empty df (cbind.fill?)":

cbind.fill<-function(...){
  nm <- list(...) 
  nm<-lapply(nm, as.matrix)
  n <- max(sapply(nm, nrow)) 
  do.call(cbind, lapply(nm, function (x) 
    rbind(x, matrix(, n-nrow(x), ncol(x))))) 
}

df <- read.table(text = "A   B   C   D   E   F   G   H   I
a   b   c   a   a   b   c   c   c
1   2   3   4   5   6   7   8   9", header = TRUE, as.is = TRUE)

df <- as.matrix(df)
do.call(cbind.fill, split(df[2,], df[1,]))
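For readers unfamiliar with `split()`, here is a minimal look at the intermediate list it builds from the example data (the variable names `m` and `groups` are illustrative only). Each label gets its own character vector, and the vectors have different lengths, which is why a padding step such as `cbind.fill` is needed afterwards:

```r
# Same example input as above
df <- read.table(text = "A   B   C   D   E   F   G   H   I
a   b   c   a   a   b   c   c   c
1   2   3   4   5   6   7   8   9", header = TRUE, as.is = TRUE)
m <- as.matrix(df)   # character matrix: row 1 = labels, row 2 = values

# split() groups the values in row 2 by the labels in row 1,
# giving one (unequal-length) character vector per label
groups <- split(m[2, ], m[1, ])
str(groups)
```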

And another solution:

df <- as.matrix(df)
lst <- split(df[2,], df[1,])
m <- max(sapply(lst, length))
result <- sapply(lst, function(x) {length(x) <- m; x})
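The second approach works because assigning a larger length with `length<-` pads a vector with `NA`. A minimal sketch of the same idea wrapped in a reusable function; `pad_groups` is a hypothetical name (not from any package), and it takes the values and labels as two plain vectors rather than a two-row data frame:

```r
# Hypothetical helper: one column per label, shorter groups padded with NA
pad_groups <- function(values, labels) {
  lst <- split(values, labels)                      # one vector per distinct label
  m <- max(lengths(lst))                            # longest group (lengths() is base R >= 3.2.0)
  sapply(lst, function(x) { length(x) <- m; x })    # length<- pads with NA
}

result <- pad_groups(values = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
                     labels = c("a", "b", "c", "a", "a", "b", "c", "c", "c"))
result
```

Wrap the result in `as.data.frame()` if a data frame rather than a matrix is needed.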
DrDom
  • -1. Not so much *based on* as directly copied from. Verbatim. Shouldn't you have just directed the OP to that answer instead? – Simon O'Hanlon Jul 15 '14 at 22:06
  • @SimonO'Hanlon, how would the answer with redirecting have looked? Are you expecting something like, "Split the data using `split(df[2, ], df[1, ])` and then use `cbind.fill()` from this answer"? The questions are quite different, and `split` is actually a nice approach to take for this problem. – A5C1D2H2I1M1N2O1R2T1 Jul 16 '14 at 02:06
  • OK, I added another solution, based purely on `split` and `sapply`. It's probably not the best one, and a more elegant approach likely exists. Using `split` seems very natural for this task to me. – DrDom Jul 16 '14 at 08:59

I couldn't find a simple solution for this, so here's one option using base R, as you requested in the comments. It works no matter how many columns the original data has.

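temp <- read.table(text = "A   B   C   D   E   F   G   H   I
a   b   c   a   a   b   c   c   c
1   2   3   4   5   6   7   8   9", header = TRUE) # your data

# transpose; stringsAsFactors = TRUE keeps levels() working in R >= 4.0
temp <- data.frame(t(temp), stringsAsFactors = TRUE)
lengths <- table(temp[, 1])   # number of values per label
maxval <- max(lengths)        # length of the longest column
# for each label: the label, its values (as text, not factor codes), then NA padding up to maxval
data.frame(do.call(cbind, lapply(levels(temp[, 1]), function(x) c(x, as.character(temp[temp[, 1] == x, 2]), rep(NA, maxval - lengths[x])))))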

##     X1   X2 X3
## 1    a    b  c
## 2    1    2  3
## 3    4    6  7
## 4    5 <NA>  8
## 5 <NA> <NA>  9
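The padding arithmetic in this answer can be checked in isolation. A small sketch on the example labels (the names `labels`, `counts` and `pad` are illustrative): `table()` gives the number of values per label, and subtracting those counts from the maximum gives how many `NA`s each column needs, which is what `maxval - lengths[x]` computes inside the `lapply()`.

```r
labels <- c("a", "b", "c", "a", "a", "b", "c", "c", "c")
counts <- table(labels)        # a = 3, b = 2, c = 4
pad <- max(counts) - counts    # NAs needed per label to reach the longest column
pad
```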
David Arenburg
  • Awesome, works great. One tiny fix I would make, just to get rid of the annoying warning message: in the data.frame() call, add row.names = NULL as an argument. Honestly, the code works either way, but I like to see no red :D – user3562276 Jul 16 '14 at 00:32

I would transpose the original two-row data.frame, create a "time" variable, use reshape to reorganize the data, and transpose the result.

Like this:

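# mydf is the original two-row data frame (labels in row 1, values in row 2)
x <- t(mydf)
# add a "time" column numbering each value within its label (1, 2, 3, ...)
y <- data.frame(cbind(x, ave(x[, 1], x[, 1], FUN = seq_along)))
t(reshape(y, direction = "wide", idvar = "X1", timevar = "X3"))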
#      A   B   C  
# X1   "a" "b" "c"
# X2.1 "1" "2" "3"
# X2.2 "4" "6" "7"
# X2.3 "5" NA  "8"
# X2.4 NA  NA  "9"
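The "time" variable is the key to the `reshape()` step: it numbers each value within its label. A minimal sketch of the same idea on plain vectors (the names `labels`, `values`, `step`, `long` and `wide` are illustrative, and it uses integer indices rather than the character ones that `ave(x[, 1], ...)` produces from a character matrix):

```r
labels <- c("a", "b", "c", "a", "a", "b", "c", "c", "c")
values <- 1:9

# running index of each value within its label: 1 1 1 2 3 2 2 3 4
step <- ave(seq_along(labels), labels, FUN = seq_along)

long <- data.frame(label = labels, value = values, step = step)
# one row per label, one value.<step> column per step; missing steps become NA
wide <- reshape(long, direction = "wide", idvar = "label", timevar = "step")
t(wide)
```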
A5C1D2H2I1M1N2O1R2T1