Group columns with the same name in R

Question

If I have a data frame as below, with the first row the column names (row names not included here)

   A   B   C   D   E   F   G   H   I
   a   b   c   a   a   b   c   c   c
   1   2   3   4   5   6   7   8   9

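How would I be able to create a new data frame in which the values in the second row are grouped under the distinct names in the first row (a, b, c)?

Shorter groups should be padded with NA for the empty values.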

UPDATE

If d.frame is the dataframe in question:

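new.df <- data.frame()
firstrow <- unlist(d.frame[1, ])   # grouping labels are in the first row
groups <- unique(firstrow)
for (n in groups) {
   # cbind.fill is not in base R (plyr provides rbind.fill, not cbind.fill); define or load it first
   new.df <- cbind.fill(new.df, unlist(d.frame[2, which(firstrow == n)]))
}
colnames(new.df) <- groups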

I think that works well, but it isn't efficient and relies on a third-party package. Any suggestions?

Is the data structure here two rows and columns A to I? I want to make sure I understand it. — TARehman, Jul 15 '14 at 19:55
I think I would probably be able to pull something together, in a few lines of code. Maybe something like this (see the update section) — user3562276, Jul 15 '14 at 20:39
Nevermind all, my code would not work. But I think the merge function can be used. — user3562276, Jul 15 '14 at 20:58

score 2 · Answer 1 · edited May 23 '17 at 12:11

Here is another solution, based on the function cbind.fill from the question "cbind a df with an empty df (cbind.fill?)":

cbind.fill<-function(...){
  nm <- list(...) 
  nm<-lapply(nm, as.matrix)
  n <- max(sapply(nm, nrow)) 
  do.call(cbind, lapply(nm, function (x) 
    rbind(x, matrix(, n-nrow(x), ncol(x))))) 
}

df <- read.table(text = "A   B   C   D   E   F   G   H   I
a   b   c   a   a   b   c   c   c
1   2   3   4   5   6   7   8   9", header = TRUE, as.is = TRUE)

df <- as.matrix(df)
do.call(cbind.fill, split(df[2,], df[1,]))
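
To see what cbind.fill does on its own, here is a minimal sketch with two inputs of unequal length. The function body is the one quoted above, repeated so the snippet runs by itself; the variable names short, long, and padded are only illustrative.

```r
# Same logic as the cbind.fill quoted in the answer.
cbind.fill <- function(...) {
  nm <- lapply(list(...), as.matrix)   # every input becomes a matrix
  n <- max(sapply(nm, nrow))           # row count of the tallest input
  # Append rows of NA to each input until it has n rows, then bind columns.
  do.call(cbind, lapply(nm, function(x)
    rbind(x, matrix(, n - nrow(x), ncol(x)))))
}

short <- c(2, 6)         # hypothetical group with two values
long  <- c(3, 7, 8, 9)   # hypothetical group with four values
padded <- cbind.fill(short, long)
print(padded)
```

The shorter input should come back with two trailing NA rows, so both columns have four rows.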

And one more solution, using only split and sapply:

df <- as.matrix(df)
lst <- split(df[2,], df[1,])
m <- max(sapply(lst, length))
result <- sapply(lst, function(x) {length(x) <- m; x})
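
The padding in this second snippet relies on the fact that assigning a larger length to a vector extends it with NA. A self-contained sketch of the same idea follows; pad_groups is a hypothetical helper name, and the values are kept as integers here, whereas the matrix in the answer holds character strings.

```r
# Sample data from the question: labels from row 1, values from row 2.
labels <- c("a", "b", "c", "a", "a", "b", "c", "c", "c")
values <- 1:9

# pad_groups is an illustrative name, not part of the original answer.
pad_groups <- function(values, labels) {
  lst <- split(values, labels)       # one vector per distinct label
  m <- max(sapply(lst, length))      # size of the largest group
  # Setting length(x) to m pads shorter vectors with NA at the end.
  sapply(lst, function(x) { length(x) <- m; x })
}

result <- pad_groups(values, labels)
print(result)
```

The result is a 4 x 3 matrix with columns a, b, and c; column b, for example, holds 2, 6, NA, NA.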

answered Jul 15 '14 at 21:23

DrDom


-1. Not so much *based on* as directly copied from. Verbatim. Shouldn't you have just directed the OP to that answer instead? – Simon O'Hanlon Jul 15 '14 at 22:06
@SimonO'Hanlon, how would the answer with redirecting have looked? Are you expecting something like, "Split the data using `split(df[2, ], df[1, ])` and then use `cbind.fill()` from this answer"? The questions are quite different, and `split` is actually a nice approach to take for this problem. – A5C1D2H2I1M1N2O1R2T1 Jul 16 '14 at 02:06
OK, I added another solution, based purely on `split` and `sapply`. This is probably not the best one, and a more elegant approach likely exists. Using `split` looks very natural to me for this task. – DrDom Jul 16 '14 at 08:59

David Arenburg · Accepted Answer · 2014-07-15T21:13:45.710

I couldn't find a simple solution for this, so here's one option using base R, as you requested in the comments. It works no matter how many columns the original data has.

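temp <- read.table(text = "A   B   C   D   E   F   G   H   I
a   b   c   a   a   b   c   c   c
1   2   3   4   5   6   7   8   9", header = TRUE) # your data

# Keep strings as character so the result does not depend on the stringsAsFactors default
temp <- data.frame(t(temp), stringsAsFactors = FALSE)
lengths <- table(temp[, 1])   # group sizes, named by label
maxval <- max(lengths)
# names(lengths) is used instead of levels(temp[, 1]), which is NULL for a character column (the default since R 4.0)
data.frame(do.call(cbind, lapply(names(lengths), function(x) c(x, temp[temp[, 1] == x, 2], rep(NA, maxval - lengths[x])))))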

##     X1   X2 X3
## 1    a    b  c
## 2    1    2  3
## 3    4    6  7
## 4    5 <NA>  8
## 5 <NA> <NA>  9

edited Jul 15 '14 at 21:13

answered Jul 15 '14 at 21:07

David Arenburg

Awesome, works great. One tiny irrelevant fix I would make just to get rid of the annoying warning message. In the data.frame() function, just add row.names = NULL as an argument. Honestly, the code is functional either way, but I like to see no red :D – user3562276 Jul 16 '14 at 00:32

score 0 · Answer 3 · answered Jul 16 '14 at 02:11

I would transpose the original two-row data.frame, create a "time" variable, use reshape to reorganize the data, and transpose the result.

Like this (mydf is the original two-row data frame):

x <- t(mydf)
y <- data.frame(cbind(x, ave(x[, 1], x[, 1], FUN = seq_along)))
t(reshape(y, direction = "wide", idvar = "X1", timevar = "X3"))
#      A   B   C  
# X1   "a" "b" "c"
# X2.1 "1" "2" "3"
# X2.2 "4" "6" "7"
# X2.3 "5" NA  "8"
# X2.4 NA  NA  "9"
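
The "time" variable is the piece that makes reshape work: it numbers each value within its group. A minimal sketch of that step alone, assuming the labels from the question; time_index is an illustrative name, and seq_along(labels) is passed as the first argument to keep the result integer, whereas the answer passes the character labels themselves.

```r
# Labels from the first row of the question's data.
labels <- c("a", "b", "c", "a", "a", "b", "c", "c", "c")

# ave() applies seq_along within each label group, giving a running
# count per group that can serve as the timevar for reshape().
time_index <- ave(seq_along(labels), labels, FUN = seq_along)
print(time_index)
```

Each label's first occurrence gets 1, its second gets 2, and so on, which is why reshape produces one wide column per occurrence (X2.1 to X2.4 in the output above).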
