Sum and replace columns with same name R for a data frame containing different classes

Question

I have a data frame containing multiple classes, I would like to sum those columns that have the same name and are numeric, and replace the old columns with the new sum, does anyone know a way to do this?

i.e I have a data frame like:

col1 col2  col3 col3 
char factor int int

I would like to produce

col1  col2  col3 
char factor 2int

I have previously used:

data <- as.data.frame(do.call(cbind, by(t(data),INDICES=names(data),FUN=colSums)))

However this was on a dataframe that only had numeric variables.

There are other examples on the internet but not meeting the conditions of: replacement, preserving the rest of the frame, and of being on a frame with multiple classes

Similar question: how do I search for columns with same name, add the column values and replace these columns with same name by their sum? Using R

Do you have cases where two `col2` `col2` are not of the same class? — akrun, Jan 08 '15 at 12:10
@akrun No all the columns with the same name are the same classes (numeric) — user124123, Jan 08 '15 at 12:12
Isn't this just `data$new_col <- rowSums(data[sapply(data, is.integer)])` and then remove these columns? — David Arenburg, Jan 08 '15 at 12:21

akrun · Accepted Answer · 2015-01-08T13:05:41.900

Try

dat1 <- dat #to keep a copy of the original dataset 
indx <- sapply(dat, is.numeric)#check which columns are numeric
nm1 <- which(indx)#get the numeric index of the column
indx2 <- duplicated(names(nm1))#check which among the
# integer columns are duplicated
#use `Map` after splitting the "nm1" with its "names", do the `rowSums`
dat[ nm1[!indx2]] <- Map(function(x,y) rowSums(x[y]), list(dat),
                                       split(nm1, names(nm1)))

 dat[ -nm1[indx2]]

Update

Or to make it more efficient, only take the "duplicated" and "numeric" columns while leaving the others intact. Create an "index" (indx2) of columns that are duplicated. Subset the "nm1" based on the "indx2" and then do rowSums as described above. Finally, remove the unwanted columns (duplicated ones) by using the "indx3"

 indx2 <- duplicated(names(nm1))|duplicated(names(nm1),fromLast=TRUE)
 nm2 <- nm1[indx2]
 indx3 <- duplicated(names(nm2))
 dat[nm2[!indx3]] <- Map(function(x,y) rowSums(x[y]), 
                list(dat),split(nm2, names(nm2)))
 datN <- dat[ -nm2[indx3]]
 datN
 #    col1 col2 col3 col4 col5
 #1    16   23    2   10   10
 #2    10   18   12    8   18
 #3    21   23   15    6   10
 #4    14   37    3    5   15
 #5    29   39    5    1   11
 #6    26   31   14    2   20
 #7    25   31    2    8   10
 #8    36   31   12    8    6
 #9    32   26   13    6    4
 #10   16   38    1    7    3

Checking the results

 rowSums(dat1[names(dat1) %in% 'col1'])
 #[1] 16 10 21 14 29 26 25 36 32 16
 rowSums(dat1[names(dat1) %in% 'col2'])
 #[1] 23 18 23 37 39 31 31 31 26 38

data

dat <- structure(list(col1 = c(6L, 5L, 15L, 11L, 14L, 19L, 6L, 16L, 
17L, 6L), col2 = c(13L, 8L, 14L, 14L, 7L, 19L, 4L, 1L, 11L, 3L
), col3 = structure(c(2L, 5L, 8L, 3L, 4L, 7L, 2L, 5L, 6L, 1L), .Label = c("1", 
"2", "3", "5", "12", "13", "14", "15"), class = "factor"), col2 = c(7L, 
5L, 8L, 3L, 19L, 5L, 15L, 13L, 14L, 20L), col4 = structure(c(7L, 
6L, 4L, 3L, 1L, 2L, 6L, 6L, 4L, 5L), .Label = c("1", "2", "5", 
"6", "7", "8", "10"), class = "factor"), col5 = c(10L, 18L, 10L, 
15L, 11L, 20L, 10L, 6L, 4L, 3L), col1 = c(10L, 5L, 6L, 3L, 15L, 
7L, 19L, 20L, 15L, 10L), col2 = c(3L, 5L, 1L, 20L, 13L, 7L, 12L, 
17L, 1L, 15L)), .Names = c("col1", "col2", "col3", "col2", "col4", 
"col5", "col1", "col2"), row.names = c(NA, -10L), class = "data.frame")

Thanks for the solution very nice code! Do you know if it's possible to pass the reorder function to rowSums the way you have written it? — user124123, Jan 08 '15 at 13:16
Is it possible to modify your answer to preserve the row ordering? I made a mistake, I meant pass the 'reorder' argument to the 'rowSums' function. — user124123, Jan 08 '15 at 15:44
@user1987097 In the example I created, the row order is preserved. Not sure why it is not preserved — akrun, Jan 08 '15 at 15:58
@user1987097 If you look at the code, I am not doing anything to change the row order. — akrun, Jan 08 '15 at 16:08
The reorder argument to the rowsums function defaults to TRUE, if you don't supply it with reorder = FALSE it will do it automatically. This is problematic for datasets in which ordering is important. — user124123, Jan 08 '15 at 16:47
@user1987097 I couldn't find any `reorder` argument within `rowSums` From the help page, `rowSums (x, na.rm = FALSE, dims = 1)` — akrun, Jan 08 '15 at 16:48
If that is the case, another option would be `Map(function(x,y) do.call(`+`, x[y]` (not tested) — akrun, Jan 08 '15 at 16:49
Apologies I was looking at the "rowsum" function not "rowSums" — user124123, Jan 08 '15 at 17:50
@user1987097 It's okay. `rowSums` will be pretty fast. If this answers your question... — akrun, Jan 08 '15 at 17:52

Sum and replace columns with same name R for a data frame containing different classes

1 Answers1

Update

data

Linked