Merge (make mean) columns with partially matched header name

Question

I have data that looks like this:

   AAA_1   AAA_2  AAA_3  BBB_1  BBB_2  BBB_3 CCC
1   1       1      1       2     2      2     1
2   3       1      4       0     0      0     0
3   5       3      0       1     1      1     1

For each row, I want to compute the mean of the columns that share a common feature, where the features are as follows:

feature <- c("AAA","BBB","CCC")

The desired output should look like:

   AAA   BBB   CCC
1   1       2   1
2   2.6     0   0
3   2.6     1   1

I was able to do this for each pattern separately:

data <- read.table("data.txt", header = TRUE, row.names = 1)
AAA <- as.matrix(rowMeans(data[ , grepl("AAA" , names( data ) ) ]))

But I do not know how to do the partial matching for several patterns in one go.

I also tried other things, such as:

for (i in 1:length(features)){
feature[i] <- as.matrix(rowMeans(data[ , grepl(feature[i] , names( data ) ) ]))
}

Can you please make your example reproducible? Also, have a read at [this](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) — Sotos, Mar 03 '16 at 13:22

jazzurro · Answer 1 · 2016-03-03T15:17:38.720

2

Here is another option for you. Given your column-name pattern, I used gsub() to strip the trailing underscore and number from each name, which leaves the group labels AAA, BBB, and CCC in ind. For each element of ind, lapply() selects the matching columns, calculates the row means, and keeps only the row-mean column. bind_cols() then combines the results, add_rownames() adds the row names as a column, and the result is stored in foo. The last step is to assign column names to foo.

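library(dplyr)

# Strip the trailing "_<digits>" to get the group labels: AAA, BBB, CCC
ind <- unique(gsub("_\\d+$", "", names(mydf)))

# For each label, keep the matching columns and reduce them to their row means
lapply(ind, function(x){
    select(mydf, contains(x)) %>%
    transmute(out = rowMeans(.))
    }) %>%
bind_cols() %>%
add_rownames -> foo

# add_rownames() puts a "rowname" column first, so rename only the other columns
names(foo)[-1] <- ind

# Row means per group (the rowname column is not shown):
#       AAA   BBB   CCC
#     (dbl) (dbl) (dbl)
#1 1.000000     2     1
#2 2.666667     0     0
#3 2.666667     1     1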

DATA

mydf <- structure(list(AAA_1 = c(1L, 3L, 5L), AAA_2 = c(1L, 1L, 3L), 
AAA_3 = c(1L, 4L, 0L), BBB_1 = c(2L, 0L, 1L), BBB_2 = c(2L, 
0L, 1L), BBB_3 = c(2L, 0L, 1L), CCC = c(1L, 0L, 1L)), .Names = c("AAA_1", 
"AAA_2", "AAA_3", "BBB_1", "BBB_2", "BBB_3", "CCC"), class = "data.frame", row.names = c(NA, 
-3L))
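
As a rough illustration of the gsub() step described in this answer, the base R sketch below uses only the column names from the question. The variable names cols and groups are made up for this example and do not appear in the answer.

```r
# Column names copied from the question's example data
cols <- c("AAA_1", "AAA_2", "AAA_3", "BBB_1", "BBB_2", "BBB_3", "CCC")

# Remove a trailing underscore followed by digits.
# "CCC" has no such suffix, so it is left unchanged.
groups <- gsub("_\\d+$", "", cols)
print(groups)

# unique() collapses the repeats into one label per group
ind <- unique(groups)
print(ind)
```

Note that contains() matches a substring anywhere in a column name, so if one label were contained in another (say "AA" and "AAA"), comparing groups to each label exactly would probably be the safer way to pick columns.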

edited Mar 03 '16 at 15:17

answered Mar 03 '16 at 13:47

jazzurro


Same concept, different execution :) – Sotos Mar 03 '16 at 13:54
@Sotos Yeah, it seems that we were working in the same way at the same time. :) – jazzurro Mar 03 '16 at 13:56
Thanks. how can I keep rownames? – user6013305 Mar 03 '16 at 14:10
@user6013305 Do you need rownames as a column? – jazzurro Mar 03 '16 at 14:12
Yes. That would be great – user6013305 Mar 03 '16 at 14:34
@user6013305 You can use `add_rownames()` as Thierry suggested. If you add `%>%add_rownames` after bind_cols(), I think you have a column with rownames. – jazzurro Mar 03 '16 at 14:37
@jazzurro I added %>% add_rownames but I get Error: unexpected SPECIAL in "%>%" – user6013305 Mar 03 '16 at 15:08
@user6013305 I revised the code above, which is working on my machine. Can you test that? – jazzurro Mar 03 '16 at 15:18

Sotos · Answer 2 · 2016-03-03T14:50:52.380

Assuming your colnames are always structured as shown in your example, then you can split the names and aggregate.

new_names <-  unlist(strsplit(names(df),"\\_.*"))
colnames(df) <- new_names
#Testing with your data, we need to prevent the loss of dimension by using drop = FALSE  
sapply(unique(new_names), function(i) rowMeans(df[, new_names==i, drop = FALSE]))
#          AAA BBB CCC
#[1,] 1.000000   2   1
#[2,] 2.666667   0   0
#[3,] 2.666667   1   1

Data:

df <- structure(list(AAA_1 = c(1L, 3L, 5L), AAA_2 = c(1L, 1L, 3L), 
AAA_3 = c(1L, 4L, 0L), BBB_1 = c(2L, 0L, 1L), BBB_2 = c(2L, 
0L, 1L), BBB_3 = c(2L, 0L, 1L), CCC = c(1L, 0L, 1L)), .Names = c("AAA_1", 
"AAA_2", "AAA_3", "BBB_1", "BBB_2", "BBB_3", "CCC"), class = "data.frame", row.names = c(NA, 
-3L))
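
To show why drop = FALSE matters in this answer, here is a small base R sketch on a cut-down version of the data (two AAA columns and the single CCC column). The names one_col_vec, one_col_df, res, and means are made up for the illustration.

```r
# Cut-down example data: group AAA has two columns, group CCC has only one
df <- data.frame(AAA_1 = c(1, 3, 5), AAA_2 = c(1, 1, 3), CCC = c(1, 0, 1))
new_names <- sub("_.*", "", names(df))

# Without drop = FALSE, a single matching column collapses to a plain vector
one_col_vec <- df[, new_names == "CCC"]
print(is.data.frame(one_col_vec))

# rowMeans() needs at least two dimensions, so it fails on that vector;
# tryCatch() turns the error into a marker string for inspection
res <- tryCatch(rowMeans(one_col_vec), error = function(e) "error")
print(res)

# With drop = FALSE the selection stays a one-column data frame
one_col_df <- df[, new_names == "CCC", drop = FALSE]
means <- rowMeans(one_col_df)
print(means)
```

For groups with two or more columns the selection is already a data frame, so drop = FALSE only changes the result for single-column groups such as CCC.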

score 1 · Accepted Answer · answered Mar 03 '16 at 13:22

1

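library(dplyr)
library(tidyr)
data %>%
  # keep the row names as a column so they survive the reshaping
  add_rownames() %>%
  # long format: one row per (rowname, original column) pair
  gather("variable", "value", -rowname) %>%
  # drop everything from the first underscore on: "AAA_1" becomes "AAA"
  mutate(variable = gsub("_.*$", "", variable)) %>%
  # average the values within each row and group
  group_by(rowname, variable) %>%
  summarise(mean = mean(value)) %>%
  # back to wide format: one column per group
  spread(variable, mean)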
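
In more recent versions of these packages, add_rownames() is deprecated in favour of tibble::rownames_to_column(), and gather()/spread() are superseded by pivot_longer()/pivot_wider(). The following is a sketch of the same reshape-and-summarise idea with those functions, assuming dplyr 1.0 or later and tidyr 1.0 or later are installed. The data frame dat and its row names r1 to r3 are hypothetical stand-ins for the asker's data.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical stand-in for the asker's data, with made-up row names
dat <- data.frame(
  AAA_1 = c(1, 3, 5), AAA_2 = c(1, 1, 3), AAA_3 = c(1, 4, 0),
  BBB_1 = c(2, 0, 1), BBB_2 = c(2, 0, 1), BBB_3 = c(2, 0, 1),
  CCC   = c(1, 0, 1),
  row.names = c("r1", "r2", "r3")
)

res <- dat %>%
  # move the row names into a regular column
  rownames_to_column("rowname") %>%
  # long format: one row per (rowname, original column) pair
  pivot_longer(-rowname, names_to = "variable", values_to = "value") %>%
  # reduce each column name to its group label
  mutate(variable = sub("_.*$", "", variable)) %>%
  group_by(rowname, variable) %>%
  summarise(mean = mean(value), .groups = "drop") %>%
  # wide format again: one column per group
  pivot_wider(names_from = variable, values_from = mean)

print(res)
```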

answered Mar 03 '16 at 13:22

Thierry

