3

I need to combine sets of variables ending in specific characters. The variables are named in the following way (data=df):

id   A_1   A_2   A_3   B_1   B_2   B_3   C_1   C_2   C_3
1    7     2     1     6     7     4     3     9     0
2    8     5     2     9     7     2     1     6     1
3    5     4     0     8     6     4     1     7     0

Now let's say "combine" means that I need the rowMeans of every set of variables ending in _1, in _2 and so forth.

So I could do just this:

rowMeans(df[,c("A_1","B_1","C_1")])
rowMeans(df[,c("A_2","B_2","C_2")])
rowMeans(df[,c("A_3","B_3","C_3")])

The issue is that I have lots of these variable endings and I would essentially end up with one line of code per ending. Hence, I was wondering if there is a smarter way of doing the same thing using a for-loop.

This for-loop would need to contain i(1:n) for the ending and a stable name pattern which is "A_", "B_" and "C_" and then run rowMeans(). But since I'm not an expert on writing loops, I have no idea how I would do this in practice.

Dr. Fabian Habersack
  • 1,111
  • 12
  • 30
  • If you define an array `a = c("A_", "B_", "C_")` you can loop over all your occurences and use `paste` to concatenate string and number like so `paste(a[i], as.character(i))` , see https://stackoverflow.com/questions/7201341/how-can-two-strings-be-concatenated – abcalphabet Sep 19 '18 at 09:14

3 Answers3

4

We could generate column names using paste0, select them take rowMeans of them and add it to a new column using lapply.

n <- 3
df[paste0(1:n, "_mean")] <- lapply(paste0("_", 1:n, "$"), function(x) {
 rowMeans(df[grepl(x, names(df))])
})

df

#  id A_1 A_2 A_3 B_1 B_2 B_3 C_1 C_2 C_3   1_mean   2_mean   3_mean
#1  1   7   2   1   6   7   4   3   9   0 5.333333 6.000000 1.666667
#2  2   8   5   2   9   7   2   1   6   1 6.000000 6.000000 1.666667
#3  3   5   4   0   8   6   4   1   7   0 4.666667 5.666667 1.333333
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
3

You can do that with the tidyverse:

df %>% gather(k,v,-id) %>% 
  mutate(g=substring(k,3)) %>%
  group_by(id,g) %>% summarise(n=mean(v)) %>%
  spread(g,n)
## A tibble: 3 x 4
## Groups:   id [3]
#     id   `1`   `2`   `3`
#  <int> <dbl> <dbl> <dbl>
#1     1  5.33  6     1.67
#2     2  6     6     1.67
#3     3  4.67  5.67  1.33
Nicolas2
  • 2,170
  • 1
  • 6
  • 15
1

You can use paste0 with lapply like this:

n_numbers <- 1:3
n_letters <- 1:3 # change here to extend

col_list <- lapply(n_numbers, function(x) paste0(LETTERS[n_letters], "_", x)) #generates the names

dat_list <- lapply(col_list, function(x) dat[x]) # applies subsetting to dat

lapply(dat_list, rowMeans) # rowMeans

# [[1]]
# [1] 5.333333 6.000000 4.666667
# 
# [[2]]
# [1] 6.000000 6.000000 5.666667
# 
# [[3]]
# [1] 1.666667 1.666667 1.333333

Data used:

tt <- "id   A_1   A_2   A_3   B_1   B_2   B_3   C_1   C_2   C_3
1    7     2     1     6     7     4     3     9     0
2    8     5     2     9     7     2     1     6     1
3    5     4     0     8     6     4     1     7     0"

dat <- read.table(text = tt, header = T)
RLave
  • 8,144
  • 3
  • 21
  • 37
  • 2
    Nice one, but what would you suggest in the case @RonakShah mentioned? Because the thing is that I have a limited number of variable prefixes (A_, B_, ...) but an endless number of suffixes (1,2,3...100). – Dr. Fabian Habersack Sep 19 '18 at 09:30
  • @Fabian Habersack Updated, now works with as many indexes as you want. – RLave Sep 19 '18 at 09:42