2

I have a sample data:

SampleID  a      b     d     f       ca      k     l    cb
1         0.1    2     1     2       7       1     4    3
2         0.2    3     2     3       4       2     5    5
3         0.5    4     3     6       1       3     9    2

I need to find row-wise sum of columns which have something common in names, e.g. row-wise sum(a, ca) or row-wise sum(b,cb). The problem is that i have large data.frame and ideally i would be able to write what is common in column header, so that code would pick only those columns to sum

Thank you beforehand for any assistance.

Try
  • 111
  • 2
  • 8

2 Answers2

2

We can select the columns that have 'a' with grep, subset the columns and do rowSums and the same with 'b' columns.

 rowSums(df1[grep('a', names(df1)[-1])+1])
 rowSums(df1[grep('b', names(df1)[-1])+1])
akrun
  • 874,273
  • 37
  • 540
  • 662
  • Could it be modified so that it returns matrix, data.frame and position of columns is not `+1` all the time. So, could be the code a bit more general. – Try Feb 29 '16 at 07:30
  • I clicked! Thank you! But i don't have much reputation yet, it to be appeared. But i really did! – Try Feb 29 '16 at 07:30
  • Thank you. But could you please explain me a bit more.How can i modify your code to sum rows of columns which are for example 7 columns away from each other? – Try Feb 29 '16 at 07:38
  • @OlgaAnufrieva The `grep` gives the column index of the columns that have the same pattern in the column name. I added `1` to the grep output because I was `grep`ping on the subset of dataset that doesn't include the first column. So, if I understand your comment, it should work. – akrun Feb 29 '16 at 07:41
  • Sorry, i modified sample data frame. How in this case would `a` and `ca` be summed? Thank you! – Try Feb 29 '16 at 07:45
  • @OlgaAnufrieva It should be the same i.e. `rowSums(df2[grep('a', names(df2)[-1])+1])# [1] 7.1 4.2 1.5` You can check the subsetted columns `df2[grep('a', names(df2)[-1])+1]` – akrun Feb 29 '16 at 07:49
  • Sorry,,but in real data it does not work. Actual columns with similar names are 6th and 13th for example. But your code sum something else now.. – Try Feb 29 '16 at 07:53
  • @OlgaAnufrieva Perhaps in your real data, the patterns are different. The regex pattern works for the names you showed in the example. – akrun Feb 29 '16 at 07:55
  • Could you suggest something if column names look like this `XXL:VLDL.CE` and `XL.VLDL.CE`. Those for example i need to sum – Try Feb 29 '16 at 07:57
  • If `VLDL.CE` is the common element, then `rowSums(df2[grep("VLDL\\.CE$", names(df2))])` should work – akrun Feb 29 '16 at 07:59
  • and what if only CE is common, could i use `CE$` :) ? – Try Feb 29 '16 at 08:00
  • @OlgaAnufrieva Yes, you are right, `grep("CE$",..` – akrun Feb 29 '16 at 08:01
  • 1
    Thanks a lot for the help! – Try Feb 29 '16 at 08:02
  • A little question to clarify...sorry for disturbing. what if in my `DF` i have more than 2 columns with same part of the name. Does it sum all of those columns? – Try Feb 29 '16 at 08:09
  • @OlgaAnufrieva The `grep` will get all the columns that match the pattern. So, it can be 3, 4 or any number – akrun Feb 29 '16 at 09:10
0

If you want the output as a data frame, try using dplyr

# Recreating your sample data
df <- data.frame(SampleID = c(1, 2, 3),
             a = c(0.1, 0.2, 0.5),
             b = c(2, 3, 4),
             d = c(1, 2, 3),
             f = c(2, 3, 6),
             ca = c(7, 4, 1),
             k = c(1, 2, 3),
             l = c(4, 5, 9),
             cb = c(3, 5, 2)) 

Process the data

# load dplyr
library(dplyr)

# Sum across columns 'a' and 'ca' (sum(a, ca))
df2 <- df %>%
    select(contains('a'), -SampleID) %>% # 'select' function to choose the columns you want 
    mutate(row_sum = rowSums(.)) # 'mutate' function to create a new column 'row_sum' with the sum of the selected columns. You can drop the selected columns by using 'transmute' instead.

df2 # have a look

    a ca row_sum
1 0.1  7     7.1
2 0.2  4     4.2
3 0.5  1     1.5
Peter K
  • 706
  • 5
  • 8