I have a data frame with a set of species IDs in the ID column, and sample IDs as separate columns with the motif CA_**. The data look like this:

ID    <- c('A','B','C')
CA_01 <- c(3,9,54)
CA_56 <- c(2,7,12)
CA_92 <- c(45,4,47)
d <- data.frame(ID, CA_01, CA_56, CA_92)

 ID CA_01 CA_56 CA_92
  A     3     2    45
  B     9     7     4
  C    54    12    47

I want to sum across the columns within each row and generate a new column holding the total abundance of each species ID across the sample columns (final values 50, 20, 113). There are many other columns in my real data frame; I only want to sum across columns whose names start with CA_.

NOTE: this is different from the question asked here, because that asker knows the positions of the columns to sum. In my example I only know that the column names start with the motif CA_; I don't know their positions. It's also different from the question here, as I specifically ask how to sum across columns based on the grep command.

colin
  • Have you looked at any tutorials or tried anything yet? – Pierre L Mar 12 '16 at 22:09
  • @PierreLafortune the example you say this is a duplicate of knows exactly which rows the asker wants to sum. I do not in this case, which makes my question, and its solution, different. Can you please unmark it as a duplicate? – colin Mar 12 '16 at 22:17
  • [subset data to contain only columns whose names match a condition](http://stackoverflow.com/questions/18587334/subset-data-to-contain-only-columns-whose-names-match-a-condition) – Jota Mar 12 '16 at 22:20

1 Answer

We can use grep to subset the columns having column names that start with CA_ and get the sum of the rows with rowSums.

d$newCol <- rowSums(d[grep('^CA\\_', names(d))])
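
As a cross-check of the same idea, here is a minimal sketch that rebuilds the question's data and selects the columns with base R's startsWith (available from R 3.3.0) instead of a regular expression. The extra column named other and the result column named total are hypothetical, added only to show that non-matching columns are left out of the sum.

```r
# Rebuild the example data from the question.
d <- data.frame(
  ID    = c('A', 'B', 'C'),
  CA_01 = c(3, 9, 54),
  CA_56 = c(2, 7, 12),
  CA_92 = c(45, 4, 47),
  other = c(100, 200, 300)  # hypothetical extra column that must be ignored
)

# Logical vector marking the columns whose names begin with "CA_".
ca_cols <- startsWith(names(d), 'CA_')

# Sum only the marked columns within each row.
d$total <- rowSums(d[ca_cols])

print(d)
```

If the sample columns may contain missing values, rowSums(d[ca_cols], na.rm = TRUE) would skip them rather than return NA for the affected rows; whether that is appropriate depends on the data.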
akrun