sum across columns within rows for all columns that start with a specific character string in R

Question

I have a data frame with species IDs in the ID column and one column per sample, where every sample column name starts with the motif CA_ (CA_01, CA_56, ...). The data look like this:

ID    <- c('A','B','C')
CA_01 <- c(3,9,54)
CA_56 <- c(2,7,12)
CA_92 <- c(45,4,47)
d <- data.frame(ID, CA_01, CA_56, CA_92)

 ID CA_01 CA_56 CA_92
  A     3     2    45
  B     9     7     4
  C    54    12    47

I want to sum across the columns within each row and store the result in a new column holding the total abundance of each species ID across the sample columns (expected values 50, 20, 113). Furthermore, there are many other columns in my real data frame, and I only want to sum across the columns that start with CA_.
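A minimal base-R sketch of the goal, assuming the real data frame also carries unrelated columns (here a hypothetical `site` column and a hypothetical `CB_07` column, neither of which should be counted). It selects columns with `startsWith()` (base R since 3.3.0) instead of a regular expression; this is an alternative illustration, not the approach from the accepted answer.

```r
# Example data from the question, plus two hypothetical extra columns
# that must NOT be included in the total.
d <- data.frame(
  ID    = c("A", "B", "C"),
  CA_01 = c(3, 9, 54),
  CA_56 = c(2, 7, 12),
  CA_92 = c(45, 4, 47),
  CB_07 = c(100, 100, 100),   # hypothetical: different prefix, skipped
  site  = c("x", "y", "z"),   # hypothetical: non-numeric, skipped
  stringsAsFactors = FALSE
)

# Logical vector: TRUE for every column whose name starts with "CA_"
is_sample <- startsWith(names(d), "CA_")

# d[is_sample] keeps only the selected columns; rowSums() adds them per row
d$total <- rowSums(d[is_sample])

print(d$total)
```

Because the selection is based on names rather than positions, it keeps working if the CA_ columns move or new ones are added.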

NOTE: this is different from the question asked here, because that asker knows the positions of the columns to sum. In my example I only know that the columns start with the motif CA_; I don't know their positions. It's also different from the question here, as I specifically ask how to sum across columns selected with grep.

@PierreLafortune the question you say this duplicates knows exactly which columns the asker wants to sum. I do not in this case, which makes my question, and its solution, different. Can you please remove the duplicate mark? — colin, Mar 12 '16 at 22:17
[subset data to contain only columns whose names match a condition](http://stackoverflow.com/questions/18587334/subset-data-to-contain-only-columns-whose-names-match-a-condition) — Jota, Mar 12 '16 at 22:20

score 4 · Accepted Answer · answered Mar 12 '16 at 22:09

We can use grep to select the columns whose names start with CA_ and then compute each row's sum with rowSums.

d$newCol <- rowSums(d[grep('^CA_', names(d))])
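A self-contained sketch of the same grep + rowSums approach on the question's data, split into steps so the matched names can be inspected first. The missing value introduced at the end is hypothetical, added only to show how `na.rm = TRUE` changes the result; the column name `newCol_na` is likewise illustrative.

```r
d <- data.frame(ID    = c("A", "B", "C"),
                CA_01 = c(3, 9, 54),
                CA_56 = c(2, 7, 12),
                CA_92 = c(45, 4, 47))

# value = TRUE makes grep() return the matching names themselves,
# which is handy for checking exactly what will be summed.
ca_cols <- grep("^CA_", names(d), value = TRUE)
print(ca_cols)
# [1] "CA_01" "CA_56" "CA_92"

# d[ca_cols] (list-style indexing) always returns a data frame, even when
# only one column matches, so rowSums() never receives a bare vector.
d$newCol <- rowSums(d[ca_cols])

# Hypothetical missing abundance: without na.rm the row total would be NA;
# with na.rm = TRUE the NA is skipped and the remaining values are summed.
d$CA_56[2] <- NA
d$newCol_na <- rowSums(d[ca_cols], na.rm = TRUE)
```

Using `d[cols]` rather than `d[, cols]` matters here: with a single matching column, `d[, cols]` drops to a plain vector and `rowSums()` would fail.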

answered Mar 12 '16 at 22:09

akrun