R, create a new column in a data frame that applies a function of all the columns with similar names

Question

I have a data frame whose column names are something like a, b, v1, v2, v3, ..., v100. I want to create a new column that applies a function to only the columns whose names include 'v'.

For example, given this data frame

df <- data.frame(a = rnorm(3), v1 = rnorm(3), v2 = rnorm(3), v3 = rnorm(3))

I want to create a new column in which each element is the sum of the elements of v1, v2 and v3 that are in the same row.

score 6 · Answer 1 · answered Sep 13 '11 at 09:34


Use grep on the names to get the column positions, then use rowSums:

rowSums(df[, grep("v", names(df))])
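The one-liner above computes the sums but does not store them. A minimal sketch of assigning them to a new column and checking the result against an explicit `v1 + v2 + v3`, assuming the question's example data frame. `set.seed()` is added only for reproducibility, and the column name `total` is an illustrative choice: a name such as `v_sum` would itself match `grep("v", ...)` if the same pattern were reused later.

```r
set.seed(42)  # reproducible random data for the example
df <- data.frame(a = rnorm(3), v1 = rnorm(3), v2 = rnorm(3), v3 = rnorm(3))

# Positions of the columns whose names contain "v" (here v1, v2, v3)
v_cols <- grep("v", names(df))

# Store the row-wise sums in a new column; "total" contains no "v",
# so a later grep("v", names(df)) will not pick it up
df$total <- rowSums(df[, v_cols])

# Check against the explicit sum (ignoring attributes such as names)
stopifnot(isTRUE(all.equal(df$total, df$v1 + df$v2 + df$v3,
                           check.attributes = FALSE)))
```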


James



Use `df[grep("v", names(df))]` to avoid the conversion to a vector when only one column is selected. Compare `df[, "v1"]` with `df["v1"]`. – Marek Sep 13 '11 at 13:03
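The difference the comment describes shows up as soon as the pattern matches a single column. A small sketch with a hypothetical data frame that has only one v-column:

```r
df <- data.frame(a = 1:3, v1 = c(1, 2, 3))  # hypothetical: only one column contains "v"

as_vector <- df[, grep("v", names(df))]  # matrix-style indexing drops to a numeric vector
as_frame  <- df[grep("v", names(df))]    # list-style indexing keeps a one-column data frame

is.data.frame(as_vector)  # FALSE
is.data.frame(as_frame)   # TRUE

# rowSums() needs at least two dimensions, so only the data frame version works
rowSums(as_frame)
res <- try(rowSums(as_vector), silent = TRUE)
inherits(res, "try-error")  # TRUE

# Matrix-style indexing also works if dropping is turned off explicitly
rowSums(df[, grep("v", names(df)), drop = FALSE])
```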

score 3 · Answer 2 · edited May 23 '17 at 12:07

To combine both @James's and @Anatoliy's answers:

apply(df[grepl('^v', names(df))], 1, sum)

I went ahead and anchored the v in the regular expression to the beginning of the string. The other answers haven't done that, but it appears that you want all columns that begin with v, not the larger set of columns that merely have a v somewhere in their name. If I am wrong, you could just do:

apply(df[grepl('v', names(df))], 1, sum)
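To see what the anchor changes, here is a sketch with a hypothetical extra column `lev` that contains a `v` without starting with one. The stricter pattern `^v[0-9]+$` is an additional assumption: it only fits if the columns are named exactly `v` followed by digits, as in the question.

```r
# Hypothetical data: "lev" contains a "v" but is not one of the v1, v2, ... columns
df <- data.frame(a = 1:2, lev = 3:4, v1 = 5:6, v2 = 7:8)

names(df)[grepl("v", names(df))]          # "lev" "v1" "v2"  (v anywhere)
names(df)[grepl("^v", names(df))]         # "v1" "v2"        (starts with v)
names(df)[grepl("^v[0-9]+$", names(df))]  # "v1" "v2"        (v followed only by digits)

# Row sums over the anchored selection: 5 + 7 and 6 + 8
apply(df[grepl("^v", names(df))], 1, sum)
```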

You should avoid using subset() when programming, as stated in ?subset:

This is a convenience function intended for use interactively. For programming it is better to use the standard subsetting functions like ‘[’, and in particular the non-standard evaluation of argument ‘subset’ can have unanticipated consequences.

Also, as I learned yesterday from Richie Cotton, when indexing it is better to use grepl (which returns a logical vector) than grep (which returns integer positions).
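One concrete case where the difference matters, offered as an illustration rather than as the original reasoning, is excluding columns when nothing matches. A sketch with a hypothetical data frame that has no v-columns at all:

```r
df <- data.frame(a = 1:2, b = 3:4)  # hypothetical: no column name contains "v"

# grep() returns integer(0); -integer(0) is still integer(0),
# so this selects zero columns instead of keeping all of them
ncol(df[-grep("v", names(df))])   # 0

# grepl() returns c(FALSE, FALSE); negating it keeps every column, as intended
ncol(df[!grepl("v", names(df))])  # 2
```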

score 2 · Answer 3 · answered Sep 13 '11 at 09:33


This should do it:

df$sums <- rowSums(subset(df, select = grepl("v", names(df))))

For a more general approach that works with any function, not just sums:

apply(subset(df, select=grepl("v", names(df))), 1, sum)
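Because apply() accepts an arbitrary function, the same column selection can feed any row-wise summary. A sketch assuming the question's data frame; `row_max` and `row_range` are illustrative names chosen to contain no `v`, so that a later `grepl("v", ...)` on the same data frame does not pick them up.

```r
set.seed(1)  # reproducible random data for the example
df <- data.frame(a = rnorm(3), v1 = rnorm(3), v2 = rnorm(3), v3 = rnorm(3))

# Select the v-columns once, before adding any new columns
v_df <- df[grepl("v", names(df))]

# Any function that maps a numeric vector to a single value works row-wise
df$row_max   <- apply(v_df, 1, max)
df$row_range <- apply(v_df, 1, function(x) max(x) - min(x))
```

For sums and means specifically, rowSums() and rowMeans() are vectorised and typically faster than apply(); apply() is most useful when the summary has no such built-in.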


Anatoliy
