populate a new column in a df with new values

Question

I'm looking to populate a new data frame column with a calculated value that is unique to each subgroup of data. Here is my exact code:

 df <- read.csv('data_30_Mar2015.csv')


 df$dCT <- NA

 #FUNCTION
 calc_dCT <- function(sample, DF){

 sample_df <- DF[ which(DF$Sample=='sample'),]
 print (sample_df)
 VIC <- sample_df[ which(sample_df$Reporter=='VIC'),]
 FAM <- sample_df[ which(sample_df$Reporter=='FAM'),]

 VIC_mean<-mean(VIC[,3])
 FAM_mean<-mean(FAM[,3])

 DCT <- FAM_mean - VIC_mean

 for (i in 1:length(sample_df)){
     sample_df[i,4] <- DCT
     }
 DF<-merge(DF, sample_df, all=TRUE)
 }

 #CALLS TO FUNCTION
 calc_dCT('c48', df)
 calc_dCT('m48', df)
 calc_dCT('c72', df)
 calc_dCT('m72', df)

 print (df)

and here is the output:

 calc_dCT('c48', df)
 [1] Sample   Reporter CT       dCT     
 <0 rows> (or 0-length row.names)
 calc_dCT('m48', df)
 [1] Sample   Reporter CT       dCT     
 <0 rows> (or 0-length row.names)
 calc_dCT('c72', df)
 [1] Sample   Reporter CT       dCT     
 <0 rows> (or 0-length row.names)
 calc_dCT('m72', df)
 [1] Sample   Reporter CT       dCT     
 <0 rows> (or 0-length row.names)

 print (df)
   Sample Reporter       CT dCT
1     m48      VIC 27.50595  NA
2     m48      VIC 27.77835  NA
3     m48      VIC 27.62321  NA
4     m48      FAM 30.87295  NA
5     m48      FAM 30.87967  NA
6     m48      FAM 30.73427  NA
7     c48      VIC 26.56715  NA
8     c48      VIC 26.89787  NA
9     c48      VIC 26.82587  NA
10    c48      FAM 30.20642  NA
11    c48      FAM 30.43074  NA
12    c48      FAM 30.36933  NA
13    m72      VIC 29.61585  NA
14    m72      VIC 28.65742  NA
15    m72      VIC 29.40057  NA
16    m72      FAM 32.27304  NA
17    m72      FAM 32.38696  NA
18    m72      FAM 32.24386  NA
19    c72      VIC 28.22370  NA
20    c72      VIC 28.17342  NA
21    c72      VIC 28.49104  NA
22    c72      FAM 31.91751  NA
23    c72      FAM 31.67524  NA
24    c72      FAM 31.87287  NA

It doesn't seem to be subsetting the data correctly and I'm not sure why this would be. I'm trying to populate the column 'dCT' with the calculated value for DCT.

Could you explain in words what you are trying to achieve? What's DCT? Why are you running `DF$Sample=='sample'` when no values in `DF$Sample` are equal to `'sample'`? What exactly is your desired output? — David Arenburg, Mar 31 '15 at 10:31
If you look at the df, for example in Sample 'm48': DCT = mean of FAM - mean of VIC. I want this value added to each row for 'm48'. Then I want to repeat the process for 'c48' etc. It should be DF$Sample == sample, where sample is a variable supplied to the function; thanks for spotting 'sample' where it should just be sample with no speech marks. But it still doesn't calculate the mean of FAM - mean of VIC and append it to the df. — user3062260, Mar 31 '15 at 10:43
Please remember to always post copy-pastable data, e.g. using dput or something similar. See http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example — Eike P., Mar 31 '15 at 12:09

score 2 · Answer 1 · answered Mar 31 '15 at 11:06

Here's a possible solution using data.table (assuming df doesn't already have a dCT column)

library(data.table) 
setDT(df)[, dCT := mean(CT[Reporter=='FAM']) - mean(CT[Reporter=='VIC']), by = Sample][]
#    Sample Reporter       CT      dCT
# 1:    m48      VIC 27.50595 3.193127
# 2:    m48      VIC 27.77835 3.193127
# 3:    m48      VIC 27.62321 3.193127
# 4:    m48      FAM 30.87295 3.193127
# 5:    m48      FAM 30.87967 3.193127
# 6:    m48      FAM 30.73427 3.193127
# 7:    c48      VIC 26.56715 3.571867
# 8:    c48      VIC 26.89787 3.571867
...

Eike P. · Answer 2 · 2015-03-31T12:08:55.447

The same thing can obviously be done in dplyr, so I just thought I'd add another version.

df <- data.frame(
    Sample   = c(rep("m48", 6), rep("c48", 6)),
    Reporter = c(rep("VIC", 3), rep("FAM", 3), rep("VIC", 3), rep("FAM", 3)),
    CT       = c(27.50595, 27.77835, 27.62321, 30.87295, 30.87967, 30.73427,
                 26.56715, 26.89787, 26.82587, 30.20642, 30.43074, 30.36933)
)

library(dplyr)
df %>% group_by(Sample) %>% 
    mutate(dCT = mean(CT[Reporter == 'FAM']) - mean(CT[Reporter == 'VIC']))
# Source: local data frame [12 x 4]
# Groups: Sample
#
#    Sample Reporter       CT      dCT
# 1     m48      VIC 27.50595 3.193127
# 2     m48      VIC 27.77835 3.193127
# 3     m48      VIC 27.62321 3.193127
# 4     m48      FAM 30.87295 3.193127
# 5     m48      FAM 30.87967 3.193127
# 6     m48      FAM 30.73427 3.193127
# 7     c48      VIC 26.56715 3.571867
# 8     c48      VIC 26.89787 3.571867
# 9     c48      VIC 26.82587 3.571867
# 10    c48      FAM 30.20642 3.571867
# 11    c48      FAM 30.43074 3.571867
# 12    c48      FAM 30.36933 3.571867

Because I know it isn't satisfying to receive responses that only say "what you do isn't good, do this instead", here are some notes on what didn't work in your original code. Note, however, that I still recommend one of the other solutions.

R passes function arguments by value, not by reference. This means that you cannot change the data frame df from inside your function, since you're only working on a copy. Instead, return a result from the function and use that result to modify df.
length(dataframe) doesn't do what you think it does: it returns the number of columns, not the number of rows. What you want is nrow(dataframe).
Assigning a single constant value to every element of a data frame column doesn't require a loop; just assign the value and R will recycle it over all rows.
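
The points above, plus the quoting problem raised in the comments, can be checked on a tiny example. This is only a sketch: `toy`, `pick` and `set_flag` are made-up names for illustration and are not part of the original code.

```r
# Toy data; `toy`, `pick` and `set_flag` are hypothetical names for illustration only.
toy <- data.frame(Sample = c("a", "a", "b"), CT = c(1, 2, 3))

# Quoting: 'sample' is a literal string, while sample is the function argument.
pick <- function(sample, DF) DF[DF$Sample == sample, ]
n_literal  <- nrow(toy[toy$Sample == 'sample', ])  # 0, nothing is called "sample"
n_argument <- nrow(pick("a", toy))                 # 2

# 1. Changing an argument inside a function only changes the local copy.
set_flag <- function(DF) {
    DF$flag <- TRUE
    invisible(NULL)
}
set_flag(toy)
has_flag <- "flag" %in% names(toy)  # FALSE, toy itself is untouched

# 2. length() of a data frame counts columns, nrow() counts rows.
n_cols <- length(toy)  # 2
n_rows <- nrow(toy)    # 3

# 3. A single value is recycled over the whole column, so no loop is needed.
toy$dCT <- 3.19
```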

So here's a version of your code that works:

calc_dCT <- function(sample, DF){

    sample_df <- DF[ which(DF$Sample==sample),]
    VIC <- sample_df[ which(sample_df$Reporter=='VIC'),]
    FAM <- sample_df[ which(sample_df$Reporter=='FAM'),]

    VIC_mean<-mean(VIC[,3])
    FAM_mean<-mean(FAM[,3])

    DCT <- FAM_mean - VIC_mean

    sample_df$dCT <- DCT

    sample_df
}

dfnew <- data.frame(Sample=character(), Reporter=character(), CT=numeric(), dCT=numeric())
for (sample_name in unique(df$Sample))
    dfnew <- rbind(dfnew, calc_dCT(sample_name, df))
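
For comparison, the same per-sample value can also be computed in base R without an explicit loop. This is only a sketch, assuming the columns are named Sample, Reporter and CT as in the question; `toy`, `group_means` and `dct_by_sample` are names made up here, and the data are the first two samples from the question.

```r
# First two samples from the question; `toy` is a hypothetical name.
toy <- data.frame(
    Sample   = rep(c("m48", "c48"), each = 6),
    Reporter = rep(rep(c("VIC", "FAM"), each = 3), times = 2),
    CT       = c(27.50595, 27.77835, 27.62321, 30.87295, 30.87967, 30.73427,
                 26.56715, 26.89787, 26.82587, 30.20642, 30.43074, 30.36933)
)

# Mean CT for every Sample x Reporter combination (a matrix with dimnames).
group_means <- tapply(toy$CT, list(toy$Sample, toy$Reporter), mean)

# One dCT value per sample: mean(FAM) - mean(VIC).
dct_by_sample <- group_means[, "FAM"] - group_means[, "VIC"]

# Look up each row's sample by name to fill the new column.
toy$dCT <- unname(dct_by_sample[as.character(toy$Sample)])
```

Unlike the loop with rbind, this keeps the rows of the data frame in their original order.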
