
df

sampleName     realConc      exptname concentrate timepoints replicate    day  var
name1_0     3.877049e-05           0hr        55mM          0        b1 011311   1
name1_20    3.293085e-04           kcl        55mM         20        b1 011311   2
name1_40    3.999433e-05           kcl        55mM         40        b1 011311   3
name2_0     2.939995e-03           0hr        55mM          0        b1 011411   1
name2_20    1.212584e-02           kcl        55mM         20        b1 011411   2
name2_40    1.894434e-02           kcl        55mM         40        b1 011411   3

I want to divide every realConc value by the realConc value at timepoints == 0 that has the same day, replicate, and concentrate values.

I tried a for loop without much luck. Can you help me out?

for (i in 1:dim(df)[1]){

df$realConc <- df$realConc[i] / df[which(duplicated(paste(replicate,day))) & df$timepoint == 0,]$realConc[i]
}

I was thinking something like this, but it obviously doesn't work

A5C1D2H2I1M1N2O1R2T1
Doug
  • Welcome to StackOverflow. Perhaps if you made a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) that demonstrates your question / problem, people would find it easier to answer. – Andrie Aug 18 '12 at 07:00

3 Answers


plyr is your friend!

library(plyr)
ddply(df, .(day, replicate, concentrate),
      transform, scaled=realConc/realConc[timepoints==0])

#   sampleName     realConc exptname concentrate timepoints replicate   day var   scaled
# 1    name1_0 3.877049e-05      0hr        55mM          0        b1 11311   1 1.000000
# 2   name1_20 3.293085e-04      kcl        55mM         20        b1 11311   2 8.493793
# 3   name1_40 3.999433e-05      kcl        55mM         40        b1 11311   3 1.031566
# 4    name2_0 2.939995e-03      0hr        55mM          0        b1 11411   1 1.000000
# 5   name2_20 1.212584e-02      kcl        55mM         20        b1 11411   2 4.124442
# 6   name2_40 1.894434e-02      kcl        55mM         40        b1 11411   3 6.443664
seancarmody
    I once got [upvoted for using *is your friend*](http://stackoverflow.com/questions/11735192/r-table-formating/11735326#comment15575785_11735326) in the opening line of one of my answers. Now, I'll try to balance the scales: +1 – A5C1D2H2I1M1N2O1R2T1 Aug 18 '12 at 07:29
  • I have a feeling that [you will get all the love for this answer](http://chat.stackoverflow.com/transcript/106?m=4978985#4978985) though. – A5C1D2H2I1M1N2O1R2T1 Aug 18 '12 at 11:24
  • plyr is nice, although I keep getting an "arguments imply differing number of rows" error message – Doug Aug 18 '12 at 11:27
  • I don't know what it is, but it looks like only the first value after the zero-hour timepoint is being divided by the zero-hour timepoint – Doug Aug 18 '12 at 13:49
  • @LucasPinto, the answers from Sean and I are the same, and do what you requested in your original question. Are you using the same aggregation that you originally asked for or is there something else that might account for your confusion here? – A5C1D2H2I1M1N2O1R2T1 Aug 18 '12 at 14:10
  • @LucasPinto I assume your data set is bigger. Have you got any cases with more than one zero hour data point? – seancarmody Aug 18 '12 at 14:16
  • What does `ddply(df, .(day, replicate, concentrate), summarise, count=sum(timepoints==0))` look like? If it does not have 1 for every count entry, there will be a problem! – seancarmody Aug 18 '12 at 14:21
  • yes, very true I will set up a filter to only include those who have a 0 hr. Is there any way to have timepoints==a numeric vector as opposed to the single 0, I can't get that to work – Doug Aug 19 '12 at 00:41
  • count=sum(timepoints==0&20&40) I wonder if this will do the trick – Doug Aug 19 '12 at 00:49
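
On the last two comments: a minimal base R sketch, assuming the question's sample data, of why `timepoints==0&20&40` does not test against several values, how `%in%` does, and how to count zero-hour rows per group without plyr. The names `grp`, `n_zero`, and `bad_groups` are illustrative and not from the thread.

```r
# Sample data copied from the question
df <- read.table(header = TRUE, text = "sampleName realConc exptname concentrate timepoints replicate day var
name1_0  3.877049e-05 0hr 55mM  0 b1 011311 1
name1_20 3.293085e-04 kcl 55mM 20 b1 011311 2
name1_40 3.999433e-05 kcl 55mM 40 b1 011311 3
name2_0  2.939995e-03 0hr 55mM  0 b1 011411 1
name2_20 1.212584e-02 kcl 55mM 20 b1 011411 2
name2_40 1.894434e-02 kcl 55mM 40 b1 011411 3")

# `&` does not build a set of values: 20 and 40 are coerced to TRUE,
# so this expression reduces to `timepoints == 0`
a <- df$timepoints == 0 & 20 & 40

# %in% tests each element for membership in a vector of values
keep <- df$timepoints %in% c(0, 20, 40)

# Count zero-hour rows per (day, replicate, concentrate) group
grp <- interaction(df$day, df$replicate, df$concentrate, drop = TRUE)
n_zero <- tapply(df$timepoints == 0, grp, sum)

# Groups without exactly one zero-hour row would break the division
bad_groups <- names(n_zero)[n_zero != 1]
```

If `bad_groups` is non-empty, those groups have either no baseline or several, which is one plausible source of the "differing number of rows" error mentioned above.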

You haven't specified what you want your output to look like, but here's one way to perform that calculation:

First, read in your data (It's better to use dput() or provide some code to recreate your data).

test = read.table(header=TRUE, text = "sampleName     realConc      exptname concentrate timepoints replicate    day  var
name1_0     3.877049e-05           0hr        55mM          0        b1 011311   1
name1_20    3.293085e-04           kcl        55mM         20        b1 011311   2
name1_40    3.999433e-05           kcl        55mM         40        b1 011311   3
name2_0     2.939995e-03           0hr        55mM          0        b1 011411   1
name2_20    1.212584e-02           kcl        55mM         20        b1 011411   2
name2_40    1.894434e-02           kcl        55mM         40        b1 011411   3")

Then, split your data according to the groupings you require.

temp = split(test, list(test$day, test$concentrate, test$replicate))

Third, figure out the realConc value for timepoints == 0 by group, and use that to do your division.

lapply(temp, function(x) x$realConc / x$realConc[x$timepoints == 0])
# $`11311.55mM.b1`
# [1] 1.000000 8.493793 1.031566
# 
# $`11411.55mM.b1`
# [1] 1.000000 4.124442 6.443664

Update: data.frame output

temp = split(test, list(test$day, test$concentrate, test$replicate))
temp = lapply(temp, function(x) {
  # divide each group's realConc by its own zero-hour value
  x$divided = x$realConc / x$realConc[x$timepoints == 0]
  x })
temp = do.call(rbind, temp)
rownames(temp) = NULL
temp
#   sampleName     realConc exptname concentrate timepoints replicate   day var  divided
# 1    name1_0 3.877049e-05      0hr        55mM          0        b1 11311   1 1.000000
# 2   name1_20 3.293085e-04      kcl        55mM         20        b1 11311   2 8.493793
# 3   name1_40 3.999433e-05      kcl        55mM         40        b1 11311   3 1.031566
# 4    name2_0 2.939995e-03      0hr        55mM          0        b1 11411   1 1.000000
# 5   name2_20 1.212584e-02      kcl        55mM         20        b1 11411   2 4.124442
# 6   name2_40 1.894434e-02      kcl        55mM         40        b1 11411   3 6.443664
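
Note that the `day` values lose their leading zero in the output above (`011311` becomes `11311`) because `read.table` reads that column as a number. A small sketch, assuming you want to keep `day` as text, using the `colClasses` argument; the name `test2` is only for illustration.

```r
# Read `day` as character so that leading zeros survive
test2 <- read.table(header = TRUE, colClasses = c(day = "character"),
                    text = "sampleName realConc exptname concentrate timepoints replicate day var
name1_0  3.877049e-05 0hr 55mM  0 b1 011311 1
name1_20 3.293085e-04 kcl 55mM 20 b1 011311 2
name1_40 3.999433e-05 kcl 55mM 40 b1 011311 3
name2_0  2.939995e-03 0hr 55mM  0 b1 011411 1
name2_20 1.212584e-02 kcl 55mM 20 b1 011411 2
name2_40 1.894434e-02 kcl 55mM 40 b1 011411 3")

test2$day
```

The grouping itself works either way; this only matters if the day labels need to match the original strings.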
A5C1D2H2I1M1N2O1R2T1
  • yes, i want those values, except in the df. I want the same df except with these new values in realConc, I'm not sure if you could go about that the same way w/ the split function – Doug Aug 18 '12 at 07:17
  • @LucasPinto, and if you wanted to *substitute* the current `realConc` with these new values, just change the second line in the update to: `temp = lapply(temp, function(x) { x$realConc = x[, 2]/x[which(x$timepoints == 0), 2]; x })` instead of creating a new column. – A5C1D2H2I1M1N2O1R2T1 Aug 18 '12 at 08:42

This is a simple base R version. It might be easier to debug since each step is visible.

# x is the data frame from the question (df)
# find all possible denominators and rename realConc to avoid a duplicate name in the merge
denom <- x[x$timepoints == 0, c('realConc','concentrate','replicate','day')]
names(denom)[1] <- 'realConcDenominator'

# merge in the new column with the appropriate denominator;
# keep the whole merged frame, because merge() may reorder rows by the key columns,
# so copying a single column back into x could misalign rows
x <- merge(x, denom, by = c('concentrate','replicate','day'), all.x = TRUE)

# and divide
x$result <- x$realConc / x$realConcDenominator

And another using apply.

# or use apply in one shot (note: apply() converts each row to character,
# so the comparisons below are string comparisons)
x$applyresult <- apply(x,1,function(x,denom){
  as.numeric(x['realConc'])/denom[denom$concentrate == x['concentrate'] & denom$replicate == x['replicate'] & denom$day == x['day'],'realConc']
},denom = x[x$timepoints == 0,c('realConc','concentrate','replicate','day')])
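
A further base R variant, offered as a sketch rather than part of the answer above: look up each row's denominator with `match()` on a combined key, which leaves the row order of `x` untouched. It assumes exactly one zero-hour row per group; `group_key` is a hypothetical helper introduced here.

```r
# Sample data copied from the question
x <- read.table(header = TRUE, text = "sampleName realConc exptname concentrate timepoints replicate day var
name1_0  3.877049e-05 0hr 55mM  0 b1 011311 1
name1_20 3.293085e-04 kcl 55mM 20 b1 011311 2
name1_40 3.999433e-05 kcl 55mM 40 b1 011311 3
name2_0  2.939995e-03 0hr 55mM  0 b1 011411 1
name2_20 1.212584e-02 kcl 55mM 20 b1 011411 2
name2_40 1.894434e-02 kcl 55mM 40 b1 011411 3")

# Zero-hour rows hold the denominators
denom <- x[x$timepoints == 0, c("realConc", "concentrate", "replicate", "day")]

# Hypothetical helper: one string key per row built from the grouping columns
group_key <- function(d) paste(d$concentrate, d$replicate, d$day, sep = "|")

# For each row of x, find the position of its group's zero-hour row in denom
idx <- match(group_key(x), group_key(denom))

# Divide; rows whose group has no zero-hour row get NA
x$result <- x$realConc / denom$realConc[idx]
```

Because `match()` returns only the first hit, a group with several zero-hour rows would silently use the first one, so it is worth checking the counts beforehand.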
ARobertson