
I'm struggling a bit with conditionally subsetting data and then averaging the subset.

I have 2 datasets:

type<-c("flesh","wholefish","wholefish","wholefishdelip")
group<-c("two","four",'five','five')
N<-c(10.2,11.1,10.7,11.3)
prey <- cbind(type,group,N)

sample<-c('plasma','wholeblood','redbloodcell')
group1<-c('four','four','two')
group2<-c('','five','four')
group3<-c('','','five')
avgN<-c("","","")
penguin<-cbind(sample,group1,group2,group3,avgN)

I want the output to look like this:

sample       |  group1  |  group2  |  group3  |  avgNwf                
plasma       | four     |          |          |  11.1  #made up by (11.1/1)
wholeblood   | four     | five     |          |  10.9  #(11.1+10.7)/2
redbloodcell | two      | four     | five     |  10.9  #(11.1+10.7)/2

I want to calculate a value for penguin$avgN according to conditions per row: the average of prey$N where prey$type == "wholefish" and prey$group matches any of penguin$group1, penguin$group2 or penguin$group3. Not all penguin groups have entries, so I ran into a problem in Excel where I couldn't make it ignore the #N/A. (And Excel doesn't have a function for conditional standard deviations.)

For example, for the second row in the penguin data frame (wholeblood), I want to average N (of the prey df) for all wholefish in groups four and five. I have tried the following with fewer conditions, just to see if I am on track, but to no avail:

avgN <-mean(ifelse(prey$group==penguin$group1,prey$N, "nope"))

avgN <-mean(prey$N[prey$group==penguin$group1,])

The following is not what I want to achieve:

avgN = summaryBy(N ~group+type, data=prey, FUN=c(mean, sd), na.rm=T)

as it brings back a summary version of information instead of an individual result for each entry with its own conditions.

avgN <-mean(prey$N)

as it lacks the conditions for each individual sample.

In Excel I would use cell references to work with conditions unique to each row.

mckisa
  • Screenshots of your data aren't very useful. If you can provide actual samples of your dataset + the desired result based on said sample, you'll be more likely to get good advice. See here: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Z.Lin Aug 23 '17 at 00:58
  • Thank you for the link, I hope this is more useful – mckisa Aug 23 '17 at 12:49
  • Thanks for providing more info, but the description of how you want to calculate the result is still unclear. Also, your description makes reference to a dataframe called predator, but that's not in the example. – Z.Lin Aug 23 '17 at 13:10

1 Answer

So here is an answer for anyone struggling with something similar:

# turns out prey and penguin were matrices (built with cbind), not data frames,
# so columns are addressed by position and N has to be converted to numbers
prey3 <- as.numeric(prey[, 3])
# penguin only has 5 columns, so add a 6th one to hold the standard deviation
penguin <- cbind(penguin, sdN = "")
# condition 1: prey$type == "wholefish"
a <- which(prey[, 1] == "wholefish")
for (i in 1:3) {
  # condition 2: prey$group == penguin$group1
  ad <- intersect(a, which(prey[, 2] == penguin[i, 2]))
  # condition 3: prey$group == penguin$group2
  add <- intersect(a, which(prey[, 2] == penguin[i, 3]))
  # condition 4: prey$group == penguin$group3
  addd <- intersect(a, which(prey[, 2] == penguin[i, 4]))
  # empty groups return integer(0); c() simply drops those,
  # so no special handling is needed before taking the mean
  relrows2 <- c(ad, add, addd)
  # then I could do the calculations I wanted
  # (sd() of a single value is NA, e.g. for the plasma row)
  penguin[i, 5] <- mean(prey3[relrows2])
  penguin[i, 6] <- sd(prey3[relrows2])
}
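
For comparison, here is a minimal sketch of the same calculation done on data frames instead of matrices. It is only one possible approach, not necessarily the best one. The names prey_df, penguin_df, sdN and the helper wholefish_n are made up for this illustration and are not part of the original data; the values are the ones from the question.

```r
# Same example data as in the question, but built as data frames
# (assumption: the question's objects were character matrices from cbind)
prey_df <- data.frame(
  type  = c("flesh", "wholefish", "wholefish", "wholefishdelip"),
  group = c("two", "four", "five", "five"),
  N     = c(10.2, 11.1, 10.7, 11.3),
  stringsAsFactors = FALSE
)
penguin_df <- data.frame(
  sample = c("plasma", "wholeblood", "redbloodcell"),
  group1 = c("four", "four", "two"),
  group2 = c("", "five", "four"),
  group3 = c("", "", "five"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: N values of all wholefish prey whose group
# appears in any of the three group columns of penguin row i
wholefish_n <- function(i) {
  groups <- unlist(penguin_df[i, c("group1", "group2", "group3")])
  prey_df$N[prey_df$type == "wholefish" & prey_df$group %in% groups]
}

# One mean and one standard deviation per penguin row
rows <- seq_len(nrow(penguin_df))
penguin_df$avgN <- sapply(rows, function(i) mean(wholefish_n(i)))
penguin_df$sdN  <- sapply(rows, function(i) sd(wholefish_n(i)))
penguin_df
```

Empty group entries ("") never match a prey group, so they need no special handling with %in%. A row that matches only one prey value gets NA for the standard deviation, because sd() needs at least two values.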

Thank you Z.Lin for your help

mckisa