I have a problem in R that I can't seem to solve.

I have the following dataframe:

Image 1

I would like to:

  1. Find the unique combinations of the columns 'Species' and 'Effects'
  2. Report the concentration belonging to this unique combination
  3. If this unique combination is present more than once, calculate the mean concentration

I would like to end up with the following dataframe:

Image 2

I tried the following script to get the unique combinations:

UniqueCombinations <- Data[!duplicated(Data[,1:2]),]

but don't know how to proceed from there.

Thanks in advance for your answers!

Tina

3 Answers

Create some example data:

dat <- data.frame(Species = rep.int(LETTERS[1:4], c(4, 1, 3, 2)),
                  Effect = c(rep("Reproduction", 3), "Growth", "Growth",
                             "Reproduction", "Mortality", "Mortality",
                             "Growth", "Growth"),
                  Concentration = rnorm(10))

You can use the function aggregate:

aggregate(Concentration ~ Species + Effect, dat, mean)
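Because the example data above is drawn with rnorm, its means change on every run. As a rough sketch, the same call can be tried on fixed values (assumed here, since the original images are not available; `dat_fixed`, `means`, `counts` and `result` are illustrative names only). The second aggregate call counts the rows per combination, so you can see which means are based on more than one observation:

```r
# Fixed example values (an assumption; the question's images are not available)
dat_fixed <- data.frame(
  Species = c(rep("A", 4), "B", rep("C", 3), "D", "D"),
  Effect = c(rep("Reproduction", 3), rep("Growth", 2), "Reproduction",
             rep("Mortality", 2), rep("Growth", 2)),
  Concentration = c(1.2, 1.4, 1.3, 1.5, 1.6, 1.2, 1.1, 1.0, 1.3, 1.4)
)

# Mean concentration per unique Species/Effect combination
means <- aggregate(Concentration ~ Species + Effect, data = dat_fixed, FUN = mean)

# Number of rows that went into each mean
counts <- aggregate(Concentration ~ Species + Effect, data = dat_fixed, FUN = length)
names(counts)[3] <- "N"

# Combine means and counts into one data frame
result <- merge(means, counts, by = c("Species", "Effect"))
print(result)
```

A combination that occurs only once simply keeps its single concentration as the "mean", so no special case is needed for step 3.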
Gavin Simpson
Sven Hohenstein
Try the following (thanks to Brandon Bertelsen for the helpful comment):

Creating your data:

foo = data.frame(Species=c(rep("A",4),"B",rep("C",3),"D","D"), 
                 Effect=c(rep("Reproduction",3), rep("Growth",2),
                          "Reproduction", rep("Mortality",2), rep("Growth",2)), 
                 Concentration=c(1.2,1.4,1.3,1.5,1.6,1.2,1.1,1,1.3,1.4))

Use the great plyr package for a bit of magic :)

library(plyr)
ddply(foo, .(Species,Effect), function(x) mean(x[,"Concentration"]))

And this is a slightly more complicated, but cleaner, version (thanks again to Brandon Bertelsen):

ddply(foo, .(Species,Effect), summarize, mean=mean(Concentration))
Ali
  • Cleaner: `ddply(foo,.(Species,Effect),...` – Brandon Bertelsen Oct 22 '12 at 18:47
  • Cleaner-er: `...,.(Species, Effect), summarize, mean=mean(Concentration))` – Brandon Bertelsen Oct 22 '12 at 18:56
  • Thank you for your answer! I would like to ask a question regarding this answer, but I can't fit it in this box, as I also want to add R scripts and outputs. What is the best way to do so? Should I start a new question page? Thanks, Tina – Tina Van Regenmortel Oct 26 '12 at 09:12
  • @TinaVanRegenmortel You are welcome to post a new question. Write it so that it is useful to others, and before posting, search to make sure it has not already been answered on StackOverflow – Ali Oct 26 '12 at 17:01
Just for fun before I call it a night.... Assuming your data.frame is called "dat", here are two more options:

  1. A data.table solution.

    library(data.table)
    datDT <- data.table(dat, key="Species,Effect")
    datDT[, list(Concentration = mean(Concentration)), by = key(datDT)]
    #    Species       Effect Concentration
    # 1:       A       Growth          1.50
    # 2:       A Reproduction          1.30
    # 3:       B       Growth          1.60
    # 4:       C    Mortality          1.05
    # 5:       C Reproduction          1.20
    # 6:       D       Growth          1.35
    
  2. An sqldf solution.

    library(sqldf)
    sqldf("select Species, Effect,
          avg(Concentration) `Concentration`
          from dat
          group by Species, Effect")
    #   Species       Effect Concentration
    # 1       A       Growth          1.50
    # 2       A Reproduction          1.30
    # 3       B       Growth          1.60
    # 4       C    Mortality          1.05
    # 5       C Reproduction          1.20
    # 6       D       Growth          1.35
    
A5C1D2H2I1M1N2O1R2T1