I am trying to calculate the mean, SD and SE of the frequency of several species over the number of sites per habitat. I have three sites per habitat and four habitat types, so twelve sites in total. My dataset looks like this:

Site Species Habitat  Count

A      X   Wetland      3
B      T   Urban       12
B      U   Forest      18
C      Z   Grassland    3
C      Z   Grassland    6

My issue is that not all species were recorded at each site, so I get NA values when I run the code below, and the mean is not calculated correctly because the number of sites (N) is wrong for species missing from some sites.

library(plyr)  # provides ddply() and summarise()

cdata <- ddply(df, c("Species", "Habitat"), summarise,
               N    = sum(Count),
               mean = mean(Count),
               sd   = sd(Count),
               se   = sd / sqrt(N))

I have tried using mutate rather than summarise to set N to 3, the number of sites per habitat, but I am still getting NA values for SD and SE.
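A minimal base R sketch of where the NA values probably come from, assuming a species that shows up in only one row for a given habitat. The counts and variable names here are made up for illustration:

```r
# Hypothetical: a species recorded at only one of the three sites in a habitat,
# so the Species x Habitat group contains a single row.
count_one_site <- c(6)

# sd() needs at least two values, so a one-row group returns NA,
# and any se computed from that sd is NA as well.
sd_one <- sd(count_one_site)

# N = sum(Count) gives the total number of individuals, not the number of sites.
n_as_sum  <- sum(count_one_site)     # 6 individuals
n_as_rows <- length(count_one_site)  # 1 row actually present in the data
n_sites   <- 3                       # sites per habitat, as stated in the question

se_one <- sd_one / sqrt(n_sites)     # still NA, whatever N is set to
```

This would also explain why fixing N at 3 with mutate does not help: the NA comes from `sd()`, not from N.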

  • Were the species not present in the sites, or were they simply not recorded? If the latter, then you're trying to treat missing data as zero, which is not appropriate. – alan ocallaghan Feb 10 '20 at 15:16
  • Species were not present in the site – niamhailbhe Feb 10 '20 at 15:17
  • So what is your desired result? Zeros for those missing species per site. Can you post enough data to show multiple entries per species per site? – Parfait Feb 10 '20 at 15:38
  • If they weren't present, just record them as zeros, or use something like [complete](https://tidyr.tidyverse.org/reference/complete.html) or `tidyr::replace_na`. – alan ocallaghan Feb 10 '20 at 15:46
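A sketch of the `complete` suggestion, assuming absences really are true zeros and that every site appears in the data at least once. The data frame below is hypothetical (two habitats, three sites each), and summing repeated Site/Species rows into one count per site is an assumption about what the duplicates mean:

```r
library(dplyr)
library(tidyr)

# Hypothetical data: only species that were actually seen have a row.
df <- data.frame(
  Site    = c("A", "A", "B", "C", "D", "E", "F"),
  Habitat = c("Wetland", "Wetland", "Wetland", "Wetland", "Urban", "Urban", "Urban"),
  Species = c("X", "Z", "Z", "Z", "X", "X", "Z"),
  Count   = c(3, 6, 3, 3, 12, 6, 9)
)

cdata <- df %>%
  # One row per Site x Species (assumption: repeated rows are summed).
  group_by(Site, Habitat, Species) %>%
  summarise(Count = sum(Count), .groups = "drop") %>%
  # Add the missing Site x Species rows with Count = 0.
  # nesting() keeps each site paired with its own habitat only.
  complete(nesting(Site, Habitat), Species, fill = list(Count = 0)) %>%
  group_by(Species, Habitat) %>%
  summarise(N    = n(),          # number of sites in the habitat
            mean = mean(Count),
            sd   = sd(Count),
            se   = sd / sqrt(N),
            .groups = "drop")
```

A site where no species at all was recorded has no row to start from, so `complete` cannot create it. Such sites would have to be supplied separately.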

1 Answer

A workaround could be to set the NA values to 0. This is explained here: Set NA to 0 in R

cdata[is.na(cdata)] <- 0

Possibly not the cleanest, but it seems like it may work.

Iain McL
  • `NA` is not the same as zero which can affect aggregates like `mean` and `sd`. – Parfait Feb 10 '20 at 15:33
  • Agreed. I just read it as no record being equivalent to measuring the absence of something, in which case zeros would make sense. – Iain McL Feb 10 '20 at 16:51
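To illustrate the point made in these comments, here is a small base R sketch with made-up counts for one species across three sites, comparing absences stored as `NA` with absences stored as 0:

```r
# Hypothetical counts for one species at three sites in one habitat.
counts_na   <- c(6, NA, NA)  # absences treated as missing data
counts_zero <- c(6, 0, 0)    # absences treated as true zeros

mean_na_kept    <- mean(counts_na)                # NA: any NA propagates
mean_na_dropped <- mean(counts_na, na.rm = TRUE)  # 6: averaged over 1 site only
mean_zero       <- mean(counts_zero)              # 2: averaged over all 3 sites

sd_na_dropped <- sd(counts_na, na.rm = TRUE)      # NA: only one value remains
sd_zero       <- sd(counts_zero)                  # sqrt(12), about 3.46
```

Which version is appropriate depends on whether a missing row means "absent" or "not surveyed". In this thread the asker states the species were not present.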