
I've been stuck for days on how to find the average for each category in one of my columns in RStudio. I'm fairly new to R and have used this community before to figure out how to run things, so I figured I'd give asking a shot.

So far I've been able to extract 3 columns from a big data set, and I've been told to find the average for each category in one column.

       Species                   Subphylum       amylase88_count
        <chr>                     <chr>                     <dbl>
      1 Abortiporus_biennis       Agaricomycotina              2.000            
      2 Acanthostigma_perpusillum Pezizomycotina               NA
      3 Acaulospora_alpina        Glomeromycota                NA
      4 Acaulospora_brasiliensis  Glomeromycota                1.000
      5 Acaulospora_cavernata     Glomeromycota                NA
      6 Acaulospora_colliculosa   Glomeromycota                NA
      7 Acaulospora_colombiana    Glomeromycota                NA
      8 Acaulospora_delicata      Glomeromycota                NA
      9 Acaulospora_dilatata      Glomeromycota                NA
     10 Acaulospora_entreriana    Glomeromycota                NA
     # … with 2,724 more rows

Not all of the values are NA, and the Subphylum column has many more subphyla that aren't shown here. I tried using ddply, and this was my result:

ddply(SubphyAM76count, .(Subphylum), summarize, 
am76c_avg=mean(alphaMannanase76_count))

                       Subphylum am76c_avg
        1        Agaricomycotina        NA
        2        Chytridiomycota        NA
        3           Cryptomycota         0
        4  Entomophthoromycotina        NA
        5          Glomeromycota        NA
        6        Glomeromycotina        NA
        7      Kickxellomycotina        NA
        8    Mortierellomycotina        NA
        9         Mucoromycotina        NA
        10        Pezizomycotina        NA
        11      Pucciniomycotina        NA
        12      Saccharomycotina        NA
        13      Taphrinomycotina        NA
        14    Ustilaginomycotina        NA
        15                  <NA>        NA

I know these values aren't what I'm after, because each of these subphyla has lots of values. I'd post the entire Excel sheet, but it's pretty massive. My guess is that I have to tell R to ignore the NAs, but in the past it has ignored the NAs anyway. Any help would be appreciated. Thank you!

Phil

1 Answer


Using dplyr you can simply do:

library(dplyr)

# na.rm = TRUE drops missing values before each group's mean is taken
df <- df %>%
      group_by(Subphylum) %>%
      summarise(val = mean(amylase88_count, na.rm = TRUE))
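
To see why the `ddply` attempt came back almost entirely NA, here is a minimal sketch on a toy data frame. The data frame `toy` and its values are made up for illustration and are not taken from the original data set; only the column names are borrowed from the question. By default `mean()` returns NA as soon as a group contains a single NA, while `na.rm = TRUE` removes the missing values first.

```r
library(dplyr)

# Hypothetical toy data (values invented for illustration only)
toy <- data.frame(
  Subphylum = c("Glomeromycota", "Glomeromycota", "Glomeromycota",
                "Agaricomycotina", "Agaricomycotina", "Pezizomycotina"),
  amylase88_count = c(1, NA, 3, 2, 4, NA)
)

# Default behaviour: any NA in a group makes that group's mean NA
with_na <- toy %>%
  group_by(Subphylum) %>%
  summarise(val = mean(amylase88_count))

# na.rm = TRUE drops the NAs before averaging
without_na <- toy %>%
  group_by(Subphylum) %>%
  summarise(val = mean(amylase88_count, na.rm = TRUE))

print(with_na)
print(without_na)
```

A group whose values are all NA (Pezizomycotina in this toy example) comes back as NaN rather than NA once `na.rm = TRUE` is set, since nothing is left to average.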
YOLO
  • Is there a group_by function that could be used on characters? I've been looking around because I'm receiving an error saying it cannot be used on characters. Thank you so much, you've helped me get closer to my goal. – Andrea Hdz Aug 24 '20 at 18:12
  • @AndreaHdz Yes, with the `na.rm` argument. Please check the edit. – YOLO Aug 24 '20 at 18:22