I've been stuck for days on how to find the average for each category in one of my columns in RStudio. I'm fairly new to R and have used this community to figure out how to run things before, so I figured I'd give asking a shot.
So far I've been able to extract three columns from a big data set, and I've been asked to find the average for each category in one column:
Species Subphylum amylase88_count
<chr> <chr> <dbl>
1 Abortiporus_biennis Agaricomycotina 2.000
2 Acanthostigma_perpusillum Pezizomycotina NA
3 Acaulospora_alpina Glomeromycota NA
4 Acaulospora_brasiliensis Glomeromycota 1.000
5 Acaulospora_cavernata Glomeromycota NA
6 Acaulospora_colliculosa Glomeromycota NA
7 Acaulospora_colombiana Glomeromycota NA
8 Acaulospora_delicata Glomeromycota NA
9 Acaulospora_dilatata Glomeromycota NA
10 Acaulospora_entreriana Glomeromycota NA
# … with 2,724 more rows
Not all of the values are NA, and the Subphylum column has many more subphyla than are shown here. I tried using ddply, and this was my result:
ddply(SubphyAM76count, .(Subphylum), summarize,
am76c_avg=mean(alphaMannanase76_count))
Subphylum am76c_avg
1 Agaricomycotina NA
2 Chytridiomycota NA
3 Cryptomycota 0
4 Entomophthoromycotina NA
5 Glomeromycota NA
6 Glomeromycotina NA
7 Kickxellomycotina NA
8 Mortierellomycotina NA
9 Mucoromycotina NA
10 Pezizomycotina NA
11 Pucciniomycotina NA
12 Saccharomycotina NA
13 Taphrinomycotina NA
14 Ustilaginomycotina NA
15 <NA> NA
Now, I know these values don't reflect what I'd like, because there are lots of values for each of these subphyla. I'd post the entire Excel sheet, but it's pretty massive. My guess is that I have to tell R to ignore the NAs, but in the past it has ignored the NAs anyway. Any help would be appreciated. Thank you!
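For reference, the guess about telling R to ignore the NAs can be sketched as follows. This is only a minimal sketch on made-up toy data (the `toy` data frame and its values are hypothetical, not taken from the real sheet); it reuses the column names from the ddply call and uses base R's `tapply` so it runs without plyr.

```r
# Hypothetical toy data standing in for the real data set (values are invented).
toy <- data.frame(
  Subphylum = c("Agaricomycotina", "Agaricomycotina",
                "Glomeromycota", "Glomeromycota", "Pezizomycotina"),
  alphaMannanase76_count = c(2, NA, 1, 3, NA)
)

# mean() returns NA as soon as a group contains a single NA ...
avg_with_na <- tapply(toy$alphaMannanase76_count, toy$Subphylum, mean)

# ... unless na.rm = TRUE is passed, which drops the NAs before averaging.
avg <- tapply(toy$alphaMannanase76_count, toy$Subphylum, mean, na.rm = TRUE)
print(avg)

# The same argument can be passed inside the plyr call (assuming plyr is loaded):
# ddply(SubphyAM76count, .(Subphylum), summarize,
#       am76c_avg = mean(alphaMannanase76_count, na.rm = TRUE))
```

A group whose values are all NA comes out as NaN rather than NA with `na.rm = TRUE`, since there is nothing left to average.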