Create new column that lists the names of columns containing values greater than a certain number

Question

So I am working on code related to native species restoration. By the end of this code chunk, I have a column called "count_species_greater_5" that counts the number of native species that have appeared at least 5 times per polygon_ID. What I want to do next, is list those species (which are column names) in a new column. See the table below for what the column would look like (made in excel, "# Species ≥ 5%" is the same as my column "count_species_greater_5", "Species ≥ 5%" is what I am trying to recreate in R). How do I get this last column that will list the names of the species with greater than 5 hits?

Note, my code is below and I know it's not the smoothest or cleanest code, but it does work so far to achieve my goals. Also, for some reason rowsums was treating "polygon_id" like a number and adding it into the total for the species > 5 column, so I used "as.factor" so it wouldn't be counted but I'm not sure if that was the right approach, though it has worked to produce the results I wanted. I am happy to take any advice on a better way to do that, or a better way to code any of this. Thanks!

I am relatively new to this and though I googled it I couldn't figure out how to attach my data, I did look at this link but I didn't understand it. I don't know if it's possible to answer this question without my data, so if anyone has advice on how to upload a csv let me know.


AC_OCWR_Data <- read_excel("AC_OCWR_Data.xlsx")
data <- clean_names(AC_OCWR_Data)

diversity1 <- ac_data %>%
  dplyr::filter(data_type == "T.PI") %>%
  select(transect, polygon_id, native_non_native, species_code) %>% 
  group_by(polygon_id, transect, native_non_native) %>% 
  count(species_code) 

colnames(diversity1)[5] <- c("total") 
diversity1$perc_cover <- diversity1$total/50*100


diversity1_native <- diversity1 %>% 
  dplyr::filter(native_non_native == "Native")

diversity2 = dcast(diversity1_native, polygon_id + transect ~ species_code, value.var = "perc_cover")
diversity2 <- aggregate(diversity2[3:45],list(diversity2$polygon_id), mean, na.rm=TRUE)
names(diversity2)[1]<-paste("polygon_id")
diversity2$polygon_id <- as.factor(diversity2$polygon_id) 
diversity2 <- diversity2 %>% 
  mutate(count_species_greater_5 = rowSums(diversity2 > 5, na.rm = TRUE)) 

View(diversity2)

diversity2 output looks like this:

per Mike's suggestion, here is the diversity 2 output:

structure(list(polygon_id = structure(1:13, .Label = c("5M BUFFER", 
"OW-M1", "OW-M2", "OW-M3", "OW-M4", "OW-M5", "OW-M7", "OW-U1", 
"OW-U11", "OW-U2", "OW-U3", "OW-U6", "Weedy Control"), class = "factor"), 
    ACMGLA = c(2, NaN, 2, NaN, NaN, NaN, NaN, 2, NaN, 2, 2, 2, 
    4), AMBPSI = c(NaN, NaN, 2, NaN, NaN, NaN, NaN, NaN, NaN, 
    NaN, NaN, NaN, NaN), AMSMEN = c(2, 2, NaN, NaN, NaN, NaN, 
    8, NaN, NaN, NaN, NaN, NaN, NaN), ARTCAL = c(26, 12, 6, 8, 
    6, NaN, 4, NaN, 4, 2, 12, 38, 4), ARTDOU = c(2, 6, 2, 2, 
    NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN), ARTDRA = c(11.3333333333333, 
    34, 18, 62, NaN, 12, NaN, 14, 12, 34, 26, NaN, 32), ATRCAN = c(NaN, 
    NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 6), 
    BACSAL = c(4, NaN, NaN, 4, NaN, 12, 12, 34, 22, NaN, NaN, 
    NaN, 14.6666666666667), CALMAC = c(4, NaN, NaN, NaN, NaN, 
    NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN), CORRIG = c(NaN, 
    NaN, NaN, NaN, NaN, NaN, NaN, NaN, 6, NaN, NaN, NaN, NaN), 
    CROCAL = c(NaN, NaN, NaN, 2, NaN, NaN, NaN, NaN, NaN, NaN, 
    NaN, NaN, NaN), CRYINT = c(NaN, 6, NaN, NaN, NaN, NaN, 4, 
    NaN, NaN, NaN, NaN, NaN, NaN), CUCFOE = c(NaN, NaN, NaN, 
    NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 2), DEIFAS = c(6, 
    2, NaN, NaN, NaN, NaN, 32, NaN, NaN, NaN, NaN, NaN, NaN), 
    DIPAUR = c(NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 
    NaN, 2, NaN), ELYCON = c(NaN, NaN, NaN, NaN, 2, NaN, 16, 
    NaN, NaN, NaN, 36, NaN, NaN), ENCCAL = c(NaN, NaN, NaN, NaN, 
    NaN, NaN, NaN, NaN, NaN, 8, 12, NaN, 4), ERIFAS = c(44, 16, 
    50, 26, 58, 62, 32, 22, 20, 74, 14, 42, NaN), ESCCAL = c(NaN, 
    6, 2, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN), 
    EUCCHR = c(NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 
    2, 2, NaN), EUCCRY = c(NaN, NaN, NaN, NaN, NaN, NaN, NaN, 
    2, NaN, 2, NaN, NaN, NaN), GALANG = c(2, NaN, 4, NaN, NaN, 
    2, NaN, NaN, NaN, 2, NaN, 2, NaN), GALAPA = c(NaN, NaN, NaN, 
    2, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN), HELANN = c(NaN, 
    NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 2, NaN, NaN), 
    ISOMEN = c(8, NaN, NaN, NaN, NaN, NaN, 4, NaN, NaN, NaN, 
    4, 10, NaN), MALFAS = c(16, 4, NaN, NaN, 10, 6, 8, NaN, 2, 
    NaN, NaN, 2, NaN), MALLAU = c(12, NaN, NaN, NaN, 2, NaN, 
    NaN, NaN, NaN, NaN, NaN, NaN, NaN), MALSAX = c(4, NaN, NaN, 
    NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN), MARMAC = c(NaN, 
    NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 3), 
    OPULIT = c(NaN, NaN, NaN, NaN, NaN, NaN, 8, NaN, NaN, NaN, 
    4, NaN, NaN), PHACIC = c(14, 26, 42, NaN, NaN, 6, NaN, 8, 
    NaN, 4, 6, NaN, NaN), PHAPAR = c(NaN, NaN, 2, NaN, NaN, NaN, 
    NaN, NaN, NaN, NaN, 4, NaN, NaN), QUEARG = c(NaN, NaN, NaN, 
    NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 32, NaN), RAFCAL = c(2, 
    NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN
    ), RHUINT = c(6, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 
    NaN, NaN, NaN, NaN), SALLAE = c(NaN, NaN, NaN, NaN, NaN, 
    NaN, NaN, NaN, NaN, NaN, NaN, NaN, 8), SALMEL = c(NaN, NaN, 
    NaN, 10, 4, 2, 8, 2, 12, NaN, 10, 12, 10), SAMNIG = c(NaN, 
    8, 40, NaN, 8, 6, NaN, 10, 6, NaN, 2, NaN, NaN), SOLDOU = c(NaN, 
    NaN, 4, NaN, NaN, NaN, NaN, 4, NaN, NaN, NaN, NaN, NaN), 
    SOLSPP = c(2, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 
    NaN, NaN, NaN), STIPUL = c(4, NaN, NaN, NaN, NaN, NaN, NaN, 
    NaN, NaN, NaN, NaN, NaN, NaN), TOXDIV = c(NaN, NaN, NaN, 
    NaN, NaN, NaN, 4, NaN, NaN, NaN, NaN, NaN, NaN), XANSTR = c(NaN, 
    NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 2), 
    count_species_greater_5 = c(9, 8, 5, 4, 4, 6, 8, 5, 6, 3, 
    7, 5, 5)), class = "data.frame", row.names = c(NA, -13L))

Done! Shown as an image, because again I'm not sure how to upload data or tables. — Rachel Kenny, Feb 21 '20 at 20:43
if you want to share data one way would be to use the `dput()` function, you could do `dput(diversity2)` in your R session and then copy and paste the output to your question. if your data is too large you can always do something like this `dput(head(diversity2))` — Mike, Feb 21 '20 at 20:52
My mistake, I fixed it. Now it reflects the current diversity 2 output. So I'm trying to add in that last column, but in r. I can do it in excel but haven't figure out how to do it in r. — Rachel Kenny, Feb 21 '20 at 20:52
yes that is correct! you can also share your other data.frames like that as well if you think it is necessary to help with your question — Mike, Feb 21 '20 at 21:43
@mike Awesome, thank you so much! This will be so helpful for future posts, and I didn't really understand it before. — Rachel Kenny, Feb 21 '20 at 22:08

score 1 · Accepted Answer · answered Feb 21 '20 at 21:04

We can pivot to 'long' format and then filter the names based on the frequency of values

library(dplyr)
library(tidyr)
diversity2 %>%
    select(-count_species_greater_5) %>%
    pivot_longer(cols = - polygon_id, values_drop_na = TRUE) %>% 
    filter(value >= 5) %>%          
    group_by(polygon_id) %>% 
    summarise(species_name = toString(name), 
              count_species_greater_5 = n())
# A tibble: 13 x 3
#   polygon_id    species_name                                                           count_species_greater_5
#   <fct>         <chr>                                                                                    <int>
# 1 5M BUFFER     ARTCAL, ARTDRA, DEIFAS, ERIFAS, ISOMEN, MALFAS, MALLAU, PHACIC, RHUINT                       9
# 2 OW-M1         ARTCAL, ARTDOU, ARTDRA, CRYINT, ERIFAS, ESCCAL, PHACIC, SAMNIG                               8
# 3 OW-M2         ARTCAL, ARTDRA, ERIFAS, PHACIC, SAMNIG                                                       5
# 4 OW-M3         ARTCAL, ARTDRA, ERIFAS, SALMEL                                                               4
# 5 OW-M4         ARTCAL, ERIFAS, MALFAS, SAMNIG                                                               4
# 6 OW-M5         ARTDRA, BACSAL, ERIFAS, MALFAS, PHACIC, SAMNIG                                               6
# 7 OW-M7         AMSMEN, BACSAL, DEIFAS, ELYCON, ERIFAS, MALFAS, OPULIT, SALMEL                               8
# 8 OW-U1         ARTDRA, BACSAL, ERIFAS, PHACIC, SAMNIG                                                       5
# 9 OW-U11        ARTDRA, BACSAL, CORRIG, ERIFAS, SALMEL, SAMNIG                                               6
#10 OW-U2         ARTDRA, ENCCAL, ERIFAS                                                                       3
#11 OW-U3         ARTCAL, ARTDRA, ELYCON, ENCCAL, ERIFAS, PHACIC, SALMEL                                       7
#12 OW-U6         ARTCAL, ERIFAS, ISOMEN, QUEARG, SALMEL                                                       5
#13 Weedy Control ARTDRA, ATRCAN, BACSAL, SALLAE, SALMEL                                                       5

NOTE: If these columns needs to be added back to the initial dataset, either do bind_cols by selecting the columns of interest or a left_join with the 'polygon_id' as by column

Create new column that lists the names of columns containing values greater than a certain number

1 Answers1