
I'm struggling with dplyr syntax and, although I have Googled a lot, I'm stuck. I have a data frame with 8,594 rows and two variables (both factors). I want to find out how many times each species appears in the data frame, using dplyr in R.

My data frame looks like this:

    dfrm <- data.frame(
      cod_lance = c("1994_100", "1994_100", "1994_100", "1994_100",
                    "1994_101", "1994_101", "1994_101",
                    "1994_120", "1994_120", "1994_120", "1994_120",
                    "1996_10", "1996_10", "1996_10", "1996_10",
                    "1997_65", "1997_65", "1997_65", "1997_65", "1997_65", "1997_65",
                    "1997_66", "1997_66", "1997_66", "1997_66"),
      especie = c("Micromesistius poutassou", "Gadiculus argenteus",
                  "Merluccius merluccius", "Gaidropsaurus macrophthalmus",
                  "Merluccius merluccius", "Micromesistius poutassou", "Gadiculus argenteus",
                  "Trisopterus luscus", "Merluccius merluccius", "Trisopterus minutus",
                  "Phycis blennoides", "Gadiculus argenteus", "Gaidropsaurus macrophthalmus",
                  "Merluccius merluccius", "Micromesistius poutassou",
                  "Trisopterus minutus", "Phycis blennoides", "Gadiculus argenteus",
                  "Gaidropsaurus macrophthalmus",
                  "Merluccius merluccius", "Micromesistius poutassou",
                  "Nezumia aequalis", "Phycis blennoides",
                  "Gadiculus argenteus", "Trisopterus luscus"))

What I want to get is a data frame like this (using the example above):

freq <- data.frame(
  especie = c("Gadiculus argenteus", "Gaidropsaurus macrophthalmus",
              "Merluccius merluccius", "Micromesistius poutassou",
              "Nezumia aequalis", "Phycis blennoides",
              "Trisopterus luscus", "Trisopterus minutus"),
  N = c(5, 3, 5, 4, 1, 3, 2, 2))

I have tried several approaches like, for example,

df1 <- (dfrm %>% count(cientifico) %>% group_by (cod_lance))

but I always get the same type of error. In this example: "Error in grouped_df_impl(data, unname(vars), drop) : Column cod_lance is unknown". I don't know what I am doing wrong or how to fix it.

Any help will be very welcome. Thanks in advance.

Juan Carlos

2 Answers

1

As Juan Carlos shows, group_by() and summarize() is the classic way to do this (and also what I usually use). That said, if this is an operation that you perform very frequently, you may find it handy to use the count() or tally() shortcuts in dplyr.

In this case, you would write:

count(dfrm, especie)

For more information on count(), see: https://dplyr.tidyverse.org/reference/tally.html
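To make the shortcut concrete, here is a minimal sketch comparing count(), the group_by() plus summarise() long form, and tally(). The data frame `toy` is a hypothetical five-row stand-in for dfrm, and the `name = "N"` argument is only there to match the column name in the question's desired output.

```r
library(dplyr)

# Hypothetical miniature of dfrm, just enough rows to see the counts
toy <- data.frame(
  cod_lance = c("1994_100", "1994_100", "1994_101", "1994_101", "1996_10"),
  especie   = c("Merluccius merluccius", "Gadiculus argenteus",
                "Merluccius merluccius", "Gadiculus argenteus",
                "Merluccius merluccius")
)

# Shortcut: one row per species, counts stored in N (the default name is n)
by_count <- count(toy, especie, name = "N")

# Long form: the same counts written with group_by() and summarise()
by_summarise <- toy %>%
  group_by(especie) %>%
  summarise(N = n())

# tally() counts the rows within groups that already exist
by_tally <- toy %>%
  group_by(especie) %>%
  tally(name = "N")

print(by_count)
# Gadiculus argenteus has N = 2, Merluccius merluccius has N = 3
```

All three give the same counts; count() just saves the explicit group_by() step.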

It doesn't matter here since you only have one grouping variable, but this approach is also nice because count() returns an ungrouped result, as if ungroup() had been called after summarize(). When group_by() contains multiple grouping variables, summarize() by default leaves the data partially grouped (by all but the last variable in your group_by()). This can have unexpected downstream consequences, because the next aggregate function you apply will still respect that remaining grouping.
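The leftover grouping is easy to check with group_vars(). The sketch below uses a hypothetical five-row data frame `toy` with the same two columns as the question; recent dplyr versions also print a message about the remaining grouping when summarise() runs.

```r
library(dplyr)

# Hypothetical miniature of the question's data
toy <- data.frame(
  cod_lance = c("1994_100", "1994_100", "1994_101", "1994_101", "1996_10"),
  especie   = c("Merluccius merluccius", "Gadiculus argenteus",
                "Merluccius merluccius", "Gadiculus argenteus",
                "Merluccius merluccius")
)

# summarise() peels off only the last grouping variable (especie)
summarised <- toy %>%
  group_by(cod_lance, especie) %>%
  summarise(n = n())

group_vars(summarised)   # "cod_lance": the result is still grouped

# count() with the same two variables returns an ungrouped result
counted <- toy %>% count(cod_lance, especie)

group_vars(counted)      # character(0)

# An explicit ungroup() makes the long form behave like count()
summarised_flat <- ungroup(summarised)
```

In dplyr 1.0.0 and later, summarise() also accepts `.groups = "drop"`, which does the same thing as the trailing ungroup().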

Emily Riederer
0

Based on your freq data.frame, dfrm %>% count(especie) returns what you want, as in the answer of @tmfmnk.

If you look at the error you get, the result of dfrm %>% count(especie) is a tibble of 2 columns (especie and n) which no longer contains cod_lance. Hence your group_by statement gives you the error:

Error in grouped_df_impl(data, unname(vars), drop) : Column cod_lance is unknown
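You can see this for yourself by inspecting the column names after count(). This is only a sketch on a hypothetical five-row data frame `toy`; the tryCatch() wrapper is there so the script keeps running, and the exact wording of the error message depends on your dplyr version.

```r
library(dplyr)

# Hypothetical miniature of dfrm
toy <- data.frame(
  cod_lance = c("1994_100", "1994_100", "1994_101", "1994_101", "1996_10"),
  especie   = c("Merluccius merluccius", "Gadiculus argenteus",
                "Merluccius merluccius", "Gadiculus argenteus",
                "Merluccius merluccius")
)

counted <- toy %>% count(especie)
names(counted)   # "especie" "n": cod_lance is gone

# Grouping by the dropped column fails; capture the error instead of stopping
outcome <- tryCatch(
  {
    counted %>% group_by(cod_lance)
    "no error"
  },
  error = function(e) "error"
)
outcome          # "error"

# Grouping first and counting afterwards works
fixed <- toy %>%
  group_by(cod_lance) %>%
  count(especie)
```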

You need to group first, and only then create summaries or frequencies within each group. For example, the following code gives you the number of especie records (rows) per value of cod_lance.

dfrm %>% 
  group_by (cod_lance) %>% 
  summarise(n = n()) # for frequencies tally() would also work.

# A tibble: 6 x 2
  cod_lance     n
  <fct>     <int>
1 1994_100      4
2 1994_101      3
3 1994_120      4
4 1996_10       4
5 1997_65       6
6 1997_66       4
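Note that n() counts rows. If a species could appear more than once within the same cod_lance and you want the number of different species instead, n_distinct() may be closer to what you need. A small sketch, assuming a hypothetical data frame `toy` in which one species is repeated within a haul:

```r
library(dplyr)

# Hypothetical data: Merluccius merluccius appears twice in haul 1994_100
toy <- data.frame(
  cod_lance = c("1994_100", "1994_100", "1994_100", "1994_101", "1994_101"),
  especie   = c("Merluccius merluccius", "Gadiculus argenteus",
                "Merluccius merluccius", "Merluccius merluccius",
                "Gadiculus argenteus")
)

per_haul <- toy %>%
  group_by(cod_lance) %>%
  summarise(
    n_rows    = n(),                  # number of records in the haul
    n_species = n_distinct(especie)   # number of different species in the haul
  )

print(per_haul)
# 1994_100: n_rows = 3, n_species = 2
# 1994_101: n_rows = 2, n_species = 2
```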

Btw, more information on dplyr workflow can be found in R for Data Science chapter 5.

phiver