How to count unique entries, not sum values (environmental dataset)

Question

I'm conducting an analysis on native/non-native cover in a restoration site. The data is organized by Polygon, then transect, then pin. I don't care if a certain pin has 1 native or 3 native species - I just care if it has any. Right now the raw data looks like this:

[Image: raw data]

In the end, I want my data to look like this format:

[Image: desired data]

The problem is that my current code counts every single entry for native, non-native, etc. and sums them for each transect. What I want instead is the total number of pins that have a native/non-native/etc. species, regardless of how many there are. For example, if pin 5 has 3 natives, it should still count as just 1 native in the final table. Can anyone help? Code below; I can't share the data, though:

mynewtable <- data %>% 
  count(polygon_id, transect, native_non_native) %>% 
  spread(native_non_native, n)

Please do not share data as a picture. You can provide the data in reproducible form by pasting the output of `dput(some_data)` in the question. — IceCreamToucan, Jan 06 '20 at 20:12
Welcome to SO! Please take a moment to read about how to create R examples: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example — YOLO, Jan 06 '20 at 20:27
@BenBolker Oh my god yes, that totally worked. I've been trying to figure this out for hours. THANK YOU!! This is my first time using stack overflow so I'm not sure how to "vote" that you gave me a good answer, but thank you! — Rachel Kenny, Jan 06 '20 at 22:33
You can accept and upvote @BIcube's answer (which is equivalent to mine) — Ben Bolker, Jan 06 '20 at 22:38
Welcome to Stack Overflow! You should provide a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) — M--, Jan 06 '20 at 23:13
@RachelKenny please click on the check mark next to the answer below if it answered your question. Thanks! — BICube, Jan 06 '20 at 23:24
@BenBolker On second review, the numbers I'm getting don't add up properly. For example, there are no trees in the OW-U2 polygon, but it shows 1 tree. So I'm not sure why, the numbers are close but a few of them are off by 1. I already tried making sure that polygon ID and transect were "as.factor" and not numeric. Any thoughts? — Rachel Kenny, Feb 27 '20 at 21:09

score 2 · Accepted Answer · answered Jan 06 '20 at 22:37

It seems like your data has duplicate rows for the combination you are trying to count: a pin with several native species appears once per species. If you first reduce the data to its unique polygon/transect/pin/category combinations and then count, you should get the desired per-pin results.

> library(dplyr)
> library(tidyr)
>
> df <- data.frame(polygon_id = replicate(10,'OW-M7'), 
                 transect = replicate(10,1),
                 pin_number = c(1,1,1,2,3,4,5,6,7,8), 
                 native_non_native =c(replicate(5,'Native'),replicate(5,'NoNative'))
                 )

> df
   polygon_id transect pin_number native_non_native
1       OW-M7        1          1            Native
2       OW-M7        1          1            Native
3       OW-M7        1          1            Native
4       OW-M7        1          2            Native
5       OW-M7        1          3            Native
6       OW-M7        1          4          NoNative
7       OW-M7        1          5          NoNative
8       OW-M7        1          6          NoNative
9       OW-M7        1          7          NoNative
10      OW-M7        1          8          NoNative

> mynewtable <- df %>% 
    select(polygon_id, transect, pin_number, native_non_native) %>% 
    distinct() %>% 
    count(polygon_id, transect, native_non_native) %>% 
    spread(native_non_native, n)

> mynewtable
# A tibble: 1 x 4
  polygon_id transect Native NoNative
  <fct>         <dbl>  <int>    <int>
1 OW-M7             1      3        5

And of course, if these are the only columns in your data frame, you can skip the select step and call distinct directly:

> mynewtable <- distinct(df) %>% count(polygon_id, transect, native_non_native) %>% spread(native_non_native, n)
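
The same idea can also be written by counting distinct pins per group instead of de-duplicating rows first. This is only a sketch and rests on assumptions not in the original answer: it rebuilds the toy df from above, the names pin_counts and n_pins are made up for illustration, and it uses n_distinct with pivot_wider (the newer replacement for spread), so it needs dplyr 1.0.0 or later and tidyr 1.0.0 or later. It gives the same numbers as the distinct approach as long as pin_number identifies a pin within each polygon and transect.

```r
library(dplyr)
library(tidyr)

# Same toy data as in the answer above: 10 rows, pin 1 recorded three times.
df <- data.frame(
  polygon_id = rep("OW-M7", 10),
  transect = rep(1, 10),
  pin_number = c(1, 1, 1, 2, 3, 4, 5, 6, 7, 8),
  native_non_native = c(rep("Native", 5), rep("NoNative", 5))
)

# Count the distinct pins in each polygon/transect/category group,
# then turn the categories into columns.
pin_counts <- df %>%
  group_by(polygon_id, transect, native_non_native) %>%
  summarise(n_pins = n_distinct(pin_number), .groups = "drop") %>%
  pivot_wider(
    names_from = native_non_native,
    values_from = n_pins,
    # A category that never occurs in a transect becomes 0 instead of NA.
    values_fill = list(n_pins = 0L)
  )

pin_counts
```

On the toy data this yields one row with Native = 3 and NoNative = 5, matching the table above. The values_fill argument is optional; it only matters when some transect has no pins in one of the categories.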

Hi @BICube - when I use the above method my numbers are off. For example, there are no trees in the OW-U2 polygon, but it shows 1 tree. So I'm not sure why, the numbers are close but a few of them are off by 1. I already tried making sure that polygon ID and transect were "as.factor" and not numeric. — Rachel Kenny, Feb 27 '20 at 21:08
@RachelKenny, that's not possible. Something else must be going on with your data. Can you try to reproduce the error you are seeing from the example that I listed above? I mean, can you try to play with df and change it to reproduce the error? I might have a better chance of helping you if you provide a dataset that breaks my proposed solution. — BICube, Feb 28 '20 at 05:49
You are right, there was an error in my filtering of the data that was leading to bad results. I also kept getting an error with rowSums where it counted character columns as numeric, but I was able to resolve that as well by telling it not to count the first column. Thank you! — Rachel Kenny, Mar 05 '20 at 19:17
@RachelKenny you are welcome. It would be great if you can click on the check mark again to accept this answer. — BICube, Mar 05 '20 at 21:31