How do I make summarize(count_by_) in dplyr only count instances if there is an exact match?

Question

I am trying to make a pivot table of a large .tsv data set in R and exporting it back to Excel.

I tried using the dplyer functions:

summary <- df %>%
group_by(Run,Prot) %>%
summarize(count_by_Id = n()) %>%
as.data.frame()

This almost works, but rows with e.g. "P61981;P62258" and "P62258" in the Prot column are counted together. How do I make R only summarize rows that have exactly the same strings in the Prot columns. So that in case of the above example there will be two different rows (for "P61981;P62258" and "P62258") in the summary data I am creating.

Based on your code, it won't be counted as same if they are different strings. Make sure to use `dplyr::summarise` assuming the summarise is not getting masked by plyr::summarise — akrun, Mar 10 '23 at 03:12
Does this answer your question? [Why does summarize or mutate not work with group\_by when I load \`plyr\` after \`dplyr\`?](https://stackoverflow.com/questions/26106146/why-does-summarize-or-mutate-not-work-with-group-by-when-i-load-plyr-after-dp) — Meisam, Mar 10 '23 at 09:33
Thank you all for the feedback; I am new to R, and can definitely use some help. @jrcalabrese: see below. There are more rows with 0's above what I copied, and more lines below with other variables. I can only copy a few because of the character limit 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), row.names = c(NA, -305177L), spec = structure(list(cols = list(File.Name = structure(list(), class = c("collector_character", "collector")), Run = structure(list(), class = c("collector_character", "collector")), Prot = structure(list(), class = c("collector_character", — RvS, Mar 10 '23 at 16:37
@MeisamYSF: I don't load plyr. This I guess is also why akrun's suggestion did not make a difference. — RvS, Mar 10 '23 at 16:40
Instead of commenting, you can provide the output of `dput(df)` in your original question as an edit. If the output is still notably large, you should [create a reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) using either built-in data (e.g., `iris`) or a subset of your data (`dput(head(df))`) so we can reproduce your issue. — jrcalabrese, Mar 10 '23 at 17:18
@jrcalabrese When trying a small subset of my data I realized I made a mistake in comparing the R summary and the Excel Pivot table output. Sorry about that... I do have an additional question: is there a way to summarise in R based on unique values, comparable to doing a "distinct count" in a Excel Pivot table? Right now summarise counts records, not the number of records with a unique value, like I need. Thanks. — RvS, Mar 11 '23 at 03:38
If you have a new question that is separate from your original question, you should post a new question rather than add onto this question. — jrcalabrese, Mar 11 '23 at 16:23
@jrcalabrese : OK, I understand . Will do if I can't find the solution on-line. — RvS, Mar 11 '23 at 16:32

How do I make summarize(count_by_) in dplyr only count instances if there is an exact match?

0 Answers0