0

I am trying to make a pivot table of a large .tsv data set in R and exporting it back to Excel.

I tried using the dplyer functions:

summary <- df %>%
group_by(Run,Prot) %>%
summarize(count_by_Id = n()) %>%
as.data.frame()

This almost works, but rows with e.g. "P61981;P62258" and "P62258" in the Prot column are counted together. How do I make R only summarize rows that have exactly the same strings in the Prot columns. So that in case of the above example there will be two different rows (for "P61981;P62258" and "P62258") in the summary data I am creating.

jrcalabrese
  • 2,184
  • 3
  • 10
  • 30
RvS
  • 1
  • Can you provide `dput(df)`? – jrcalabrese Mar 10 '23 at 03:08
  • 5
    Based on your code, it won't be counted as same if they are different strings. Make sure to use `dplyr::summarise` assuming the summarise is not getting masked by plyr::summarise – akrun Mar 10 '23 at 03:12
  • Does this answer your question? [Why does summarize or mutate not work with group\_by when I load \`plyr\` after \`dplyr\`?](https://stackoverflow.com/questions/26106146/why-does-summarize-or-mutate-not-work-with-group-by-when-i-load-plyr-after-dp) – Meisam Mar 10 '23 at 09:33
  • Thank you all for the feedback; I am new to R, and can definitely use some help. @jrcalabrese: see below. There are more rows with 0's above what I copied, and more lines below with other variables. I can only copy a few because of the character limit 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), row.names = c(NA, -305177L), spec = structure(list(cols = list(File.Name = structure(list(), class = c("collector_character", "collector")), Run = structure(list(), class = c("collector_character", "collector")), Prot = structure(list(), class = c("collector_character", – RvS Mar 10 '23 at 16:37
  • @akrun : I tried this, but it did not make any difference. – RvS Mar 10 '23 at 16:40
  • @MeisamYSF: I don't load plyr. This I guess is also why akrun's suggestion did not make a difference. – RvS Mar 10 '23 at 16:40
  • Instead of commenting, you can provide the output of `dput(df)` in your original question as an edit. If the output is still notably large, you should [create a reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) using either built-in data (e.g., `iris`) or a subset of your data (`dput(head(df))`) so we can reproduce your issue. – jrcalabrese Mar 10 '23 at 17:18
  • @jrcalabrese When trying a small subset of my data I realized I made a mistake in comparing the R summary and the Excel Pivot table output. Sorry about that... I do have an additional question: is there a way to summarise in R based on unique values, comparable to doing a "distinct count" in a Excel Pivot table? Right now summarise counts records, not the number of records with a unique value, like I need. Thanks. – RvS Mar 11 '23 at 03:38
  • If you have a new question that is separate from your original question, you should post a new question rather than add onto this question. – jrcalabrese Mar 11 '23 at 16:23
  • @jrcalabrese : OK, I understand . Will do if I can't find the solution on-line. – RvS Mar 11 '23 at 16:32

0 Answers0