
I'm a little rusty in R and would appreciate your help.

I have a data frame that I need to get some stats from. Simplified, it looks like this:

df <- data.frame(tech    = c("Leonardo", "Leonardo", "Leonardo", "John", "John", "John", "Will", "Will", "Will", "Bob"),
                 type    = c("V", "P", "V", "V", "P", "V", "V", "P", "V", "V"),
                 breed   = c("A", "A", "A", "B", "B", "B", "C", "C", "A", "B"),
                 central = c("J", "J", "K", "J", "K", "J", "K", "K", "K", "J"))

I need to get the percentage of each "type" for each technician in a separate data frame, and then another data frame containing the percentage of each "type" for each technician by breed. (For example, if a technician has only one type, either V or P, it should show as 100%.)
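As an illustration of the two tables described above, here is a base-R sketch (no packages assumed) that cross-tabulates with `table()` and turns counts into row percentages with `prop.table(..., margin = 1)`. The combined technician/breed key built with `interaction(..., drop = TRUE)` and the object names `pct_tech` / `pct_tech_breed` are my own illustrative choices, not something from the post.

```r
# Sample data from the question (with the closing parenthesis restored)
df <- data.frame(
  tech    = c("Leonardo", "Leonardo", "Leonardo", "John", "John", "John",
              "Will", "Will", "Will", "Bob"),
  type    = c("V", "P", "V", "V", "P", "V", "V", "P", "V", "V"),
  breed   = c("A", "A", "A", "B", "B", "B", "C", "C", "A", "B"),
  central = c("J", "J", "K", "J", "K", "J", "K", "K", "K", "J")
)

# Table 1: rows = technicians, columns = types. margin = 1 divides each
# cell by its row total, so every row sums to 100.
pct_tech <- as.data.frame.matrix(
  prop.table(table(df$tech, df$type), margin = 1) * 100
)

# Table 2: same idea, but each row is a technician/breed pair such as
# "Will.C"; drop = TRUE leaves out pairs that never occur in the data.
tech_breed <- interaction(df$tech, df$breed, drop = TRUE)
pct_tech_breed <- as.data.frame.matrix(
  prop.table(table(tech_breed, df$type), margin = 1) * 100
)

pct_tech
pct_tech_breed
```

With this data, Bob's row (and the "Will.A" row in the second table) shows V at 100 and P at 0, which is the 100% case described in the question.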

I have seen other topics where the user wanted similar information (Summarizing by subgroup percentage in R), but they had a numerical value to compute the percentage from. In my case, the values are V or P. I imagine the approach is similar, but I tried the solution suggested in that post and it doesn't work in my case.
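The step that bridges the numeric case and this one: comparing the column to a level, e.g. `type == "V"`, gives a logical vector, and R treats `TRUE` as 1 and `FALSE` as 0 when you sum or average it. The mean of that vector is therefore the share of "V" rows. A minimal base-R sketch, using `tapply()` to split by technician (column names are from the question; `pct_v` and `pct_p` are illustrative):

```r
df <- data.frame(
  tech = c("Leonardo", "Leonardo", "Leonardo", "John", "John", "John",
           "Will", "Will", "Will", "Bob"),
  type = c("V", "P", "V", "V", "P", "V", "V", "P", "V", "V")
)

# type == "V" is TRUE/FALSE; mean() coerces it to 1/0, so the mean within
# each technician is the proportion of that technician's rows that are V
pct_v <- tapply(df$type == "V", df$tech, mean) * 100

# P is the complement here, because type only takes the values V and P
pct_p <- 100 - pct_v

pct_v
pct_p
```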

Is there a simple way of doing this? I appreciate your help.

pete
  • Please clarify what your expected output is for this input. Something like `df |> count(tech, type) |> mutate(pct = n / sum(n) * 100, .by = tech)`? – Gregor Thomas May 22 '23 at 17:02
  • (Or maybe `df |> count(tech, type, breed) |> mutate(pct = n / sum(n) * 100, .by = c(tech, breed))`?) – Gregor Thomas May 22 '23 at 17:35

1 Answer


If I'm understanding your question right, this is how I'd do it using dplyr:

df |>
  dplyr::group_by(tech, breed) |>
  # share of V and P rows within each technician/breed group
  dplyr::summarize(pct_v = sum(type == "V") / dplyr::n() * 100,
                   pct_p = sum(type == "P") / dplyr::n() * 100,
                   .groups = "drop")
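The code above gives the technician-by-breed table. For the first table in the question (percentages per technician only), the same pattern should work with `breed` dropped from `group_by()`. A sketch, assuming dplyr is installed and R ≥ 4.1 for the native `|>` pipe; `pct_by_tech` is just an illustrative name:

```r
library(dplyr)

df <- data.frame(
  tech = c("Leonardo", "Leonardo", "Leonardo", "John", "John", "John",
           "Will", "Will", "Will", "Bob"),
  type = c("V", "P", "V", "V", "P", "V", "V", "P", "V", "V")
)

# Grouping by tech alone gives one row per technician; with a single
# grouping variable, summarize() returns an ungrouped result
pct_by_tech <- df |>
  group_by(tech) |>
  summarize(pct_v = sum(type == "V") / n() * 100,
            pct_p = sum(type == "P") / n() * 100)

pct_by_tech
```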
S. Wright
  • Thank you, it worked! One follow-up question: is it possible to add another column with the number of V and P observations that were used to calculate the percentage? – pete May 24 '23 at 18:33
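For the follow-up about showing the counts: one way is to compute `sum(type == "V")` and `sum(type == "P")` as their own columns and derive the percentages from them. The column names `n_v`, `n_p` and `n_obs` are illustrative choices, and `.groups = "drop"` (available since dplyr 1.0) simply returns an ungrouped result.

```r
library(dplyr)

df <- data.frame(
  tech  = c("Leonardo", "Leonardo", "Leonardo", "John", "John", "John",
            "Will", "Will", "Will", "Bob"),
  type  = c("V", "P", "V", "V", "P", "V", "V", "P", "V", "V"),
  breed = c("A", "A", "A", "B", "B", "B", "C", "C", "A", "B")
)

# Count V and P first, then build the percentages from those counts;
# summarize() can refer to columns created earlier in the same call
stats <- df |>
  group_by(tech, breed) |>
  summarize(n_v   = sum(type == "V"),
            n_p   = sum(type == "P"),
            n_obs = n(),
            pct_v = n_v / n_obs * 100,
            pct_p = n_p / n_obs * 100,
            .groups = "drop")

stats
```

With the question's data, Will/breed C comes out as 1 V and 1 P (50% each), and Will/breed A as 1 V and 0 P (100% V).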