I cannot figure out what is wrong with this conditional mean using dplyr in R. I want to group by WELL_CATEG and compute the mean of TCPNoZeros (stored as TCP_MEAN) for each level of WELL_CATEG, but the code below returns only one value:

# mean_by_group
df2 <- df1000 %>%
  group_by(WELL_CATEG) %>%
  summarise(TCP_MEAN = mean(TCPNoZeros, na.rm = TRUE))

There should be more categories in the output since there are multiple levels in the factor WELL_CATEG ("Domestic", "Monitoring", "Municipal", "Water Supply, Other").

However, the output has only one value.

> df2
 TCP_MEAN
 1 0.193232

Here is a sample of my data (my large dataset also has NA values):

structure(list(WELL_ID = c("5700542-001", "5700551-001", "5700552-001", 
"5700554-001", "5700571-011", "5700571-012", "5700575-001", "T0604700079-NW-2", 
"T0604700079-MW-9", "T0604700079-NW-3", "T0604700079-NW-4", "T0604700079-NW-9", 
"T0604700079-NW-8", "T0604700079-NW-6", "S4-TUSK-TLA02", "DM-U-03", 
"RED-06", "RICE-20", "RICE-09", "MODFP-03", "5700652-010", "USGS-361422119431201", 
"USGS-363645119420702", "USGS-364112119352701", "USGS-364258119380201", 
"USGS-363418119384203", "USGS-382718121224901", "USGS-372205120433801"
), WELL_CATEG = c("Domestic", "Domestic", "Domestic", "Domestic", 
"Domestic", "Domestic", "Domestic", "Monitoring", "Monitoring", 
"Monitoring", "Monitoring", "Monitoring", "Monitoring", "Monitoring", 
"Municipal", "Municipal", "Municipal", "Municipal", "Municipal", 
"Municipal", "Water Supply, Other", "Water Supply, Other", "Water Supply, Other", 
"Water Supply, Other", "Water Supply, Other", "Water Supply, Other", 
"Water Supply, Other", "Water Supply, Other"), Shape_Leng = c(6283.185307, 
6283.185307, 6283.185307, 6283.185307, 6283.185307, 6283.185307, 
6283.185307, 6283.185307, 6283.185307, 6283.185307, 6283.185307, 
6283.185307, 6283.185307, 6283.185307, 6283.185307, 6283.185307, 
6283.185307, 6283.185307, 6283.185307, 6283.185307, 6283.185307, 
6283.185307, 6283.185307, 6283.185307, 6283.185307, 6283.185307, 
6283.185307, 6283.185307), TCP_MEAN = c(0, 0, 0, 0, 0, 0, 0, 
0.531914894, 0.535714286, 0.638297872, 0.714285714, 0.731707317, 
0.731707317, 0.760869565, 0.006, 0.0625, 0.12, 0.18, 0.18, 0.18, 
1, 2, 2, 2, 2, 4, 4, 6), TCPNoZeros = c(0.0025, 0.0025, 0.0025, 
0.0025, 0.0025, 0.0025, 0.0025, 0.531914894, 0.535714286, 0.638297872, 
0.714285714, 0.731707317, 0.731707317, 0.760869565, 0.006, 0.0625, 
0.12, 0.18, 0.18, 0.18, 0.0025, 0.0025, 0.0025, 0.0025, 0.0025, 
0.0025, 0.0025, 0.0025)), class = "data.frame", row.names = c(NA, 
-28L))
  • Your code works on my computer! It returns a dataframe with means for each of the 4 categories. – Shubham Mar 07 '23 at 20:16
  • That helps me to know - much appreciated for checking. – BHope Mar 07 '23 at 20:19
  • Frequently, when someone's dplyr operations aren't working for them, but do work for others, it's because of a name conflict with the `plyr` package, which has many of the same function names. `plyr` is no longer actively managed but lives on in many online tutorials. https://stackoverflow.com/questions/26923862/why-are-my-dplyr-group-by-summarize-not-working-properly-name-collision-with – Jon Spring Mar 07 '23 at 20:39
  • When `mutate` or `summarize` "works" but doesn't respect groups it's almost always a name conflict with `plyr`. – Gregor Thomas Mar 07 '23 at 20:46
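
The name conflict described in the comments above can be checked and worked around roughly as follows. This is only a sketch, assuming the cause really is `plyr` masking `dplyr::summarise` in the asker's session, which the question does not confirm. The small `df1000` here is a stand-in built from a few rows of the sample data, not the full dataset.

```r
library(dplyr)

# Stand-in for df1000: a few rows taken from the sample data in the question
df1000 <- data.frame(
  WELL_CATEG = c("Domestic", "Domestic", "Monitoring", "Monitoring",
                 "Municipal", "Municipal"),
  TCPNoZeros = c(0.0025, 0.0025, 0.531914894, 0.535714286, 0.006, 0.0625)
)

# Diagnostic 1: is plyr attached in this session?
plyr_attached <- "package:plyr" %in% search()

# Diagnostic 2: which package does a bare `summarise` resolve to?
# If this reports "plyr", grouping is silently ignored by summarise().
summarise_source <- environmentName(environment(summarise))

# Workaround: call the dplyr versions explicitly with `::`, so the result
# does not depend on the order in which packages were attached.
df2 <- df1000 %>%
  dplyr::group_by(WELL_CATEG) %>%
  dplyr::summarise(TCP_MEAN = mean(TCPNoZeros, na.rm = TRUE))

print(plyr_attached)
print(summarise_source)
print(df2)
```

If `summarise_source` turns out to be "plyr", restarting R and attaching only dplyr (or attaching plyr before dplyr) should also restore the per-group output without the `dplyr::` prefixes.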

0 Answers