-1

I know I am probably missing something so simple but I have great difficulty with the whole data wrangling thing! I want to create a table such as below:

enter image description here

I have a data frame that has multiple different locations, parameters, and values of concentration for each parameter. The thing I'm confused about is how to calculate the average of each parameter because in the data frame I have, the parameters are all listed in a column and then the values are in a different column.. How can I accomplish this? I would appreciate any help/guidance.

dput of data:

 dput(head(df_trib_filtered,10))
structure(list(NJPDES = c("NJ0020206", "NJ0020532", "NJ0022021", 
"NJ0022985", "NJ0023361", "NJ0023736", "NJ0024015", "NJ0024031", 
"NJ0024040", "NJ0024678"), Facility_Name = c("ALLENTOWN BORO WWTP", 
"HARRISON TWP MULLICA HILL WWTP", "SWEDESBORO WTP", "WRIGHTSTOWN BOROUGH STP", 
"WILLINGBORO WATER POLLUTION CONTROL PLANT", "PINELANDS WASTEWATER CO", 
"MOUNT HOLLY WPCF", "ELMWOOD WTP", "WOODSTREAM STP", "BORDENTOWN SA BLACK'S CREEK STP"
), `Monitored Location Designator` = c("001A", "001A", "001A", 
"001A", "001A", "001A", "001A", "001A", "001A", "001A"), Date = structure(c(17378, 
17378, 17378, 17378, 17378, 17378, 17378, 17378, 17378, 17378
), class = "Date"), Parameter_Number_DMR = c("00300", "00300", 
"00300", "00300", "00300", "00300", "00300", "00300", "00300", 
"00300"), Parameter = c("Oxygen, Dissolved (DO)", "Oxygen, Dissolved (DO)", 
"Oxygen, Dissolved (DO)", "Oxygen, Dissolved (DO)", "Oxygen, Dissolved (DO)", 
"Oxygen, Dissolved (DO)", "Oxygen, Dissolved (DO)", "Oxygen, Dissolved (DO)", 
"Oxygen, Dissolved (DO)", "Oxygen, Dissolved (DO)"), `Sample Point Description` = c("Effluent Gross Value", 
"Effluent Gross Value", "Effluent Gross Value", "Effluent Gross Value", 
"Effluent Gross Value", "Effluent Gross Value", "Effluent Gross Value", 
"Effluent Gross Value", "Effluent Gross Value", "Effluent Gross Value"
), Rep_Val_Quantity_Avg = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA), X__1 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), `Reported Value Quantity Maximum` = c(NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA), `Quantity Units Description` = c(NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA), Rep_Val_Con_Min = c("7.44", 
NA, "6.07", NA, NA, "6.6", NA, "6.5", NA, "6.89"), Val_Con_AVG = c(7.58, 
7, 6.09, 6.9, 7.58, 6.5, 7.9, 6.5, 6.8, 6.99), Rep_Val_Con_Max = c(NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_), 
    valunit = c("MILLIGRAMS PER LITER", "MILLIGRAMS PER LITER", 
    "MILLIGRAMS PER LITER", "MILLIGRAMS PER LITER", "MILLIGRAMS PER LITER", 
    "MILLIGRAMS PER LITER", "MILLIGRAMS PER LITER", "MILLIGRAMS PER LITER", 
    "MILLIGRAMS PER LITER", "MILLIGRAMS PER LITER")), .Names = c("NJPDES", 
"Facility_Name", "Monitored Location Designator", "Date", "Parameter_Number_DMR", 
"Parameter", "Sample Point Description", "Rep_Val_Quantity_Avg", 
"X__1", "Reported Value Quantity Maximum", "Quantity Units Description", 
"Rep_Val_Con_Min", "Val_Con_AVG", "Rep_Val_Con_Max", "valunit"
), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"
))

Code I have so far:

### Get dataframe to have parameters in their own column ###
data_tidy_trib <- df_trib_filtered %>%
  spread(Parameter,Val_Con_AVG)

Do I have the right idea of using a spread to accomplish what I want? When I did the spread I got a lot of NAs in each of the new columns with the parameters... So my guess is I am doing it wrong?

NBE
  • 641
  • 2
  • 11
  • 33

1 Answers1

1

based on the comments you first need to run

detach(plyr)

than use:

library(dplyr)
df_trib_filtered %>%
  dplyr::group_by(Facility_Name, Parameter) %>% 
  dplyr::summarise(Average = mean(Val_Con_AVG, na.rm = TRUE))

output would be

# A tibble: 10 x 3
# Groups:   Facility_Name [?]
#    Facility_Name                             Parameter              Average
#    <chr>                                     <chr>                    <dbl>
#  1 ALLENTOWN BORO WWTP                       Oxygen, Dissolved (DO)    7.58
#  2 BORDENTOWN SA BLACK'S CREEK STP           Oxygen, Dissolved (DO)    6.99
#  3 ELMWOOD WTP                               Oxygen, Dissolved (DO)    6.5 
#  4 HARRISON TWP MULLICA HILL WWTP            Oxygen, Dissolved (DO)    7   
#  5 MOUNT HOLLY WPCF                          Oxygen, Dissolved (DO)    7.9 
#  6 PINELANDS WASTEWATER CO                   Oxygen, Dissolved (DO)    6.5 
#  7 SWEDESBORO WTP                            Oxygen, Dissolved (DO)    6.09
#  8 WILLINGBORO WATER POLLUTION CONTROL PLANT Oxygen, Dissolved (DO)    7.58
#  9 WOODSTREAM STP                            Oxygen, Dissolved (DO)    6.8 
# 10 WRIGHTSTOWN BOROUGH STP                   Oxygen, Dissolved (DO)    6.9 
divibisan
  • 11,659
  • 11
  • 40
  • 58
Stephan
  • 2,056
  • 1
  • 9
  • 20
  • Thanks for your answer. When I do this, I get a data frame with 1 obs. and 1 variable .. Not sure how you got the table you did? – NBE Jul 31 '18 at 15:49
  • 1
    add `dplyr::` before `summarise(...` – Stephan Jul 31 '18 at 15:58
  • Worked! Thanks ! @Stephan – NBE Jul 31 '18 at 16:00
  • Hey sorry to ask this in comment but I don't want to get down voted for asking another question.. Is there a way to get the mean of two columns while using the above approach? I have some parameters where their values are recorded in another column but not in the one I originally took the mean on – NBE Aug 01 '18 at 15:24