I know I am probably missing something so simple but I have great difficulty with the whole data wrangling thing! I want to create a table such as below:
I have a data frame that has multiple different locations, parameters, and values of concentration for each parameter. The thing I'm confused about is how to calculate the average of each parameter because in the data frame I have, the parameters are all listed in a column and then the values are in a different column.. How can I accomplish this? I would appreciate any help/guidance.
dput of data:
dput(head(df_trib_filtered,10))
structure(list(NJPDES = c("NJ0020206", "NJ0020532", "NJ0022021",
"NJ0022985", "NJ0023361", "NJ0023736", "NJ0024015", "NJ0024031",
"NJ0024040", "NJ0024678"), Facility_Name = c("ALLENTOWN BORO WWTP",
"HARRISON TWP MULLICA HILL WWTP", "SWEDESBORO WTP", "WRIGHTSTOWN BOROUGH STP",
"WILLINGBORO WATER POLLUTION CONTROL PLANT", "PINELANDS WASTEWATER CO",
"MOUNT HOLLY WPCF", "ELMWOOD WTP", "WOODSTREAM STP", "BORDENTOWN SA BLACK'S CREEK STP"
), `Monitored Location Designator` = c("001A", "001A", "001A",
"001A", "001A", "001A", "001A", "001A", "001A", "001A"), Date = structure(c(17378,
17378, 17378, 17378, 17378, 17378, 17378, 17378, 17378, 17378
), class = "Date"), Parameter_Number_DMR = c("00300", "00300",
"00300", "00300", "00300", "00300", "00300", "00300", "00300",
"00300"), Parameter = c("Oxygen, Dissolved (DO)", "Oxygen, Dissolved (DO)",
"Oxygen, Dissolved (DO)", "Oxygen, Dissolved (DO)", "Oxygen, Dissolved (DO)",
"Oxygen, Dissolved (DO)", "Oxygen, Dissolved (DO)", "Oxygen, Dissolved (DO)",
"Oxygen, Dissolved (DO)", "Oxygen, Dissolved (DO)"), `Sample Point Description` = c("Effluent Gross Value",
"Effluent Gross Value", "Effluent Gross Value", "Effluent Gross Value",
"Effluent Gross Value", "Effluent Gross Value", "Effluent Gross Value",
"Effluent Gross Value", "Effluent Gross Value", "Effluent Gross Value"
), Rep_Val_Quantity_Avg = c(NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA), X__1 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), `Reported Value Quantity Maximum` = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA), `Quantity Units Description` = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA), Rep_Val_Con_Min = c("7.44",
NA, "6.07", NA, NA, "6.6", NA, "6.5", NA, "6.89"), Val_Con_AVG = c(7.58,
7, 6.09, 6.9, 7.58, 6.5, 7.9, 6.5, 6.8, 6.99), Rep_Val_Con_Max = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_),
valunit = c("MILLIGRAMS PER LITER", "MILLIGRAMS PER LITER",
"MILLIGRAMS PER LITER", "MILLIGRAMS PER LITER", "MILLIGRAMS PER LITER",
"MILLIGRAMS PER LITER", "MILLIGRAMS PER LITER", "MILLIGRAMS PER LITER",
"MILLIGRAMS PER LITER", "MILLIGRAMS PER LITER")), .Names = c("NJPDES",
"Facility_Name", "Monitored Location Designator", "Date", "Parameter_Number_DMR",
"Parameter", "Sample Point Description", "Rep_Val_Quantity_Avg",
"X__1", "Reported Value Quantity Maximum", "Quantity Units Description",
"Rep_Val_Con_Min", "Val_Con_AVG", "Rep_Val_Con_Max", "valunit"
), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"
))
Code I have so far:
### Get dataframe to have parameters in their own column ###
data_tidy_trib <- df_trib_filtered %>%
spread(Parameter,Val_Con_AVG)
Do I have the right idea of using a spread to accomplish what I want? When I did the spread I got a lot of NAs in each of the new columns with the parameters... So my guess is I am doing it wrong?