Edit: I have removed my earlier text to make room for my updated attempts. Any help is appreciated.
I want to write a for loop that goes through all 332 files in the directory, picks out the nitrate or sulfate values, and takes the mean of those values.
I have figured out how to do this for a single file, but repeating it by hand for every file would take a lot of writing. How can I turn this into a for loop? Please just point me in the right direction, without giving the full answer.
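# List every CSV file in the working directory and read each one into a data frame
specdata <- list.files(getwd(), pattern = "\\.csv$")
directory <- lapply(specdata, read.csv)
# Take the nitrate column of the first file, drop the NAs, keep the first 122 values
name_1 <- directory[[1]]$nitrate
name_2 <- na.omit(name_1)
name_3 <- name_2[1:122]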
pollutantmean <- function(directory, pollutant, id = 1:332) {
for( ?) {
???
}
??????
}
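One possible direction, as a minimal sketch rather than a full answer: loop over the list of data frames, pull out the column of interest from each, and collect the non-missing values before taking a single mean. The names `monitors`, `values` and `overall_mean` are hypothetical, and the two small data frames are made-up stand-ins for what `lapply(specdata, read.csv)` returns.

```r
# Toy stand-ins for the list of per-file data frames (values are made up)
monitors <- list(
  data.frame(nitrate = c(1, NA, 3), ID = 1),
  data.frame(nitrate = c(5, 7, NA), ID = 2)
)

values <- c()                        # grows as each data frame is visited
for (i in seq_along(monitors)) {
  x <- monitors[[i]]$nitrate         # column of interest for element i
  values <- c(values, x[!is.na(x)])  # keep only the non-missing readings
}

overall_mean <- mean(values)         # one mean over all collected values
```

Pooling the values first and averaging once gives each reading equal weight, which differs from averaging the per-file means when the files have different numbers of valid rows.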
I have since tried a different method. I removed the columns I did not need (Sulphate and Date), leaving only nitrate and ID, and then omitted the NA values, so the ID column now labels each nitrate value for the 332 cases. My next step is working out how to select ID by integer value rather than by row position. For example, if I run print(final_df$ID[1:32])
it returns the IDs of the first 32 rows (1, 1, 1 ... 1) rather than the first 32 cases (1, 2, 3 ... 32). This is because the data frame is long: roughly the first 1000 rows have ID 1, the next block has ID 2, and so forth (these counts are not exact).
Once I can do that, I can select the nitrate values (numeric) for each ID value (integer) and find the mean of those values. How would I go about doing this?
The data looks something like this:
Date      Sulphate  Nitrate  ID
10/10/10  0.576     0.784    1
10/10/10  0.738     0.687    1
.         .         .        .
.         .         .        .
11/11/11  0.954     1.093    2
.         .         .        .
.         .         .        .
.         .         .        .
13/13/13  0.495     0.586    332
# Drop the columns that are not needed (names are case-sensitive and must match the data)
final_df$Date <- NULL
final_df$Sulphate <- NULL
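Selecting by ID value rather than by row position can be sketched with a logical test on the ID column. This is only an illustration under stated assumptions: the small `final_df` below is a made-up stand-in that reuses the example rows shown above, and `wanted` and `mean_by_id` are hypothetical names.

```r
# Toy version of final_df after dropping the unused columns and the NAs
final_df <- data.frame(
  nitrate = c(0.784, 0.687, 1.093, 0.586),
  ID      = c(1L, 1L, 2L, 332L)
)

# Keep every row whose ID is one of 1..32, however many rows each ID has
wanted <- final_df[final_df$ID %in% 1:32, ]

# The distinct cases present, as opposed to the IDs of the first rows
cases <- unique(wanted$ID)

# Mean nitrate for each ID separately
mean_by_id <- tapply(wanted$nitrate, wanted$ID, mean)
```

`final_df$ID[1:32]` indexes by position, whereas `final_df$ID %in% 1:32` tests each row's value, which is why the second form picks up whole cases.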
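So far the code looks like this:
library(dplyr)  # provides select() and filter()
specdata <- list.files(getwd(), pattern = "\\.csv$")
directory <- lapply(specdata, read.csv)
directory_final <- do.call(rbind, directory)  # stack all files into one data frame
# Nitrate: keep nitrate and ID, drop NAs, restrict to IDs 1-30, take the mean
one <- select(directory_final, nitrate:ID)
two <- na.omit(one)
three <- filter(two, ID %in% 1:30)
four <- mean(three$nitrate)
# Sulfate: the same four steps
a <- select(directory_final, sulfate, ID)
b <- na.omit(a)
c <- filter(b, ID %in% 1:30)
d <- mean(c$sulfate)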
This works in that it extracts the values I need, but it is very impractical in the long run. I have had to write 8 separate steps to get the mean of sulfate or nitrate for a set of IDs, and if I want a different set of IDs I have to go back to three and c, change the values, and then rerun four and d. I will be working on how to combine these steps into a single piece of code that returns the mean for a chosen set of IDs for either sulfate or nitrate. I expect that this will require writing a function, so any tips are appreciated!
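As a rough sketch of how the two four-step pipelines might collapse into one function, the column name can be passed in as a string and looked up with `[[`. The function name `pollutant_mean_sketch`, its arguments, and the `toy` data frame are all hypothetical and only meant to show the shape of the idea, assuming the combined data frame has `sulfate`, `nitrate` and `ID` columns.

```r
# Mean of one pollutant column, restricted to the requested IDs
pollutant_mean_sketch <- function(df, pollutant, id = 1:332) {
  rows <- df$ID %in% id                      # TRUE for rows of the chosen IDs
  mean(df[[pollutant]][rows], na.rm = TRUE)  # ignore NAs in that column only
}

# Made-up data standing in for the combined data frame
toy <- data.frame(
  sulfate = c(0.576, 0.738, NA, 0.495),
  nitrate = c(0.784, NA, 1.093, 0.586),
  ID      = c(1L, 1L, 2L, 332L)
)

nitrate_mean <- pollutant_mean_sketch(toy, "nitrate", 1:30)
sulfate_mean <- pollutant_mean_sketch(toy, "sulfate", 1:30)
```

Using `na.rm = TRUE` on the selected column avoids discarding a row just because the other pollutant is missing, which calling `na.omit()` on the whole data frame would do.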