The main problem with your code is that you don't use curly braces to group multiple statements inside the loop. From R's point of view, only the first line (df_temp <- df[df$club == i, ]) is evaluated inside the loop. The rest of the code, including actually writing content to a file, runs only after the loop has ended. Because a for loop does not create its own scope, variables assigned inside it remain available after the loop, so no errors are raised. Effectively, though, your file-writing code is executed only once, with the values from the last iteration.
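To see the effect in isolation, here is a minimal sketch that is independent of your data; the names seen, seen_braced and current are made up for illustration. Without braces only the first statement is repeated, while the second one runs once after the loop:

```r
# Without braces, only the first statement belongs to the loop body
seen <- character(0)
for (i in c("A", "B", "C"))
  current <- i                  # repeated three times
seen <- c(seen, current)        # runs once, after the loop has finished

seen
# [1] "C"

# With braces, both statements are repeated for every value of i
seen_braced <- character(0)
for (i in c("A", "B", "C")) {
  current <- i
  seen_braced <- c(seen_braced, current)
}

seen_braced
# [1] "A" "B" "C"
```

Note that the indentation in the first loop is purely cosmetic; R decides what belongs to the loop body by the braces, not by the layout.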
The fix for this issue is trivial:
set.seed(123)
l <- data.frame(club = sample(LETTERS[1:10], 286, TRUE),
                visitors = as.integer(runif(286, 100, 1000))
)
split_csv <- function(df, list) {
  setwd("dir")
  for (i in list) {
    #print(i)
    df_temp <- df[df$club == i, ]
    name <- paste0("club_", i, ".csv")
    write.csv(df_temp, name)
  }
  setwd("..")
}
split_csv(l, LETTERS[1:3])
list.files("dir/")
# [1] "club_A.csv" "club_B.csv" "club_C.csv"
But let's use your question as an opportunity to see how this code can be improved.
The by function can be used to split a data frame into subsets that share the same value of a given factor (or factors, but let's keep it simple). You can run any function, including a custom (and anonymous) one, on each subset.
split_csv2 <- function(df, list) {
  by(df, df$club, function(x) {
    # `x` is subset of df with one value in `club`
    # assign current "club" value for further reference
    i <- x[1, "club"]
    # don't do anything else if current club is not in list of allowed clubs
    if (! i %in% list) return()
    name <- paste0("dir/club_", i, ".csv")
    write.csv(x, name)
  })
}
invisible(split_csv2(l, LETTERS[2:4])) # discard output - it's not helpful anyway
list.files("dir/")
# [1] "club_B.csv" "club_C.csv" "club_D.csv"
There are two main advantages of this approach:
- We no longer compare the entire column of the data frame against some value in each loop iteration, which should make it significantly faster. Of course, with a data frame of this size there is no way to notice any difference. But one day you might want to perform a similar operation on a much bigger data set.
- Loops are generally frowned upon in the R community[citation needed]. Thanks to the apply family of functions, they are rarely required. Familiarizing yourself with these functions is one of the most important steps on the journey to mastering R.
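As a taste of the apply family, the same job can be sketched with split() and lapply(). This is only one possible variant, not a replacement for the by version above. The names visits, pieces, wanted, out_dir and paths are made up for this example, and the files go to a temporary directory (an assumption, so that the sketch does not touch "dir/"):

```r
set.seed(123)
visits <- data.frame(club = sample(LETTERS[1:10], 286, TRUE),
                     visitors = as.integer(runif(286, 100, 1000)))

# Partition the data frame once; the result is a named list, one element per club
pieces <- split(visits, visits$club)

# Keep only the clubs we care about (intersect() guards against missing clubs)
wanted <- intersect(c("B", "C", "D"), names(pieces))

# Hypothetical output location inside the session's temporary directory
out_dir <- file.path(tempdir(), "apply_demo")
dir.create(out_dir, showWarnings = FALSE)

# lapply() replaces the explicit loop and returns the paths that were written
paths <- lapply(wanted, function(club) {
  path <- file.path(out_dir, paste0("club_", club, ".csv"))
  write.csv(pieces[[club]], path)
  path
})
```

Filtering the names before the lapply() call means the anonymous function never sees clubs you are not interested in, so it needs no early return.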
Additionally:
- Inside your function, the second argument shadows the list function, which is used to create lists, one of the basic R data structures. In more complex cases this could lead to unexpected behavior and hard-to-debug issues. It is better to avoid that altogether.
- This is highly subjective, but many developers would tell you that changing the working directory inside a function is not good practice.
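Both points can be addressed at once. The following is a rough sketch rather than the definitive way to do it: split_csv3, the argument names clubs and out_dir, and the small demo data frame are all invented for this example. The output directory is passed in and combined with file.path, so the working directory is never changed:

```r
# `clubs` replaces `list` as the argument name; `out_dir` replaces the setwd() calls.
# Both names are illustrative, not part of the original code.
split_csv3 <- function(df, clubs, out_dir = ".") {
  # create the target directory if it does not exist yet
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  for (club in clubs) {
    df_temp <- df[df$club == club, ]
    path <- file.path(out_dir, paste0("club_", club, ".csv"))
    write.csv(df_temp, path)
  }
  invisible(NULL)
}

demo <- data.frame(club = c("A", "B", "A", "C"),
                   visitors = c(120L, 340L, 560L, 780L))
target <- file.path(tempdir(), "split_demo")  # hypothetical output directory
before <- getwd()

split_csv3(demo, c("A", "B"), out_dir = target)

sort(list.files(target))
# [1] "club_A.csv" "club_B.csv"
identical(getwd(), before)
# [1] TRUE
```

A side benefit of not calling setwd is that an error in the middle of the function cannot leave the caller stranded in a different directory.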