In R, I am trying to write a function that replaces the missing values (`NA`) in selected columns of a data frame with their lagged values (I am using a one-observation lag). I have successfully written the following for loop to do this:
```r
testdata <- data.frame(x1 = c(1:10),
                       x2 = c(4, 3, NA, 7, 8, NA, 9, NA, 10, 11),
                       x3 = c(4, 3, NA, 7, 8, NA, 9, NA, NA, 11),
                       x4 = c("a", NA, NA, "d", "e", NA, "f", NA, "g", NA))

for (j in 2:4) {
  for (i in 1:10) {
    if (is.na(testdata[i, j])) {
      testdata[i, j] <- testdata[i - 1, j]
    }
  }
}
```
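One caveat with this loop: it only works because no selected column is `NA` in row 1. If one were, `testdata[0, j]` would return a zero-length value and the assignment would fail with an error. A minimal sketch of a guarded version, using a hypothetical small frame `testdata2`, simply starts the row loop at row 2:

```r
# Hypothetical test frame whose first row is NA in x2
testdata2 <- data.frame(x1 = 1:4,
                        x2 = c(NA, 3, NA, 7))

for (j in 2:ncol(testdata2)) {
  # seq_len(nrow(...))[-1] drops row 1, which has no earlier row to copy from;
  # it is also empty (so the loop is skipped) for a one-row frame
  for (i in seq_len(nrow(testdata2))[-1]) {
    if (is.na(testdata2[i, j])) {
      testdata2[i, j] <- testdata2[i - 1, j]
    }
  }
}

testdata2$x2
```

Row 1 stays `NA` because there is no lagged value to fill it with; row 3 takes the value from row 2.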
The for loop works fine. However, when I generalize this code and wrap it in a function, the function returns `NULL` instead of the filled data frame. The function that I have written is as follows:
```r
fill_null <- function(df, columns, rows) {
  for (j in columns) {
    for (i in rows) {
      if (is.na(df[i, j])) {
        df[i, j] <- df[i - 1, j]
      } else {
        df[i, j] <- df[i, j]
      }
    }
  }
}
```
When I run this function using the following code:
```r
newdf <- fill_null(testdata, 2:4, 1:10)
str(newdf)
```
I get the following output:
```
> str(newdf)
NULL
```
I am wondering why this for loop works when it is run at the top level but stops working once it is wrapped in a function. I am also wondering whether there is an easy way to fix this, because I need to fill `NA`s with lagged values in several different data frames.
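A likely explanation: an R function returns the value of the last expression it evaluates, and a `for` loop evaluates to `NULL` (invisibly; see `?Control`). Since `fill_null` ends with its loop, it returns `NULL`, and the modified `df` only ever exists as a local copy inside the function. A minimal sketch of one possible fix follows; the name `fill_lag` and its default `rows` argument are hypothetical. It ends by returning `df` explicitly, skips row 1, and is applied to several data frames with `lapply`:

```r
# A for loop's value is NULL, which is what fill_null ends up returning
loop_value <- for (i in 1:3) i

# Hypothetical helper: same loop as fill_null, but it returns df at the end
fill_lag <- function(df, columns, rows = seq_len(nrow(df))) {
  for (j in columns) {
    for (i in rows) {
      # Skip row 1: there is no earlier observation to copy from
      if (i > 1 && is.na(df[i, j])) {
        df[i, j] <- df[i - 1, j]
      }
    }
  }
  df  # last expression, so this modified copy is the return value
}

testdata <- data.frame(x1 = c(1:10),
                       x2 = c(4, 3, NA, 7, 8, NA, 9, NA, 10, 11),
                       x3 = c(4, 3, NA, 7, 8, NA, 9, NA, NA, 11),
                       x4 = c("a", NA, NA, "d", "e", NA, "f", NA, "g", NA))

newdf <- fill_lag(testdata, 2:4)
str(newdf)

# The same call over several data frames at once
filled <- lapply(list(testdata, testdata), fill_lag, columns = 2:4)
```

Because R copies `df` on modification inside the function, `testdata` itself is left unchanged. The result has to be assigned (`newdf <- ...`) to keep it.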