The objective is to create a program that reads an Excel file and performs a linear correlation between two columns: the 4th column and a column labelled "Ctem". If the resulting R-squared value is below 0.8, the program should remove the first value of "Ctem" and the last value of the 4th column, so that both columns still have an equal number of rows, and then run the linear correlation again. It should then compare the new R-squared value with the previous one. If the new value is larger, the program should continue by removing the first and last values and performing the correlation once more. If the new value is not larger, the program should stop.
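To make the intended shift concrete, here is a minimal sketch of a single step on made-up vectors. The names `col4` and `ctem` and all the values are assumptions for illustration only; they are not taken from the Excel file.

```r
# Toy vectors standing in for the 4th column and Ctem (made-up values).
col4 <- c(1, 2, 3, 4, 5, 6)
ctem <- c(9, 1.1, 2.0, 3.2, 3.9, 5.1)

# One shift: drop the first Ctem value and the last column-4 value,
# so that col4[i] is now paired with ctem[i + 1].
ctem_shifted <- ctem[-1]
col4_trimmed <- col4[-length(col4)]

# Both vectors have the same length again, so they can be regressed.
stopifnot(length(ctem_shifted) == length(col4_trimmed))

# R-squared before and after the shift.
r2_before <- summary(lm(ctem ~ col4))$r.squared
r2_after  <- summary(lm(ctem_shifted ~ col4_trimmed))$r.squared
print(c(before = r2_before, after = r2_after))
```

Working on plain vectors rather than on data frame columns keeps the two lengths free to change together at each step.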
When I run the program, I get the following error:
Error in `$<-.data.frame`(`*tmp*`, "Colum4", value = c(14.45, 14.44, 14.43, :
replacement has 5068 rows, data has 5069
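This kind of message appears whenever a vector that is shorter than the data frame is assigned to one of its columns. A small sketch on a made-up 5-row data frame (the name `toy` and its values are assumptions, not my real data) triggers the same kind of error:

```r
# Toy data frame with 5 rows (made-up values).
toy <- data.frame(Colum4 = c(1, 2, 3, 4, 5), Ctem = c(2, 4, 6, 8, 10))

# Assigning a 4-element vector to a column of a 5-row data frame fails:
# every column of a data frame must have the same number of rows.
result <- tryCatch(
  {
    toy$Colum4 <- toy$Colum4[-nrow(toy)]
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(result)

# The failed assignment leaves the data frame unchanged.
print(nrow(toy))
```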
Here's the code I'm using:
library(readr)
library(openxlsx)
# Step 1: Read the file
file_path <- "C://Users//hhernandez//OneDrive - Unitec NZ//Desktop//Cal R/pp.xlsx"
df <- read.xlsx(file_path)
# Step 2: Perform initial linear regression
X <- df[[4]]
y <- df$Ctem
reg <- lm(y ~ X)
r_squared <- summary(reg)$r.squared
# Step 3: Create and initialize the 'pepe' table
pepe <- data.frame(Equation = character(), `R-squared` = numeric())
# Step 4-8: Iterate until R-squared >= 0.8 or until R-squared stops increasing
while (r_squared < 0.8) {
  # Step 4: Create 'papa' table with modified data
  papa <- data.frame(Colum4 = df[[4]], Ctem = df$Ctem)
  # Step 5: Remove first value from Ctem and shift cells up
  papa <- papa[-1, ]
  papa <- papa[1:(nrow(papa) - 1), ]
  print(papa)
  # Remove last value from Colum4
  papa$Colum4 <- papa$Colum4[-nrow(papa)]
  # Step 6: Perform linear regression on modified data and calculate R-squared
  X <- papa$Colum4
  y <- papa$Ctem
  reg <- lm(y ~ X)
  new_r_squared <- summary(reg)$r.squared
  # Step 7: Append equation and R-squared to 'pepe' table
  equation <- paste("y =", round(coef(reg)[2], 2), "x +", round(coef(reg)[1], 2))
  pepe <- rbind(pepe, data.frame(Equation = equation, `R-squared` = new_r_squared))
  # Step 8: Compare new R-squared with previous R-squared
  if (new_r_squared > r_squared) {
    r_squared <- new_r_squared
  } else {
    break # Stop iteration if R-squared stops increasing
  }
}
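Two things in the loop above look problematic: the assignment to `papa$Colum4` gives the column one element fewer than the data frame has rows, and `papa` is rebuilt from `df` on every pass, so the data never shrinks from one iteration to the next. One possible way around both, sketched here under assumptions, is to keep the two columns as plain vectors and trim them together. The helper name `shift_until_no_gain`, its arguments `threshold` and `min_rows`, and the toy data are all hypothetical, and the stopping rule follows the `while (r_squared < 0.8)` condition used above.

```r
# Hypothetical helper: repeatedly drop the first value of y and the last
# value of x, refitting until R-squared reaches the threshold, stops
# increasing, or fewer than min_rows pairs would remain.
shift_until_no_gain <- function(x, y, threshold = 0.8, min_rows = 3) {
  stopifnot(length(x) == length(y))

  # Fit y ~ x and return one row describing this step.
  fit_step <- function(x, y, shift) {
    fit <- lm(y ~ x)
    data.frame(
      Shift = shift,
      Equation = sprintf("y = %.2f x + %.2f", coef(fit)[[2]], coef(fit)[[1]]),
      R_squared = summary(fit)$r.squared
    )
  }

  shift <- 0L
  history <- fit_step(x, y, shift)
  best_r2 <- history$R_squared

  while (best_r2 < threshold && length(x) - 1 >= min_rows) {
    # Trim both vectors together so their lengths always match.
    x_new <- x[-length(x)]
    y_new <- y[-1]
    shift <- shift + 1L
    step <- fit_step(x_new, y_new, shift)
    history <- rbind(history, step)

    # Stop (and keep the previous data) if R-squared did not increase.
    if (step$R_squared <= best_r2) break

    x <- x_new
    y <- y_new
    best_r2 <- step$R_squared
  }

  list(x = x, y = y, r_squared = best_r2, history = history)
}

# Toy data (made up): y is roughly x delayed by one position.
x_toy <- c(1, 4, 2, 8, 5, 7, 3, 6)
y_toy <- c(0, 1.1, 3.9, 2.2, 7.8, 5.1, 6.9, 3.0)
res <- shift_until_no_gain(x_toy, y_toy)
print(res$history)

# With the data from the question, the call would presumably be:
# res <- shift_until_no_gain(df[[4]], df$Ctem)
```

The `history` data frame plays the role of the `pepe` table, with one row per fit, including the initial unshifted one.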