The objective is to write a program that reads an Excel file and computes a linear correlation between two columns: the 4th column and a column named "Ctem". If the resulting R-squared value is below 0.8, the program should remove the first value from "Ctem" and the last value from the 4th column, so that both columns still have the same number of rows, and then run the linear correlation again. It should then compare the new R-squared value with the previous one. If the new value is larger, the program should repeat the process: remove the first and last values again and recompute the correlation. If the new value is not larger, the program should stop.

When I run the program, I get the following error:

Error in `$<-.data.frame`(`*tmp*`, "Colum4", value = c(14.45, 14.44, 14.43,  : 
  replacement has 5068 rows, data has 5069

Here's the code I'm using:

library(readr)
library(openxlsx)

# Step 1: Read the file
file_path <- "C://Users//hhernandez//OneDrive - Unitec NZ//Desktop//Cal R/pp.xlsx"
df <- read.xlsx(file_path)

# Step 2: Perform initial linear regression
X <- df[[4]]
y <- df$Ctem
reg <- lm(y ~ X)
r_squared <- summary(reg)$r.squared

# Step 3: Create and initialize the 'pepe' table
pepe <- data.frame(Equation = character(), `R-squared` = numeric())

# Step 4-8: Iterate until R-squared >= 0.8 or until R-squared stops increasing
while (r_squared < 0.8) {
  # Step 4: Create 'papa' table with modified data
  papa <- data.frame(Colum4 = df[[4]], Ctem = df$Ctem)
  
  # Step 5: Remove first value from Ctem and shift cells up

  papa <- papa[-1, ]
  papa <- papa[1:(nrow(papa) - 1), ]
  print(papa)

  # Remove last value from Colum4
  papa$Colum4 <- papa$Colum4[-nrow(papa)]
  
  # Step 6: Perform linear regression on modified data and calculate R-squared
  X <- papa$Colum4
  y <- papa$Ctem
  reg <- lm(y ~ X)
  new_r_squared <- summary(reg)$r.squared
  
  # Step 7: Append equation and R-squared to 'pepe' table
  equation <- paste("y =", round(coef(reg)[2], 2), "x +", round(coef(reg)[1], 2))
  pepe <- rbind(pepe, data.frame(Equation = equation, `R-squared` = new_r_squared))
  
  # Step 8: Compare new R-squared with previous R-squared
  if (new_r_squared > r_squared) {
    r_squared <- new_r_squared
  } else {
    break  # Stop iteration if R-squared stops increasing
  }
}
STerliakov
  • Hi Ger! Welcome to StackOverflow. Currently your example isn't reproducible because the data isn't included. Would you be able to include the data file/a subset of the data? `dput` might come in handy – Mark Jul 08 '23 at 05:54
  • https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Mark Jul 08 '23 at 05:54

1 Answer

With papa$Colum4 <- papa$Colum4[-nrow(papa)] you are trying to assign a vector that is one element shorter than the data frame has rows. All columns of a data frame must have the same length, and every other column still has length nrow(papa).
Example to illustrate:

papa <- df <- as.data.frame(matrix(1:8, ncol = 2, dimnames = list(c(1:4),c("Colum4", "Ctem"))))
papa
#>   Colum4 Ctem
#> 1      1    5
#> 2      2    6
#> 3      3    7
#> 4      4    8

papa$Colum4 <- papa$Colum4[-nrow(papa)]
#> Error in `$<-.data.frame`(`*tmp*`, Colum4, value = 1:3): replacement has 3 rows, data has 4

# because we can't replace column of 4x2 dataframe with a vector of length 3:
str(papa$Colum4); str(papa$Colum4[-nrow(papa)])
#>  int [1:4] 1 2 3 4
#>  int [1:3] 1 2 3
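
One way around the error, as a minimal sketch on the same toy data: trim each column as a plain vector first, then build a new data frame from the two equal-length results. The name shifted is only illustrative and does not come from the question.

```r
# Toy data matching the example above
papa <- data.frame(Colum4 = 1:4, Ctem = 5:8)
n <- nrow(papa)

# Trim the vectors *before* they go into a data frame:
# drop the last value of Colum4 and the first value of Ctem
shifted <- data.frame(
  Colum4 = papa$Colum4[-n],
  Ctem   = papa$Ctem[-1]
)
shifted
#>   Colum4 Ctem
#> 1      1    6
#> 2      2    7
#> 3      3    8
```

Both vectors have length 3 when data.frame() is called, so no length mismatch can occur.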

As a side note, the loop logic does not look quite right. Each cycle starts by resetting papa to the same data.frame(Colum4 = df[[4]], Ctem = df$Ctem), so every iteration fits exactly the same data and new_r_squared never changes; the loop can never trim more than once and stops after at most two passes. Also, step 5 slices off both the first and the last row of the data frame (i.e. both column vectors get shorter by 2 and stay aligned), which differs from the described intent of shifting one column against the other.

Perhaps something like this instead for changing X and y vectors:

X <- df$Colum4
y <- df$Ctem

# dummy condition
while (length(X) > 1) {
  # str(list()) just for printing:
  str(list(X = X, y = y))
  X <- X[-1]
  y <- y[-length(y)]
  # reg <- lm(y ~ X)
  # ... 
}
#> List of 2
#>  $ X: int [1:4] 1 2 3 4
#>  $ y: int [1:4] 5 6 7 8
#> List of 2
#>  $ X: int [1:3] 2 3 4
#>  $ y: int [1:3] 5 6 7
#> List of 2
#>  $ X: int [1:2] 3 4
#>  $ y: int [1:2] 5 6
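
Putting the pieces together, the whole procedure from the question could be sketched roughly as below. This is an illustration under assumptions, not a tested solution for the real file: the function name shift_until_no_gain, the min_n guard and the synthetic data are all made up here. It follows the direction described in the question (drop the first value of Ctem, drop the last value of the 4th column).

```r
# Hypothetical helper (name, arguments and min_n guard are assumptions).
# x: values of the 4th column, y: values of Ctem
shift_until_no_gain <- function(x, y, target = 0.8, min_n = 3) {
  r2 <- function(x, y) summary(lm(y ~ x))$r.squared

  best <- r2(x, y)
  history <- data.frame(n = length(x), r_squared = best)

  # Keep shifting while R-squared is below the target and improving
  while (best < target && length(x) > min_n) {
    x_new <- x[-length(x)]  # drop the last value of the 4th column
    y_new <- y[-1]          # drop the first value of Ctem
    new <- r2(x_new, y_new)
    history <- rbind(history,
                     data.frame(n = length(x_new), r_squared = new))

    if (new <= best) break  # no improvement: keep the previous vectors

    # Carry the trimmed vectors into the next iteration
    x <- x_new
    y <- y_new
    best <- new
  }

  list(x = x, y = y, r_squared = best, history = history)
}

# Synthetic data: y is x delayed by one position, plus a small wiggle
s <- c(3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8)
x <- s
y <- c(0, s[-length(s)]) + rep(c(0.1, -0.1), 6)

res <- shift_until_no_gain(x, y)
res$history
```

The key difference from the loop in the question is that x and y are updated inside the loop instead of being rebuilt from df each time, so every pass really works on shorter vectors. With the real data this would be called as shift_until_no_gain(df[[4]], df$Ctem).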
margusl