
I need to develop a script to run a simple OLS on multiple CSV files stored in the same folder. All have the same column names, and the regression will always be based upon the same columns ("x_var" and "y_var").

The below code is used to read in the csvs and rename them.

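## Read in files from folder
## list.files() takes a regular expression, not a glob, so anchor on the extension
file.List <- list.files(pattern = "\\.csv$")
for (i in seq_along(file.List))
{
  # Name each data frame after its file, minus the .csv extension
  assign(sub("\\.csv$", "", file.List[i]), read.csv(file.List[i]))
}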
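As a possible alternative to assign(), the files could be read into one named list, which is easier to loop over afterwards. This is only a sketch: the helper name read_csv_folder and its dir argument are made up for illustration.

```r
# Hypothetical helper (not from the original post): read every CSV in `dir`
# into a single named list of data frames instead of separate variables.
read_csv_folder <- function(dir = ".") {
  # Match only names ending in ".csv"; the pattern is a regular expression
  files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
  dfs <- lapply(files, read.csv)
  # Name each element after its file, without folder or extension
  names(dfs) <- tools::file_path_sans_ext(basename(files))
  dfs
}
```

Calling dfs <- read_csv_folder() would then give access to each data frame as dfs[["name"]], and names(dfs) lists what was loaded.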

However, after this [very initial stage!] I've got a bit lost.

Each data frame has the same 7 columns: a, b, c, d, x_var, e, y_var. I need to run a simple OLS using lm(x_var ~ y_var, data = dataframes) and plot the result for each data frame. I assumed a 'for loop' would be the best option, but I am not sure how to write it.

After each regression is run, I want to extract the coefficients, R2, etc. into a CSV and save the plot separately.

I tried the code below, but it has gone very wrong [and does not work at all]:

list <- list(gsub(" finalSIRTAnalysis.csv","", file.List))
for(i in length(file.List))
{
lm(x_var ~ y_var, data = [i])
}

I can't even make a start on this and need some advice, if anyone has any good ideas (such as creating an external function first).

j.rahilly

1 Answer


I am not sure whether lm can compute results from multiple data sources in a single call. Try merging the data frames first. I have a similar issue: I have 5k files, and it is computationally impossible to merge them all. But maybe this answer can help you: https://stackoverflow.com/a/63770065/14744492
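If merging is not practical, one alternative is to fit each file separately and collect the results. Below is a minimal sketch, assuming every CSV has numeric x_var and y_var columns and that the formula from the question (x_var ~ y_var) is the one wanted. The names fit_one, run_all, the ols_output folder, and ols_results.csv are made up for illustration.

```r
# Hypothetical helper: fit x_var ~ y_var on one data frame, save a scatter
# plot with the fitted line as a PNG, and return the key statistics.
fit_one <- function(df, name, out_dir) {
  fit <- lm(x_var ~ y_var, data = df)

  png(file.path(out_dir, paste0(name, "_fit.png")))
  plot(df$y_var, df$x_var, xlab = "y_var", ylab = "x_var", main = name)
  abline(fit)  # draw the fitted regression line
  dev.off()

  data.frame(
    file      = name,
    intercept = unname(coef(fit)[1]),
    slope     = unname(coef(fit)[2]),
    r_squared = summary(fit)$r.squared
  )
}

# Hypothetical driver: loop over every CSV in `in_dir`, one file at a time,
# and write a single summary table. Output goes to a separate folder
# (an assumption) so the summary CSV is not read back in on the next run.
run_all <- function(in_dir = ".", out_dir = "ols_output") {
  files <- list.files(in_dir, pattern = "\\.csv$", full.names = TRUE)
  stopifnot(length(files) > 0)
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

  nms <- tools::file_path_sans_ext(basename(files))
  rows <- Map(function(f, n) fit_one(read.csv(f), n, out_dir), files, nms)

  results <- do.call(rbind, rows)
  rownames(results) <- NULL
  write.csv(results, file.path(out_dir, "ols_results.csv"), row.names = FALSE)
  results
}
```

Calling run_all() from the folder that holds the CSVs should leave one PNG per file plus ols_results.csv in ols_output. Because each file is read, fitted, and discarded in turn, nothing has to be merged or held in memory all at once.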

pinpss