I have searched Stack Overflow for answers to this question but am still struggling, so apologies if this looks like a duplicate.
I have a data frame similar to this:
df <- data.frame(Cohort = c('con', 'con', 'dis', 'dis', 'con', 'dis'),
                 Sex = c('M', 'F', 'M', 'F', 'M', 'M'),
                 P1 = c(50, 40, 70, 80, 45, 75),
                 P2 = c(10, 9, 15, 13, 10, 8))
I want to fit a separate linear regression for each numeric column of my data frame, with "Cohort" as the predictor (I intend to add other covariates, such as "Sex", in future analyses).
I subset my data frame to drop all irrelevant columns (in this toy example, Sex):
new_df <- df[, names(df) != "Sex"]
Then I fit one model per numeric column like this:
fit <- lapply(new_df[-1], function(y) summary(lm(y ~ Cohort, data = new_df)))
When I test this on a small subset of my data (about 5 columns), it works fine. My real data frame, however, has about 7,300 columns, and running the same command on it gives this error:
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
NA/NaN/Inf in 'y'
I assumed the problem was missing values, but this returns 0:
sum(is.na(new_df))
I have also tried passing na.action = na.omit to lm(), but the error persists.
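Note that is.na() returns FALSE for Inf and -Inf, and na.omit only drops rows containing NA or NaN, so neither check catches infinite values or columns that are not numeric at all (for example, a character metadata column left in the data). A minimal sketch of a broader check, using the toy data above plus a hypothetical column P3 containing an Inf, and assuming the response columns are everything except the first (Cohort) column:

```r
# Toy data from the question; substitute the real data frame
df <- data.frame(Cohort = c('con', 'con', 'dis', 'dis', 'con', 'dis'),
                 Sex = c('M', 'F', 'M', 'F', 'M', 'M'),
                 P1 = c(50, 40, 70, 80, 45, 75),
                 P2 = c(10, 9, 15, 13, 10, 8))
new_df <- df[, names(df) != "Sex"]

# Hypothetical column with an Inf value: sum(is.na(new_df)) still returns 0
new_df$P3 <- c(1, 2, Inf, 4, 5, 6)

# Flag response columns lm() would reject: non-numeric columns, or columns
# containing NA, NaN, Inf or -Inf (is.finite() is FALSE for all four)
responses <- new_df[-1]
bad_cols <- names(responses)[vapply(responses, function(col) {
  !is.numeric(col) || any(!is.finite(col))
}, logical(1))]
bad_cols
```

Here bad_cols is "P3", even though sum(is.na(new_df)) is 0.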
My end goal is to run these regressions and extract, for each one, the p-value and the R-squared value using anova(fit)$'Pr(>F)' and summary(fit)$r.squared, respectively.
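anova() is meant to be called on a fitted lm model, whereas fit in the code above is a list of summary() objects. A minimal sketch on the toy data that keeps the fitted models and collects both statistics into one data frame (the names fits and results are illustrative, not from the original):

```r
df <- data.frame(Cohort = c('con', 'con', 'dis', 'dis', 'con', 'dis'),
                 Sex = c('M', 'F', 'M', 'F', 'M', 'M'),
                 P1 = c(50, 40, 70, 80, 45, 75),
                 P2 = c(10, 9, 15, 13, 10, 8))
new_df <- df[, names(df) != "Sex"]

# Keep the lm objects themselves so both anova() and summary() can be used
fits <- lapply(new_df[-1], function(y) lm(y ~ Cohort, data = new_df))

# One row per response: F-test p-value for Cohort (first row of the
# ANOVA table) and the model's R-squared
results <- data.frame(
  response  = names(fits),
  p_value   = vapply(fits, function(m) anova(m)[["Pr(>F)"]][1], numeric(1)),
  r_squared = vapply(fits, function(m) summary(m)$r.squared, numeric(1)),
  row.names = NULL
)
results
```

With Cohort as the only predictor, the first row of the ANOVA table is the Cohort term; once more terms are added, each term gets its own row.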
How can I correct this error, or is there a better way to do this? Additionally, how can I do this without subsetting my data frame once I add other features to the model?
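One possible approach, sketched here under the assumption that every numeric column not used as a predictor is a response, is to build each formula from column names with reformulate() from the stats package, so no subsetting is needed. The predictors vector below is a hypothetical choice:

```r
df <- data.frame(Cohort = c('con', 'con', 'dis', 'dis', 'con', 'dis'),
                 Sex = c('M', 'F', 'M', 'F', 'M', 'M'),
                 P1 = c(50, 40, 70, 80, 45, 75),
                 P2 = c(10, 9, 15, 13, 10, 8))

# Hypothetical predictor set; extend this vector to add covariates
predictors <- c("Cohort", "Sex")

# Responses: numeric columns that are not predictors
responses <- setdiff(names(df)[vapply(df, is.numeric, logical(1))], predictors)

# reformulate() turns strings into a formula such as P1 ~ Cohort + Sex
fits <- lapply(setNames(responses, responses), function(resp) {
  lm(reformulate(predictors, response = resp), data = df)
})

r_squared <- vapply(fits, function(m) summary(m)$r.squared, numeric(1))
r_squared
```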
EDIT:
@Parfait Here is a dput() of part of my data frame:
dput(new_data[1:4, 1:4])
structure(list(Cohort = c("Disease", "Disease", "Control", "Control"),
    seq.10010.10 = c(8.33449676839042, 8.39959836912012, 8.34385193344212,
    8.43546191447928), seq.10011.65 = c(11.5222872738433, 11.7652860987237,
    11.1661630826461, 11.008848763327), seq.10012.5 = c(10.5414838640543,
    10.6862378767518, 10.5408061105915, 10.726558779105)), class = c("soma_adat",
    "data.frame"), row.names = c("258633854330_1", "258633854330_3",
    "258633854330_5", "258633854330_6"))