I ran a linear regression with the following command:

lm.intp <- lm(intp.trust~age,data=Scountry)

Next, I want to draw a scatterplot to compare the residuals across genders. I use Scountry$res <- lm.intp$residuals to store the regression residuals in the data frame, and then ggplot to draw the scatterplot. But when I run Scountry$res <- lm.intp$residuals, it keeps saying that the existing data and the assigned data have a different number of rows. How can I avoid this?

For the scatterplot, I hope to use the following command:

ggplot(Scountry, aes(x=, y=res, color=as.factor(gender))) +geom_point()

I know that in this plot y should be the residuals and x should be the observations, but I have no idea what to write for "x=", since the observations in my data have no ID. It looks like this: [screenshot of the data frame]

Could anyone please help me solve this? I'd really appreciate it!

Xingchen LIU
  • Can you please provide the data, or at least a few rows? https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Bloxx Oct 16 '21 at 22:57
  • Pictures of data are not useful because they do not include important information about the data types of the variables. Your picture does not even include the variables you are using in the analysis. – dcarlson Oct 16 '21 at 23:07
  • I'm sorry, but the data frame has more than 400 variables. Should I just include the variables age, intp.trust and gender, which are used here, and post a few rows? – Xingchen LIU Oct 16 '21 at 23:08

1 Answer

Hmm... It works here. I created data that fit your description. Do you maybe have NAs in your dataset?

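library(dplyr)    # provides %>% and select()
library(ggplot2)  # provides ggplot(), aes(), geom_point()

# Example data with one missing value in gender
Scountry <- data.frame(intp.trust = seq(200, 205),
                       age = seq(20, 25),
                       gender = c("F", "M", "F", "M", "F", NA))

# Keep only the variables used here and drop rows containing NA
Scountry_lm <- Scountry %>% select(intp.trust, age, gender) %>% na.omit()

# Fit on the cleaned data so the residuals match its rows one to one
lm.intp <- lm(intp.trust ~ age, data = Scountry_lm)

Scountry_lm$res <- lm.intp$residuals
ggplot(Scountry_lm, aes(x = age, y = res, color = as.factor(gender))) +
  geom_point()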
Bloxx
  • Yes, I guess there are some missing values in my data, so the number of residuals is smaller than the number of observations. Is there any way to deal with the missing values? – Xingchen LIU Oct 16 '21 at 23:12
  • Sure. If you use the dplyr package you can write Scountry <- Scountry %>% na.omit(). This will remove all rows that contain NA in any column, so I would rather suggest Scountry_lm <- Scountry %>% select(intp.trust, age, gender) %>% na.omit(). But then use the new data frame in the code for the linear regression. – Bloxx Oct 16 '21 at 23:18
  • I edited the answer. Now I put NA in the original dataset, which is later removed in a new dataset! – Bloxx Oct 16 '21 at 23:22
  • I have one more question. In your ggplot call, you use 'age' as the x of the scatterplot. If I want to use each individual observation as the x, to show the residual of every observation, what should I do? – Xingchen LIU Oct 16 '21 at 23:35
  • @XingchenLIU ; add `na.action = na.exclude` to your `lm` model and you will get residuals the same length as your data but with NA's in the relevant positions – user20650 Oct 16 '21 at 23:48
  • @user20650, it doesn't seem to work. I ran 'lm.intp <- lm(intp.trust~age,data=Scountry,na.action = na.exclude)', and it still gives this error message: 'Error: Assigned data `lm.intp$residuals` must be compatible with existing data. x Existing data has 10076 rows. x Assigned data has 9773 rows. i Only vectors of size 1 are recycled.' – Xingchen LIU Oct 17 '21 at 00:07
  • Don't use `lm.intp$residuals`; use the extractor function `resid(lm.intp)` instead. – user20650 Oct 17 '21 at 00:56
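
For anyone landing here later, a minimal sketch of what the last two comments describe, using made-up toy data in place of the real Scountry. The `obs` column is a hypothetical name introduced here for the row position; it is one possible answer to the "what goes in x=" question, not something from the original data.

```r
library(ggplot2)

# Toy stand-in for Scountry; the values are invented for illustration
Scountry <- data.frame(
  intp.trust = c(200, 203, 201, NA, 206, 204),
  age        = c(20, 21, NA, 23, 24, 25),
  gender     = c("F", "M", "F", "M", "F", "M")
)

# na.exclude drops incomplete rows for the fit but remembers their positions
lm.intp <- lm(intp.trust ~ age, data = Scountry, na.action = na.exclude)

n_raw    <- length(lm.intp$residuals)  # only the rows actually used in the fit
n_padded <- length(resid(lm.intp))     # padded with NA back to nrow(Scountry)

# resid() returns one value per original row, so the assignment lines up
Scountry$res <- resid(lm.intp)

# Hypothetical observation index: the row position serves as the x axis
Scountry$obs <- seq_len(nrow(Scountry))

p <- ggplot(Scountry, aes(x = obs, y = res, color = as.factor(gender))) +
  geom_point(na.rm = TRUE)  # silently skip rows whose residual is NA
print(p)
```

The `$residuals` component stores only the fitted rows, which is why the assignment failed even with `na.action = na.exclude`; the padding with NA happens inside `resid()`.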