I'm having trouble following an example provided by my professor. We're meant to work through the provided examples to understand the code and how it is implemented, and then complete a different assignment based on the topics the examples cover.
I'm having problems reproducing a scatter plot from the example. The code uses the Adult dataset from the UCI Machine Learning Repository:
```r
#install.packages("ggplot2")
library(ggplot2)
# import data
adult = read.csv("adult.DATA", header = FALSE, stringsAsFactors = TRUE)
summary(adult)
colnames(adult)
# remove similar columns and rename
adult_trim = adult[,-c(3,4,11,12)]
names(adult_trim) <- c("Age", "WorkClass", "Education", "Marital.Status", "Occupation", "Relationship", "Race",
                       "Sex", "Hours.per.Week", "Native.Country", "Income")
# remove rows with empty ("?") values and drop Race/Native.Country
adult_trim <- adult_trim[rowSums(adult_trim == "?") ==0, -c(7,10), drop = FALSE]
```
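A side note on the `rowSums(adult_trim == "?")` filter: in the UCI `adult.data` file every comma is followed by a space, so with `read.csv()`'s default `strip.white = FALSE` the missing cells are read as `" ?"` rather than `"?"`, and the filter may remove nothing. A minimal sketch using two inline rows in that layout (the first mirrors the file's first row; the second is hypothetical, made up only to contain `?` values):

```r
# Inline sample in the adult.data layout ("value, value, ...").
# Row 2 is hypothetical and exists only to contain "?" cells.
raw <- "39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K
54, ?, 180211, Some-college, 10, Married-civ-spouse, ?, Husband, White, Male, 0, 0, 60, United-States, >50K"

# Default read: the leading spaces are kept, so the cells are " ?"
without_strip <- read.csv(text = raw, header = FALSE, stringsAsFactors = TRUE)

# strip.white = TRUE removes the leading/trailing spaces around each field
with_strip <- read.csv(text = raw, header = FALSE, stringsAsFactors = TRUE,
                       strip.white = TRUE)

sum(without_strip == "?")  # no exact "?" matches
sum(with_strip == "?")     # the two "?" cells in row 2 are now found

# The same row filter as in the example then drops row 2
with_strip[rowSums(with_strip == "?") == 0, ]
```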
The problem is in the following scatter plot. The data file has no header row, so the columns are imported as V1, V2, and so on.
```r
adult$V4 = as.factor(as.character(adult$V4))
levels(adult$V4)
plot(
  jitter(as.numeric(adult$V4),0.5) ~ jitter(as.numeric(adult$V4), 0.5),
  data = adult_trim,
  xlab = "Income",
  ylab = "Education",
  pch = 19,
  cex = 1,
  bty = "n",
  xlim = c(1:2),
  col = rgb(180,0,180,30, maxColorValue = 255)
)
```
When I try to run this plot on my machine, I just get this warning:
```
Warning message:
In plot.formula(jitter(as.numeric(adult$V4), 0.5) ~ jitter(as.numeric(adult$V4),  :
  c("the formula 'jitter(as.numeric(adult$V4), 0.5) ~ jitter(as.numeric(adult$V4), '
  is treated as 'jitter(as.numeric(adult$V4), 0.5) ~ 1'", "the formula ' 0.5)'
  is treated as 'jitter(as.numeric(adult$V4), 0.5) ~ 1'")
```
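The warning seems to come from both sides of the formula being the identical expression `jitter(as.numeric(adult$V4), 0.5)`. A formula's model frame stores each distinct variable expression only once, so `y ~ y` leaves no separate x variable and `plot()` falls back to `y ~ 1`; the message appears as two strings apparently because the long formula is deparsed over two lines. A minimal sketch with a small hypothetical data frame `df`:

```r
# Hypothetical toy data, used only to show how model.frame() treats formulas
df <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30))

# Same expression on both sides: it is stored once,
# so nothing is left over for the x-axis
mf_same <- model.frame(a ~ a, data = df)
ncol(mf_same)

# Different expressions: one column for y and one for x
mf_diff <- model.frame(a ~ b, data = df)
ncol(mf_diff)
```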
It's supposed to look like this graph, but with Education: https://i.stack.imgur.com/EPfhX.png. Instead, I just get the warning. Also, is there any reason this code uses the original "adult" data frame instead of "adult_trim"?
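On the "adult" vs "adult_trim" question: `data = adult_trim` only tells `plot()` where to look up bare column names. An explicit `adult$V4` is not a column of `adult_trim`, so R finds `adult` in the calling environment, and the trimmed data is never used. Below is a sketch that writes the formula against `adult_trim`'s own column names, assuming Income belongs on the x-axis and Education on the y-axis as the axis labels suggest. (On the untrimmed data, income should be `V15` in the standard 15-column file, so the right-hand side would use `adult$V15` instead of a second `adult$V4`.) Because the real file isn't available here, a randomly generated stand-in for `adult_trim` with hypothetical values is built first; with the real data, skip that part. The widened `xlim` and the custom x-axis labels are also my own assumptions, not part of the original example.

```r
# Hypothetical stand-in for adult_trim so this sketch runs on its own.
# In the real adult_trim, "Education" is the numeric education-num column (V5),
# because the text education column (V4) was dropped by -c(3,4,11,12).
set.seed(42)
n <- 300
adult_trim <- data.frame(
  Education = sample(1:16, n, replace = TRUE),
  Income = factor(sample(c("<=50K", ">50K"), n, replace = TRUE))
)

# Bare column names are now looked up in data = adult_trim:
# Income (factor, codes 1 and 2) on x, Education (numeric) on y.
plot(
  jitter(as.numeric(Education), 0.5) ~ jitter(as.numeric(Income), 0.5),
  data = adult_trim,
  xlab = "Income",
  ylab = "Education",
  pch = 19,
  cex = 1,
  bty = "n",
  xlim = c(0.5, 2.5),  # assumption: widened so jittered points near 1 and 2 aren't clipped
  xaxt = "n",          # suppress the numeric 1/2 tick labels...
  col = rgb(180, 0, 180, 30, maxColorValue = 255)
)
# ...and label the two income levels instead
axis(1, at = 1:2, labels = levels(adult_trim$Income))
```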
Any help or explanation would be appreciated.