One way to get the run time down is to experiment with smaller datasets and alternative code to see which approach is faster.
You can then use system.time() to see how long each candidate takes and compare (see: Measuring function execution time in R).
For example:
library(ggplot2)  # needed for the ggplot() calls below
size <- 100000
IBD <- data.frame(X = rbeta(n = size, shape1 = 2, shape2 = 2),
                  Y = rbeta(n = size, shape1 = 2, shape2 = 2))
Using your code on this fake dataset:
system.time(
  ggplot(IBD, aes(x = X, y = Y)) + geom_point() + ggtitle("ADGC EOAD") +
    scale_x_continuous(limits = c(0, 1)) + scale_y_continuous(limits = c(0, 1))
)
user system elapsed
0.01 0.00 0.01
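One caveat: a bare ggplot(...) call only builds the plot object; the actual drawing happens when that object is printed. If you want the timing to include rendering rather than just object construction, you can wrap the call in print(), e.g.:
system.time(
  print(ggplot(IBD, aes(x = X, y = Y)) + geom_point())  # print() forces the plot to actually render
)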
Using base plot as a comparison point:
system.time(
  plot(Y ~ X, data = IBD)
)
user system elapsed
2.13 2.34 4.56
You can see that plot takes much longer here. I realize this isn't itself a solution for making your code faster, but it is a tool you can use to figure out what will be faster on such a large dataset.
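As a rough sketch of that workflow (full_data is a placeholder name for your real data frame, assumed to have columns X and Y like the fake data above): draw a random subsample of the real data and time each candidate plotting call on it before committing to the full run.
# Sketch only: full_data is a placeholder for your real data frame with columns X and Y.
idx <- sample(nrow(full_data), 1e5)   # random subsample of 100,000 rows
sub <- full_data[idx, ]

system.time(
  print(ggplot(sub, aes(x = X, y = Y)) + geom_point())  # candidate 1: ggplot2 scatterplot
)
system.time(
  plot(Y ~ X, data = sub)                               # candidate 2: base graphics
)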
Edit:
Adding in the method from the comments by @maydin:
cluster <- kmeans(x = IBD, centers = 1000)  # collapse the 100,000 points into 1,000 cluster centers
Clus <- data.frame(cluster$centers)         # the centers keep the original X and Y column names

system.time(
  ggplot(Clus, aes(x = X, y = Y)) + geom_point() + ggtitle("ADGC EOAD") +
    scale_x_continuous(limits = c(0, 1)) + scale_y_continuous(limits = c(0, 1))
)
user system elapsed
0 0 0
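This is so fast because kmeans collapses the 100,000 points into 1,000 cluster centers, so ggplot only has to draw 1,000 points; the trade-off is that you see the overall shape of the data rather than every individual point. The clustering step itself has a cost, though, so on a truly large dataset it's worth timing that step the same way (a sketch, using the same fake IBD data as above):
system.time(
  cluster <- kmeans(x = IBD, centers = 1000)  # the clustering step also takes time on big data
)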