
This is the code I used:

resources <- read.csv("https://raw.githubusercontent.com/umbertomig/intro-prob-stat-FGV/master/datasets/resources.csv")

res <- subset(resources, select = c("cty_name", "year", "regime",
                             "oil", "logGDPcp", "illit"))
resNoNA <- na.omit(res)
resNoNAS <- scale(resNoNA[, 3:6])
colMeans(resNoNA[, 3:6])
apply(resNoNA[, 3:6], 2, sd)
cluster2 <- kmeans(resNoNAS, centers = 2)
table(cluster2$cluster)
## this gives standardized answer, which is hard to interpret
cluster2$centers
## better to subset the original data and then compute means
g1 <- resNoNA[cluster2$cluster == 1, ]
colMeans(g1[, 3:6])
g2 <- resNoNA[cluster2$cluster == 2, ]
colMeans(g2[, 3:6])

plot(x = resNoNA$logGDPcp, y = resNoNA$illit, main = "Illiteracy v GDP",
xlab = "GDP per Capita",  ylab = "Illiteracy", 
col = cluster2$cluster, cex = resNoNA$oil)

But I want to make the circles smaller so that they fit within the limits of the graph.


user
  • Questions on SO (especially in R) do much better if they are reproducible and self-contained. By that I mean including attempted code (please be explicit about non-base packages), sample representative data (perhaps via `dput(head(x))` or building data programmatically (e.g., `data.frame(...)`), possibly stochastically after `set.seed(1)`), perhaps actual output (with verbatim errors/warnings) versus intended output. Refs: https://stackoverflow.com/q/5963269, [mcve], and https://stackoverflow.com/tags/r/info. (With what you've provided ... who knows.) – r2evans Jun 22 '20 at 21:52
  • Perhaps this one helps you: https://stackoverflow.com/questions/2579995/control-the-size-of-points-in-an-r-scatterplot ? – Martin Gal Jun 22 '20 at 21:53

1 Answer


The circle size here is controlled by cex=, so scale that argument down, for example by dividing it by a constant.

# Original: point size taken directly from oil
plot(x = resNoNA$logGDPcp, y = resNoNA$illit, main = "Illiteracy v GDP",
     xlab = "GDP per Capita", ylab = "Illiteracy",
     col = cluster2$cluster, cex = resNoNA$oil)
# Circles at one third of the original size
plot(x = resNoNA$logGDPcp, y = resNoNA$illit, main = "Illiteracy v GDP",
     xlab = "GDP per Capita", ylab = "Illiteracy",
     col = cluster2$cluster, cex = resNoNA$oil/3)
# Circles at one fifth of the original size
plot(x = resNoNA$logGDPcp, y = resNoNA$illit, main = "Illiteracy v GDP",
     xlab = "GDP per Capita", ylab = "Illiteracy",
     col = cluster2$cluster, cex = resNoNA$oil/5)

3-pack of images
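If picking a divisor by trial and error gets tedious, one alternative is to map the variable onto a fixed cex range. The following is only a sketch under assumptions: rescale_cex is a hypothetical helper (not part of base R), the bounds 0.5 and 3 are arbitrary choices, and toy is made-up data standing in for resNoNA.

```r
# Hypothetical helper (not in base R): linearly map x onto [lo, hi]
# so the smallest value gets cex = lo and the largest gets cex = hi.
rescale_cex <- function(x, lo = 0.5, hi = 3) {
  rng <- range(x, na.rm = TRUE)
  # If all values are equal, fall back to the midpoint to avoid dividing by 0
  if (rng[1] == rng[2]) return(rep((lo + hi) / 2, length(x)))
  lo + (x - rng[1]) / (rng[2] - rng[1]) * (hi - lo)
}

set.seed(1)
# Made-up stand-in for resNoNA; the values are illustrative only
toy <- data.frame(
  logGDPcp = rnorm(50, mean = 8, sd = 1),
  illit    = runif(50, min = 0, max = 80),
  oil      = rexp(50, rate = 0.1)
)

sizes <- rescale_cex(toy$oil)

plot(x = toy$logGDPcp, y = toy$illit, main = "Illiteracy v GDP",
     xlab = "GDP per Capita", ylab = "Illiteracy",
     cex = sizes)
```

With the real data, this would presumably become cex = rescale_cex(resNoNA$oil). Note that a linear rescale changes the visual ratio between points, unlike dividing by a constant, so which one is appropriate depends on what the sizes are meant to convey.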

Realize, however, that if you are using this in some automated report generator (e.g., rmarkdown, shiny), you may need to approach it from the other angle and adjust the extent of the plot itself: update xlim and ylim so that the circles fit.
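As a rough illustration of the xlim/ylim approach, the sketch below pads both axes with extendrange() from grDevices. The data in toy are made up, and the padding fraction f = 0.15 is an arbitrary assumption that would need tuning to the actual point sizes and plot dimensions.

```r
set.seed(1)
# Made-up data with fairly large point sizes; illustrative only
toy <- data.frame(
  x    = rnorm(30),
  y    = rnorm(30),
  size = runif(30, min = 1, max = 6)
)

# extendrange() widens the data range by a fraction f on each side
xl <- extendrange(toy$x, f = 0.15)
yl <- extendrange(toy$y, f = 0.15)

# Leave room around the outermost points so large circles are not clipped
plot(x = toy$x, y = toy$y, cex = toy$size, xlim = xl, ylim = yl)
```

This keeps the circle sizes as they are and instead gives the outermost points more room; combining it with a smaller cex is also possible.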

r2evans