3

I have a scatter plot which I generate using below code

set.seed(10)
mydata <- data.frame(x1 = rnorm(1000), x2 = rnorm(1000))

ind <- replicate(3, sample(nrow(mydata), 500))
head(ind)

feature1 = mydata[ind[,1], "x1"]
feature2 = mydata[ind[,2], "x2"]

# start with a plot
plot(feature1, feature2, pch=4 , col="black")

enter image description here

I want to identify one data point and color it using a different color, which I do using below code

plot(feature1, feature2, pch=4, col=ifelse((feature1 > 2.6 & feature1 < 2.7 ), "red", "black"))

enter image description here

Now, I would like to draw a circle around this point(which is marked in RED) and connect nearest neighboring N points to this point(where N should be a variable)

How can I do it using R?

Here is what I intend to get in my output

enter image description here

Regressor
  • 1,843
  • 4
  • 27
  • 67

2 Answers2

1

Here's a way to do it with base plotting functions but using spDistsN1() from the sp library which should run quickly for very large numbers of points.

edit: I removed dependence on plotrix library for circle drawing, which was giving an incorrect result.

draw_neighbors <- function(dat, focal_pt_index, n) {
  require(sp)

  # Calculate distances to focal point.
  dists <- spDistsN1(pts = dat, pt = dat[focal_pt_index,])

  # Sort points by distance.
  dat <- cbind(dat, dist = dists)
  dat <- dat[order(dat[,'dist']), ]

  # Plot points
  plot(dat[,1], dat[,2], pch=4 , col=ifelse(dat[,'dist'] == 0, "red", "black"), asp = 1)

  # Draw a line to each neighbor
  neighbors <- dat[2:(n+1), ]
  for (i in 1:nrow(neighbors)) {
    lines(x = c(dat[1,1], neighbors[i,1]), y = c(dat[1,2], neighbors[i,2]), col = 'red')
  }

  # Draw a circle at the radius equal to the largest distance within the n nearest neighbors.
  radius <- dat[n+1, 'dist']
  angles <- seq(0,2*pi,length=1000)
  coords <- cbind(dat[1,1] + sin(angles) * radius, dat[1,2] + cos(angles)* radius)
  points(coords, type = 'l', lty = 2, col = 'red')

}

Here is what you get using your data for n = 10.

Call:
draw_neighbors(dat = cbind(feature1, feature2), focal_pt_index = which(feature1 > 2.6 & feature1 < 2.7), n = 10)

enter image description here

qdread
  • 3,389
  • 19
  • 36
1

Let's first put your data into a matrix p, determine your point of interest p0, and define the number of common neighbours of interest k.

p <- cbind(feature1, feature2)
idx <- p[, 1] > 2.6 & p[, 1] < 2.7
p0 <- p[idx, ]
k <- 10
plot(feature1, feature2, pch = 4, col = ifelse(idx, "red", "black"))

Then we find those k nearest neighbours and draw a circle (using circleFun from this answer) and segments:

kNN <- p[order(colMeans((t(p) - p0)^2))[1 + 1:k], ]
crc <- circleFun(p0, diameter = 2 * sqrt(sum((kNN[k, ] - p0)^2)))
lines(x = crc$x, y = crc$y, col = 'red', lty = 2)
segments(x0 = p0[1], y0 = p0[2], x1 = kNN[, 1], y1 = kNN[, 2], col = "red")

enter image description here

Julius Vainora
  • 47,421
  • 9
  • 90
  • 102
  • can u explain how got the KNN object? I did not understand `p[order(colMeans((t(p) - p0)^2))[1 + 1:k], ]` part – Regressor Dec 05 '18 at 17:47
  • Moreover, I can see some points that lie within the circle but are not connected to the point of interest.(I am guessing that the distance is higher for those points). But if the distance is greater then should they lie outside the circle ? – Regressor Dec 05 '18 at 17:51
  • @Regressor, sure. There are other ways to achieve that of course. `p` is a matrix and `p0` is just a vector so, as to do things in a vectorized way, I transposed `p` because then `t(p)-p0` leads to differences: each point minus `p0`. Squaring and doing `colMeans` (or `colSums` more precisely) gives squared distances between `p0` and every other point. Then applying `order` to this vector says which element from this distances vector is the smallest, second smallest, and so on. Then, rather than taking `1:k` element, I took `2:(k + 1)` because clearly the first will be `p0` itself (0 distance). – Julius Vainora Dec 05 '18 at 17:51
  • @Regressor, it looks like draw.circle is flawed... It takes the radius in terms of "user units". See the update. – Julius Vainora Dec 05 '18 at 18:20