
I have a dataframe of x and y geographic coordinates (30,000+ coordinates) that look like the example matrix `points` below. I want to take a random sample of these without breaking up the pairs of x and y coordinates.

For example, I know that I can get a random sample of say 2 of the items in x and y, but how do I get a random sample so that items that go together are preserved? In other words, in my matrix of points, one actual point is a pair of an x coordinate (for example, the first item: -12.89) that goes with the first item in the y list: 18.275.

Is there a way to combine the items in x and y so that the order is preserved in a tuple-like object (I'm more of a Python user) and then take a random sample using sample()? Thanks.

# Make some pretend data
x <- c(-12.89, -15.35, -15.46, -41.17, 45.32)
y <- c(18.275, 11.370, 18.342, 18.305, 18.301)
points <- cbind(x, y)
points

# Get a random sample:
# This is wrong because the x and y need to be considered together
c(sample(x, 2),
  sample(y, 2))

# This is also wrong because it treats each item in `points` separately
sample(points, size=2, replace=FALSE)

Ultimately, in this example, I would want to end up with two random pairs that go together. For example: (-15.35,11.370) and (45.32,18.301)

JAG2024
  • Just sample the index number `idx <- sample(1:length(x), size=1)` and use this, e.g., `x[idx]` and `y[idx]` – DanY Feb 10 '21 at 19:05

2 Answers


You can take a sample from the row index:

set.seed(42)
points[sample(seq_len(nrow(points)), 2), ]

Gives

#          x      y
#[1,] -12.89 18.275
#[2,]  45.32 18.301
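
Since the question mentions a data frame with 30,000+ rows, the same row-index idea can be sketched for a data frame too. This is only an illustration under assumptions: the names `df`, `picked`, `m`, and `one_row` are made up here, and the coordinates are simulated rather than the asker's data.

```r
# Hypothetical stand-in for the real data: simulated coordinates
set.seed(42)  # only so this sketch gives the same rows on every run
df <- data.frame(x = runif(30000, -50, 50),
                 y = runif(30000, 10, 20))

# Sample row numbers, then index by row so each x stays with its y
picked <- df[sample(nrow(df), 2), ]
picked

# With a matrix, a single sampled row collapses to a plain vector
# unless drop = FALSE is given
m <- as.matrix(df)
one_row <- m[sample(nrow(m), 1), , drop = FALSE]
dim(one_row)  # 1 2
```

The row names of `picked` are the original row numbers, which is a quick way to check that each sampled pair really comes from a single row of `df`.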
markus
  • Thanks. Why is the `set.seed()` included? – JAG2024 Feb 10 '21 at 19:11
  • `set.seed()` just ensures that pseudo-randomness is repeatable. See, e.g. [this link](http://rfunction.com/archives/62). Alternatively, run this code and look carefully at the output: `set.seed(1); rnorm(5); rnorm(5); set.seed(1); rnorm(5)`. – DanY Feb 10 '21 at 19:20
  • So to be clear @JAG2024, don't use set.seed() in your own code; it's there so we all get the result printed above when running this code. – moodymudskipper Feb 14 '21 at 10:02
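
The repeatability described in these comments can be checked directly. A minimal sketch, where the variable names `a`, `b`, and `a_again` are chosen only for illustration:

```r
set.seed(1)
a <- rnorm(5)        # first five draws after seeding
b <- rnorm(5)        # next five draws, which continue the sequence

set.seed(1)          # reseeding restarts the sequence
a_again <- rnorm(5)

identical(a, a_again)  # TRUE: same seed, same draws
identical(a, b)        # FALSE: later draws differ
```

The same holds for `sample()`: with a fixed seed, the sampled row numbers come out the same on every run, which is why the answers set one before printing their output.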

Another option could be:

set.seed(123)
do.call(`rbind`, sample(asplit(points, 1), 2))

          x      y
[1,] -15.35 11.370
[2,] -41.17 18.305
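
For readers who want to see what the one-liner does, here is a step-by-step sketch of the same approach. The intermediate names `rows`, `chosen`, and `result` are made up for illustration, and `asplit()` is assumed to be available (it was added in R 3.6.0).

```r
x <- c(-12.89, -15.35, -15.46, -41.17, 45.32)
y <- c(18.275, 11.370, 18.342, 18.305, 18.301)
points <- cbind(x, y)

# Split the matrix by rows: one list element per (x, y) pair
rows <- asplit(points, 1)
length(rows)  # 5

# Sample whole list elements, so each pair stays together
chosen <- sample(rows, 2)

# Stack the sampled pairs back into a two-column matrix
result <- do.call(rbind, chosen)
result
```

This is the closest R analogue to sampling from a list of tuples in Python, although indexing rows directly avoids building the intermediate list.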
tmfmnk
  • Can you explain why you used `set.seed(123)` whereas Markus used `set.seed(42)`? – JAG2024 Feb 10 '21 at 19:18
  • It's to obtain the same results when you re-run it repeatedly. Any number could be used. See https://stackoverflow.com/questions/13605271/reasons-for-using-the-set-seed-function – tmfmnk Feb 10 '21 at 19:20