Sample random rows in dataframe, where number of samples exceeds number of rows. Assign sampling probability

Question

Consider the following example data, stored in a dataframe called df

As you can see, this dataframe has 3 rows. What I'd like to do is take 100 row samples, where each row has an equal probability of being selected (in this case 1/3). My output, let's call it df_result, would look something like this:

df_result
x  y
0  8
2  4
0  8
1  5
1  5
2  4

and so on, until 100 samples are taken.

I saw this previous Stack Overflow post, which detailed how to take random samples from a dataframe: `df[sample(nrow(df), 3), ]`

However, when I tried to sample 100 rows this way, it (predictably) did not work, since there are only 3 rows to draw from, and it did not let me assign the sampling probability.

Any tips?

Thanks

@HubertL Thanks. When I try to set the prob=c(rep(1/3,3)) argument in the sample function I get the error: "incorrect number of probabilities". Does the sample function automatically assign equal weights? — lecreprays, May 19 '17 at 02:22
I'm not sure why... it works with `df[sample(3,100,replace=TRUE,prob=c(rep(1/3,3))),]` — HubertL, May 19 '17 at 02:27
`modelr::resample` (e.g. `modelr::resample(df, sample(nrow(df), 100, replace = TRUE))`) is good for this at scale, as it just stores a pointer and the indices instead of redundant data. To expand it to a data.frame pass it to `as.data.frame`, though models can handle it directly. — alistaire, May 19 '17 at 04:05
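To illustrate the point about weights raised in the comments above, here is a minimal sketch. It assumes the three-row `df` from the question, and the deliberately mis-sized `prob` vector is only a guess at how the "incorrect number of probabilities" error might have been triggered. When `prob` is omitted, `sample` treats every element as equally likely. When `prob` is supplied, it needs one entry per element being sampled.

```r
# Example data assumed to match the question's three-row dataframe
df <- data.frame(x = c(2, 1, 0), y = c(4, 5, 8))
n <- nrow(df)

# With prob omitted, every row index is equally likely
set.seed(42)
idx_default <- sample(n, 100, replace = TRUE)

# Explicit equal weights: one probability per row
set.seed(42)
idx_equal <- sample(n, 100, replace = TRUE, prob = rep(1 / n, n))

# A prob vector whose length differs from the number of rows is rejected
# (hypothetical reproduction of the error mentioned in the comment)
msg <- tryCatch(
  sample(n, 100, replace = TRUE, prob = rep(1 / 3, 2)),
  error = function(e) conditionMessage(e)
)
print(msg)
```

Both `idx_default` and `idx_equal` are valid equal-probability draws, but they need not be identical even with the same seed, because `sample` may take a different internal path when `prob` is given.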

score 0 · Answer 1 · answered May 19 '17 at 07:11

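# Recreate the example data from the question
df <- read.table(header = TRUE,
                 text = "x  y
2  4
1  5
0  8")

set.seed(1)  # make the draw reproducible
# Draw 10 row indices with replacement (equal probability by default);
# use 100 instead of 10 for the case in the question
df[sample(nrow(df), 10, replace = TRUE), ]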

    x y
1   2 4
2   1 5
2.1 1 5
3   0 8
1.1 2 4
3.1 0 8
3.2 0 8
2.2 1 5
2.3 1 5
1.2 2 4
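For the full case in the question (100 draws with an assigned probability per row), the same idea could be sketched as follows. The weights in `w` are hypothetical and chosen only for illustration; with equal weights, `prob` can simply be left out. Resetting the row names is optional and just removes the `1.1`, `2.1`, ... suffixes that appear when rows are repeated.

```r
df <- read.table(header = TRUE, text = "x y
2 4
1 5
0 8")

set.seed(1)

# Hypothetical weights, one per row of df (not taken from the question)
w <- c(0.5, 0.3, 0.2)

# Draw 100 row indices with replacement, using the weights above
idx <- sample(nrow(df), 100, replace = TRUE, prob = w)

df_result <- df[idx, ]
rownames(df_result) <- NULL  # replace 1.1, 2.1, ... with plain 1..100

nrow(df_result)               # 100 rows
table(idx) / length(idx)      # observed share of each source row
```

The observed shares printed by `table` should be roughly in line with `w`, though with only 100 draws some deviation is expected.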

Jeppe Olsen
