How to extract certain rows from data in R?

Question

I have a data set that I need to sample from. Part of the data looks like this:

row.names  customer_ID
1           10000000
2           10000000
3           10000000    
4           10000000
5           10000005
6           10000005
7           10000008
8           10000008
9           10000008
10          10000008
11          10000008
12          10000008
...

For each customer, take the first 2 rows. Then, before including each further row, do a check: there is a 65% chance we take the next row and a 35% chance we quit and move on to the next customer. If we take the row, we repeat the 65%/35% check, and keep going until we either run out of data for that customer or fail the check. Repeat this for every customer.

The classic question: what have you tried, and what were the problems you ran into? — Hidde, Apr 12 '14 at 15:24
Welcome to SO. Please read: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example?rq=1 — sgibb, Apr 12 '14 at 15:26

score 1 · Answer 1 · answered Apr 12 '14 at 15:58

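The number of rows to take from each customer follows a negative binomial distribution. Beyond the first 2 rows, the number of extra rows is the number of passed checks (probability 0.65 each) before the first failed check (probability 0.35), which is what rnbinom(1, 1, 0.35) draws. The total is capped at the number of rows the customer actually has. Assuming your data is stored in dat: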

# Split your data by customer id
spl <- split(dat, dat$customer_ID)

# Grab the correct number of rows from each customer:
# the first 2 rows plus a random number of extra rows, capped at nrow(x)
set.seed(144)
spl <- lapply(spl, function(x) x[seq_len(min(nrow(x), 2 + rnbinom(1, 1, 0.35))), , drop = FALSE])

# Combine into a final data frame
do.call(rbind, spl)
#            row.names customer_ID
# 10000000.1         1    10000000
# 10000000.2         2    10000000
# 10000000.3         3    10000000
# 10000000.4         4    10000000
# 10000005.5         5    10000005
# 10000005.6         6    10000005
# 10000008.7         7    10000008
# 10000008.8         8    10000008
# 10000008.9         9    10000008
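
For comparison, the procedure in the question can also be simulated literally, one check per row. This is only a sketch: `rows_to_keep` is a hypothetical helper name, and `dat` is rebuilt here from the excerpt in the question (without the row.names column). It should select row counts with the same distribution as the `rnbinom` version above, though not the same rows for a given seed.

```r
# Hypothetical helper: how many rows to keep for a customer with n rows,
# running the 65%/35% check once per candidate row.
rows_to_keep <- function(n, p_continue = 0.65, guaranteed = 2) {
  k <- min(n, guaranteed)
  # Keep taking rows while some remain and the check passes
  while (k < n && runif(1) < p_continue) {
    k <- k + 1
  }
  k
}

# Example data rebuilt from the excerpt in the question (an assumption)
dat <- data.frame(
  customer_ID = rep(c(10000000L, 10000005L, 10000008L), times = c(4, 2, 6))
)

set.seed(1)
sampled <- do.call(rbind, lapply(split(dat, dat$customer_ID), function(x) {
  x[seq_len(rows_to_keep(nrow(x))), , drop = FALSE]
}))
```

The loop is slower than a single `rnbinom` draw per customer, but it mirrors the wording of the question step by step, which makes it easier to adapt if the rule changes (for example, a different probability per customer).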

Cool idea, I am trying to see whether the result varies as I expected — user3525943, Apr 13 '14 at 16:35
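
One way to check whether the result varies as expected is to draw many values and compare them with the probabilities implied by the 65%/35% rule. The sketch below ignores the cap at the customer's row count and looks only at the number of extra rows beyond the first 2. The sample size and the seed are arbitrary choices.

```r
set.seed(42)
p_stop <- 0.35

# Number of extra rows beyond the first 2, drawn as in the answer
extra <- rnbinom(100000, size = 1, prob = p_stop)

# P(exactly k extra rows) = 0.65^k * 0.35: k passed checks, then one failure
k <- 0:5
empirical <- sapply(k, function(i) mean(extra == i))
theoretical <- (1 - p_stop)^k * p_stop

round(cbind(k, empirical, theoretical), 3)
```

If the two columns agree closely, the `rnbinom(1, 1, 0.35)` draw matches the sequential check described in the question.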
