How to subset a data frame by id, sampling 1 row per id? (in R)

Question

I have a big data frame in which each row has an id code. I want to create another data frame with only one row for each id. How can I do it?

This is one part of the data (the id column is "codigo_pon"):

Welcome to stack overflow. It's easier to help you if you make your question reproducible including data and your code which can be used to test and verify possible solutions. [Asking a good question](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) — Peter, Sep 01 '21 at 13:44
Please clarify your specific problem or provide additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. — Community, Sep 01 '21 at 13:48

score 0 · Answer 1 · answered Sep 01 '21 at 13:55


Using dplyr, you can do this:

library(dplyr)
your_data %>%
  group_by(id_column) %>%
  sample_n(1) %>%
  ungroup()
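
For what it's worth, in dplyr 1.0.0 and later `sample_n()` is marked as superseded in favour of `slice_sample()`. A minimal sketch of the same idea, assuming a made-up data frame `toy` (only the column name `codigo_pon` comes from the question):

```r
library(dplyr)

# Hypothetical toy data; only the column name codigo_pon is from the question
set.seed(42)  # makes the random draw repeatable
toy <- data.frame(
  codigo_pon = rep(c("a", "b", "c"), times = c(3, 2, 4)),
  value = seq_len(9)
)

one_per_id <- toy %>%
  group_by(codigo_pon) %>%
  slice_sample(n = 1) %>%  # one random row from each group
  ungroup()
```

The result should have as many rows as there are distinct ids. Dropping `set.seed()` gives a different draw on each run.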


Gregor Thomas

score 0 · Answer 2 · answered Sep 01 '21 at 13:55

Based on the question, you could do something like this:

library(tidyverse)

# Example data

data <-
  tibble(
    id = rep(1:20,each = 5),
    value = rnorm(100)
  )

data %>%
  # Group by id variable
  group_by(id) %>%
  # Sample 1 row by id
  sample_n(size = 1)

r2evans · Answer 3 · 2021-09-01T14:14:20.190

data[!ave(seq_len(nrow(data)), data$codigo_pon,
          FUN = function(z) seq_along(z) != sample(length(z), size = 1)),]

or

do.call(rbind, by(data, data$codigo_pon,
                  FUN = function(z) z[sample(nrow(z), size = 1),]))

(Previously I suggested aggregate, but that sampled each column separately, breaking up the rows.)
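
To see why sampling whole rows matters, here is a small base R sketch on invented data (`toy`, `x` and `y` are hypothetical names). It draws one row index per id and subsets once, so the columns of each row stay together:

```r
# Hypothetical toy data: y is always x + 100, so an intact row is easy to spot
set.seed(1)
toy <- data.frame(
  codigo_pon = rep(c("a", "b", "c"), each = 3),
  x = 1:9,
  y = 101:109
)

# One randomly chosen row index per id.
# i[sample.int(length(i), 1)] is used instead of sample(i, 1), because
# sample(i, 1) would draw from 1:i when a group has only one row.
idx <- vapply(
  split(seq_len(nrow(toy)), toy$codigo_pon),
  function(i) i[sample.int(length(i), 1)],
  integer(1)
)

one_per_id <- toy[idx, ]

# Every kept row still satisfies y == x + 100, i.e. rows were not broken up
all(one_per_id$y == one_per_id$x + 100)
```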

library(data.table)
as.data.table(data)[, .SD[sample(.N, size = 1),], by = codigo_pon]

(dplyr has already been demonstrated twice)
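
Since the question mentions a big data frame, a variant of the data.table approach that is sometimes suggested is to sample row numbers with `.I` and subset once, rather than subsetting `.SD` per group. A minimal sketch on invented data (`toy` and `value` are hypothetical; whether it is actually faster for your data is something to measure, not something I can promise):

```r
library(data.table)

# Hypothetical toy data; only the column name codigo_pon is from the question
set.seed(7)
toy <- data.table(
  codigo_pon = rep(c("a", "b", "c"), times = c(4, 1, 3)),
  value = 1:8
)

# .I holds the row numbers of the current group; keep one at random per id
rows <- toy[, .I[sample(.N, 1)], by = codigo_pon]$V1

# Subset the full table once with the chosen row numbers
one_per_id <- toy[rows]
```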