0

I have a big data frame in which each row has an id code. I want to create another data frame with only one row for each id. How can I do it?

This is part of the data (the id column is "codigo_pon"):

  • 2
    Welcome to Stack Overflow. It's easier to help you if you make your question reproducible, including data and code that can be used to test and verify possible solutions. [Asking a good question](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – Peter Sep 01 '21 at 13:44
  • Please clarify your specific problem or provide additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. – Community Sep 01 '21 at 13:48

3 Answers

0

Using dplyr, you can do this:

library(dplyr)
your_data %>%
  group_by(id_column) %>%
  sample_n(1) %>%
  ungroup()
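
As a hedged variation on the above: in recent dplyr versions `sample_n()` is superseded by `slice_sample()`, and if any row per id will do (not necessarily a random one), `distinct()` with `.keep_all = TRUE` keeps the first row of each id. The sketch below assumes a small made-up data frame; `your_data` and its `value` column are illustrative, and only the `codigo_pon` column name comes from the question.

```r
library(dplyr)

# Hypothetical example data; only the column name codigo_pon is from the question
your_data <- data.frame(
  codigo_pon = rep(c("a", "b", "c"), times = c(3, 1, 2)),
  value = 1:6
)

# One random row per id, using slice_sample() instead of sample_n()
one_random <- your_data %>%
  group_by(codigo_pon) %>%
  slice_sample(n = 1) %>%
  ungroup()

# One deterministic row per id: the first occurrence, with all columns kept
one_first <- your_data %>%
  distinct(codigo_pon, .keep_all = TRUE)
```

Both results have exactly one row per distinct `codigo_pon`; only the first one changes between runs.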
Gregor Thomas
0

Based on the question, you could do something like this:

library(tidyverse)

Example data

data <-
  tibble(
    id = rep(1:20,each = 5),
    value = rnorm(100)
  )

Sample the data, one row per id

data %>% 
  #Group by id variable
  group_by(id) %>% 
  #Sample 1 row by id
  sample_n(size = 1)
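
Because the sampling is random, the selected rows change on every run. A minimal sketch of how one might make the result reproducible and check it, assuming an arbitrary seed of 42 and a plain `data.frame` in place of the tibble:

```r
library(dplyr)

set.seed(42)  # arbitrary seed, chosen only for illustration

# Same shape as the example data: 20 ids with 5 rows each
data <- data.frame(id = rep(1:20, each = 5), value = rnorm(100))

sampled <- data %>%
  group_by(id) %>%
  sample_n(size = 1) %>%
  ungroup()

# Sanity checks: one row per distinct id, and no id repeated
stopifnot(nrow(sampled) == n_distinct(data$id))
stopifnot(!any(duplicated(sampled$id)))
```

Calling `set.seed()` with the same value before re-running the script should give the same selection.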
Vinícius Félix
0

base R

data[!ave(seq_len(nrow(data)), data$codigo_pon,
          FUN = function(z) seq_along(z) != sample(length(z), size = 1)),]

or

do.call(rbind, by(data, data$codigo_pon,
                  FUN = function(z) z[sample(nrow(z), size = 1),]))

(Previously I suggested aggregate, but that sampled each column separately, breaking up the rows.)
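
If the row does not have to be random, a shorter base R option is `duplicated()`, which flags every occurrence of an id after the first. The sketch below is an illustration on made-up data (only the `codigo_pon` column name is from the question); it also shows a shuffle-then-deduplicate way to get a random row per id.

```r
# Hypothetical example data; only the column name codigo_pon is from the question
data <- data.frame(
  codigo_pon = c("a", "a", "b", "c", "c", "c"),
  value = 1:6
)

# Keep the first row of each id
first_rows <- data[!duplicated(data$codigo_pon), ]

# Keep the last row of each id instead
last_rows <- data[!duplicated(data$codigo_pon, fromLast = TRUE), ]

# Random row per id: shuffle all rows, then keep the first occurrence of each id
shuffled <- data[sample(nrow(data)), ]
random_rows <- shuffled[!duplicated(shuffled$codigo_pon), ]
```

Note that `random_rows` comes back in shuffled order; sort it by `codigo_pon` if the original order matters.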

data.table

library(data.table)
as.data.table(data)[, .SD[sample(.N, size = 1),], by = codigo_pon]
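
Two hedged variations in data.table, shown on made-up data (only the `codigo_pon` column name is from the question): `unique()` with `by =` keeps the first row of each id, and the `.I` idiom collects one row index per group and subsets once, which is commonly reported to be faster than `.SD` on large tables.

```r
library(data.table)

# Hypothetical example data; only the column name codigo_pon is from the question
dt <- data.table(
  codigo_pon = c("a", "a", "b", "c", "c", "c"),
  value = 1:6
)

# First row per id
first_rows <- unique(dt, by = "codigo_pon")

# Random row per id: .I holds the row numbers of the current group,
# so this picks one row number per id; the unnamed result column is V1
idx <- dt[, .I[sample(.N, size = 1)], by = codigo_pon]$V1
random_rows <- dt[idx]
```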

(dplyr has already been demonstrated twice)

r2evans