I have a big data frame, and each row has an id code. I want to create another data frame with only one row for each id. How can I do it?
This is one part of the data (the id column is "codigo_pon"):
Using dplyr, you can do this:
library(dplyr)
your_data %>%
  group_by(id_column) %>%
  sample_n(1) %>%
  ungroup()
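In dplyr 1.0.0 and later, sample_n() is superseded by slice_sample(). Here is a minimal sketch of the same idea with the newer verb. The data frame `toy` and its values are made up for illustration; only the column name `codigo_pon` comes from the question.

```r
library(dplyr)

set.seed(42)  # only so the random draw is repeatable

# Hypothetical toy data: 4 ids, 3 rows each
toy <- data.frame(
  codigo_pon = rep(c("a", "b", "c", "d"), each = 3),
  value = seq_len(12)
)

one_per_id <- toy %>%
  group_by(codigo_pon) %>%
  slice_sample(n = 1) %>%  # draw one random row within each group
  ungroup()

nrow(one_per_id)                      # number of rows kept, one per distinct id
anyDuplicated(one_per_id$codigo_pon)  # 0 means no id appears twice
```

The two checks at the end are a quick way to confirm that each id survives exactly once.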
Based on the question, you could do something like this:
library(tidyverse)
data <-
  tibble(
    id = rep(1:20, each = 5),
    value = rnorm(100)
  )
data %>%
  # Group by id variable
  group_by(id) %>%
  # Sample 1 row by id
  sample_n(size = 1)
In base R:
data[!ave(seq_len(nrow(data)), data$codigo_pon,
          FUN = function(z) seq_along(z) != sample(length(z), size = 1)), ]
or
do.call(rbind, by(data, data$codigo_pon,
                  FUN = function(z) z[sample(nrow(z), size = 1), ]))
(Previously I suggested aggregate, but that sampled each column separately, breaking up the rows.)
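If the kept row does not have to be random, duplicated() is a shorter base R route: it flags every repeat of an id, so negating it keeps the first row per id. Shuffling the rows first turns this into a random pick. This is only a sketch; the `toy` data frame is invented, and the column name `codigo_pon` is taken from the question.

```r
set.seed(1)  # only so the shuffle is repeatable

# Hypothetical toy data: ids with 3, 1 and 2 rows
toy <- data.frame(
  codigo_pon = rep(c("a", "b", "c"), times = c(3, 1, 2)),
  value = 1:6
)

# Deterministic: keep the first row seen for each id
first_per_id <- toy[!duplicated(toy$codigo_pon), ]
first_per_id$value  # 1 4 5

# Random: shuffle all rows, then keep the first row seen for each id
shuffled <- toy[sample(nrow(toy)), ]
random_per_id <- shuffled[!duplicated(shuffled$codigo_pon), ]
```

Because the shuffle is a uniform permutation, the first row seen for each id is a uniform draw from that id's rows.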
library(data.table)
as.data.table(data)[, .SD[sample(.N, size = 1),], by = codigo_pon]
(dplyr has already been demonstrated twice)