Assign a value to a column in R based on a percentage within each group

Question


I need to create a column C in a data frame where, within each group (defined by column B), 30% of the rows get the value 0.

How do I do this in R?

It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Please do not post pictures of data because then we cannot copy/paste the data into R for testing. — MrFlick, Sep 07 '21 at 17:18
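As a sketch of what this comment asks for: base R's `dput()` prints a data frame as a `structure(...)` expression that others can paste straight into R, which is how the data blocks in the answers were shared. The data frame `d` below is an assumed stand-in for the asker's data, which appears to have been posted only as an image.

```r
# Assumed stand-in for the asker's data: 9 rows in three groups
d <- data.frame(sno = 1:9, category = rep(c("A", "B", "C"), each = 3))

# Print a copy/paste-able structure(...) version of the data
dput(d)

# Round-trip check: writing with dput() and reading back with dget()
# should reproduce the same data frame
tmp <- tempfile(fileext = ".R")
dput(d, file = tmp)
identical(dget(tmp), d)
```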

score 2 · Answer 1 · answered Sep 07 '21 at 17:26


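We may use rbinom after grouping by the 'category' column. With size = 1, each row is an independent Bernoulli draw, so prob = 0.7 gives every row a 70% chance of getting 1 and a 30% chance of getting 0. (Passing a vector such as c(0.7, 0.3) as prob would instead be recycled row by row, so alternate rows would get different probabilities.)

library(dplyr)
df1 %>%
    group_by(category) %>%
    # one Bernoulli draw per row: 1 with probability 0.7, 0 with probability 0.3
    mutate(value = rbinom(n(), 1, 0.7)) %>%
    ungroup()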

Output (random, so it will differ between runs):

# A tibble: 9 x 3
    sno category value
  <int> <chr>    <int>
1     1 A            1
2     2 A            0
3     3 A            1
4     4 B            1
5     5 B            0
6     6 B            1
7     7 C            1
8     8 C            0
9     9 C            0

data

df1 <- structure(list(sno = 1:9, category = c("A", "A", "A", "B", "B", 
"B", "C", "C", "C")), class = "data.frame", row.names = c(NA, 
-9L))

answered by akrun

This works but the assignment is not precise with respect to the probabilities. I need it to be exact. For example, 70% of the rows within each group (A, B, C) should get 1. Is there a way to get to that? – NM24 Sep 07 '21 at 17:53
@NM24 For that you may need `sample` as in the other answer – akrun Sep 07 '21 at 17:56
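To see why the rbinom() approach is only approximate: each row is an independent draw, so the share of 0s in a group only approaches 30% on average. A minimal sketch on a hypothetical larger data frame (1000 rows per group, with an arbitrarily chosen seed):

```r
set.seed(1)  # arbitrary seed so the run is reproducible

# Hypothetical larger data: three groups of 1000 rows each
big <- data.frame(category = rep(c("A", "B", "C"), each = 1000))

# Independent Bernoulli draws: 1 with probability 0.7, 0 with probability 0.3
big$value <- rbinom(nrow(big), 1, 0.7)

# Share of zeros per group: close to 0.3, but generally not exactly 0.3
share_zero <- tapply(big$value == 0, big$category, mean)
share_zero
```

With only three rows per group, as in the question, the realised share of 0s can be anything from 0% to 100%.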

score 1 · Accepted Answer · answered Sep 07 '21 at 17:23

If your data already exist (assuming this is a simplified example) and you want the values to be assigned randomly within each group:

library(dplyr)

d <- data.frame(sno = 1:9,
                category = rep(c("A", "B", "C"), each = 3))


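# Within each category: floor(70% of n()) ones and the remaining rows zeros,
# shuffled by sample() so that their positions are random
d %>%
  group_by(category) %>%
  mutate(value = sample(c(rep(1, floor(n()*.7)), rep(0, n() - floor(n()*.7))))) %>%
  ungroup()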
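The same idea can be wrapped in a small helper so the target share becomes a parameter. This is a sketch: `exact_flags()` is a hypothetical name, and it uses round() on the share of 0s where the answer uses floor() on the share of 1s, so the two may differ by one row for some group sizes. Note also that with 3 rows per group no split is exactly 30%; round(3 * 0.3) gives one 0 per group (33%).

```r
library(dplyr)

# Hypothetical helper: a shuffled 0/1 vector of length n in which
# round(n * p_zero) entries are 0 and the rest are 1
# (round() rounds exact halves to the even number)
exact_flags <- function(n, p_zero = 0.3) {
  n_zero <- round(n * p_zero)
  v <- rep(c(0, 1), times = c(n_zero, n - n_zero))
  v[sample.int(length(v))]  # shuffle; avoids sample()'s length-1 special case
}

d <- data.frame(sno = 1:9, category = rep(c("A", "B", "C"), each = 3))

set.seed(1)
res <- d %>%
  group_by(category) %>%
  mutate(value = exact_flags(n(), 0.3)) %>%
  ungroup()

# Every category gets exactly round(3 * 0.3) = 1 zero
count(res, category, value)
```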

score 1 · Answer 3 · answered Sep 07 '21 at 17:43

Base R

set.seed(42)
# For each category, draw 0 (prob 0.3) or 1 (prob 0.7) independently for every row
d$value <- ave(
  rep(0, nrow(d)), d$category,
  FUN = function(z) sample(0:1, size = length(z), prob = c(0.3, 0.7), replace = TRUE)
)
d
#   sno category value
# 1   1        A     0
# 2   2        A     0
# 3   3        A     1
# 4   4        B     0
# 5   5        B     1
# 6   6        B     1
# 7   7        C     0
# 8   8        C     1
# 9   9        C     1

Data copied from Brigadeiro's answer:

d <- structure(list(sno = 1:9, category = c("A", "A", "A", "B", "B", "B", "C", "C", "C")), class = "data.frame", row.names = c(NA, -9L))
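As NM24's comment on the first answer points out, drawing each row independently (as sample(..., replace = TRUE) does above) only hits 30% on average. A base R sketch of an exact split keeps the same ave() structure but shuffles a fixed vector inside each group, mirroring the floor() rule of the accepted answer; `value_exact` is a hypothetical column name chosen for illustration.

```r
set.seed(42)
d <- data.frame(sno = 1:9, category = rep(c("A", "B", "C"), each = 3))

# For each category of size n: floor(0.7 * n) ones, the rest zeros, shuffled.
# ave() applies FUN per group and returns results in the original row order.
d$value_exact <- ave(
  numeric(nrow(d)), d$category,
  FUN = function(z) {
    n_one <- floor(length(z) * 0.7)
    v <- rep(c(1, 0), times = c(n_one, length(z) - n_one))
    v[sample.int(length(v))]
  }
)

# Each category should have exactly two 1s and one 0
table(d$category, d$value_exact)
```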
