
I have a dataframe with column a:

x = data.frame(
    "a" = c(F, F, F, T,
            F, T, T, F,
            T, T, F)
)

For every block of, say, 4 rows, I would like to compute the frequency of a being T and store that value in a new column b. For the first 4 rows the frequency of T is 1/4, for the next 4 rows it is 2/4, and for the remaining 3 rows it is 2/3:

x$b = c(0.25, 0.25, 0.25, 0.25,
        0.5, 0.5, 0.5, 0.5,
        0.66, 0.66, 0.66)

I can get the frequency of column a using tapply, but this gives me a list, not a vector, as a result.

I would appreciate answers without use of external libraries.

sigvardsen
    If you already know how to achieve your expected output with tapply, why not just transform the list into a vector? – Alex Mar 26 '17 at 15:14

2 Answers


One option is ave from base R. Create a grouping variable with gl. Because the default function of ave is mean, it takes the mean of the logical column 'a' within each group, which is the proportion of TRUE values, and returns one value per row:

x$b <- with(x, ave(a, as.integer(gl(nrow(x), 4, nrow(x)))))
x$b
#[1] 0.2500000 0.2500000 0.2500000 0.2500000 0.5000000 0.5000000 
#[7] 0.5000000 0.5000000 0.6666667 0.6666667 0.6666667
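
Since the question mentions tapply, here is a minimal sketch of how the per-group result of tapply could be expanded back to one value per row. It reuses the gl-based grouping from above; the names grp, group_means, and b are only illustrative, and the sketch assumes the group labels are the consecutive integers 1, 2, 3, ... so that they can double as positions.

```r
# Example data from the question
x <- data.frame(a = c(FALSE, FALSE, FALSE, TRUE,
                      FALSE, TRUE, TRUE, FALSE,
                      TRUE, TRUE, FALSE))
n <- 4

# Group index per row: 1 1 1 1 2 2 2 2 3 3 3
grp <- as.integer(gl(nrow(x), n, nrow(x)))

# One mean per group; tapply returns a named 1-d array, not a plain vector
group_means <- tapply(x$a, grp, mean)

# Drop the array attributes, then repeat each group's mean for its rows
# (assumes group labels are 1..k, so they can be used as positions)
b <- as.vector(group_means)[grp]
```

Assigning b to x$b should give the same column as the ave call, because ave performs this split, apply, and expand step internally.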

Or using the same methodology in data.table

library(data.table)
setDT(x)[, b := mean(a), .(grp= as.integer(gl(nrow(x), 4, nrow(x))))]
x
#    a         b
# 1: FALSE 0.2500000
# 2: FALSE 0.2500000
# 3: FALSE 0.2500000
# 4:  TRUE 0.2500000
# 5: FALSE 0.5000000
# 6:  TRUE 0.5000000
# 7:  TRUE 0.5000000
# 8: FALSE 0.5000000
# 9:  TRUE 0.6666667
#10:  TRUE 0.6666667
#11: FALSE 0.6666667

Or with dplyr

library(dplyr)
x %>%
  group_by(grp = as.integer(gl(nrow(x), 4, nrow(x)))) %>%
  mutate(b = mean(a)) %>%
  ungroup() %>%
  select(-grp)
akrun

We can use base R ave. We create a group for every n elements and compute, for each group, the ratio of the number of TRUE elements to the total number of elements.

n <- 4
x$b <- ave(x$a, rep(seq(1, nrow(x)), each = n, length.out = nrow(x)),
           FUN = function(v) sum(v) / length(v))
x
#     a         b
#1  FALSE 0.2500000
#2  FALSE 0.2500000
#3  FALSE 0.2500000
#4   TRUE 0.2500000
#5  FALSE 0.5000000
#6   TRUE 0.5000000
#7   TRUE 0.5000000
#8  FALSE 0.5000000
#9   TRUE 0.6666667
#10  TRUE 0.6666667
#11 FALSE 0.6666667
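
If the same calculation is needed for several block sizes, the idea can be wrapped in a small function. The sketch below is not from the answer above: block_prop is a hypothetical helper name, and it builds the group index with integer division instead of rep, which is assumed to be an equivalent way to label consecutive blocks of n rows.

```r
# Hypothetical helper: proportion of TRUE values within consecutive blocks of n
block_prop <- function(v, n) {
  stopifnot(is.logical(v), n >= 1)
  # Positions 1..n map to group 1, n+1..2n to group 2, and so on
  grp <- (seq_along(v) - 1) %/% n + 1
  # ave returns one value per element; the mean of 0/1 values is the proportion
  ave(as.numeric(v), grp, FUN = mean)
}

a <- c(FALSE, FALSE, FALSE, TRUE,
       FALSE, TRUE, TRUE, FALSE,
       TRUE, TRUE, FALSE)
b <- block_prop(a, 4)
```

The last block may be shorter than n (3 elements here), and its proportion is computed over the elements it actually contains.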
Ronak Shah