I have, for example, a vector with 1000 observations and 3 levels (A, B, C). I want to count how many times level A occurs in every block of 5 rows and produce another vector of those counts, i.e. with 200 observations. Is anyone able to help? I've found how to count based on another variable, but not based on the number of rows. Thank you!

df <- data.frame(test=factor(sample(c("A","B", "C" ),1000,replace=TRUE)))
head(df, 10)
   test
1     A
2     A
3     B
4     C
5     B
6     A
7     C
8     B
9     C
10    C
Noosentin
  • Perhaps `lapply(split(df$test, rep(1:200, each = 5)), table)`? – talat Apr 27 '16 at 12:44
  • Possible duplicate of [R - how to count how many values per level in a given factor?](http://stackoverflow.com/questions/26114525/r-how-to-count-how-many-values-per-level-in-a-given-factor) –  Apr 27 '16 at 13:49

4 Answers

Here are a couple of options you might find useful:

a) count all entries per 5 rows and return a list:

head(lapply(split(df$test, rep(1:200, each = 5)), table), 2)
# $`1`      # <- result for rows 1:5
# 
# A B C 
# 1 0 4 
# 
# $`2`      # <- result for rows 6:10
# 
# A B C 
# 3 0 2 

b) count all entries per 5 rows and return a matrix:

head(t(sapply(split(df$test, rep(1:200, each = 5)), table)), 2)
#   A B C
# 1 1 0 4
# 2 3 0 2

c) count number of As per 5 rows and return a list:

head(lapply(split(df$test == "A", rep(1:200, each = 5)), sum), 2)
# $`1`
# [1] 1
# 
# $`2`
# [1] 3

d) count number of As per 5 rows and return a vector:

head(sapply(split(df$test == "A", rep(1:200, each = 5)), sum), 2)
# 1 2
# 1 3

Each of the results will be 200 entries long / have 200 rows.
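
When the number of rows is an exact multiple of the block size, option d) can also be sketched without `split`, by reshaping the logical vector into a matrix and summing its columns. This is an alternative illustration rather than part of the options above; the seed and the names `block`, `is_a`, `counts_a` and `by_split` are assumptions made for the example.

```r
set.seed(42)  # assumption: seed chosen only so the sketch is reproducible
df <- data.frame(test = factor(sample(c("A", "B", "C"), 1000, replace = TRUE)))

block <- 5

# matrix() fills column by column, so column j holds rows (5j - 4):(5j)
is_a <- matrix(df$test == "A", nrow = block)

# Summing a logical column counts its TRUE values, i.e. the As in that block
counts_a <- colSums(is_a)

# Cross-check against option d) from the list above
by_split <- sapply(split(df$test == "A", rep(1:200, each = block)), sum)
all(counts_a == by_split)
```

Unlike the `split` version, the result carries no names, and `matrix()` will warn if the vector length is not a multiple of `block`.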

talat
  • Instead of `rep(1:200, each = 5)` you could also use something like `((seq_len(nrow(df)) -1) %/% 5) +1` – talat Apr 27 '16 at 13:15
  • An alternative to `split`ting could be `table(rep(seq_len(nrow(df) / 5), each = 5), df$test)` – alexis_laz Apr 27 '16 at 13:44
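
The index formula from the first comment has one practical advantage: it does not need the row count to be a multiple of 5. A minimal sketch, assuming a hypothetical length of 1003 rows (the names `n_rows`, `block`, `grp` and `grp_1000` are made up for the example):

```r
n_rows <- 1003  # hypothetical length that is not a multiple of 5
block <- 5

# Rows 1-5 map to group 1, rows 6-10 to group 2, and so on
grp <- (seq_len(n_rows) - 1) %/% block + 1

# The final group simply ends up shorter than the others
tail(table(grp), 2)

# For exactly 1000 rows the formula reproduces rep(1:200, each = 5)
grp_1000 <- (seq_len(1000) - 1) %/% block + 1
all(grp_1000 == rep(1:200, each = 5))
```

With `rep(1:200, each = 5)` the group count has to be hard-coded, whereas this version adapts to whatever `nrow(df)` happens to be.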

Here is a solution with dplyr and tidyr:

library(dplyr)
library(tidyr)
df %>%
  mutate(Set = (seq_along(test) - 1) %/% 5) %>%
  group_by(Set, test) %>%
  summarise(N = n()) %>%
  spread(key = test, value = N, fill = 0)
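
If only the count of level A is needed, as in the question, the same grouping idea can be reduced to a single `summarise()` call. This is a sketch under assumptions, not the answer's own code: the seed and the names `n_A` and `counts_a` are chosen for illustration.

```r
library(dplyr)

set.seed(1)  # assumption: seed chosen only so the sketch is reproducible
df <- data.frame(test = factor(sample(c("A", "B", "C"), 1000, replace = TRUE)))

counts_a <- df %>%
  # Set is 0 for rows 1-5, 1 for rows 6-10, and so on
  mutate(Set = (seq_along(test) - 1) %/% 5) %>%
  group_by(Set) %>%
  # a sum over a logical vector counts the TRUE values
  summarise(n_A = sum(test == "A")) %>%
  pull(n_A)

length(counts_a)
```

Because every `Set` keeps a row even when it contains no A, zero counts are preserved and the vector has one entry per block.
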
Thierry

We can use data.table:

library(data.table)
setDT(df)[, .N, .(grp = gl(nrow(df), 5, nrow(df)), test)]
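
The call above returns one row per group and level that actually occurs. To get one count of A per block instead, a variant along the following lines might work; the seed, the copy `dt` and the names `grp`, `n_A` and `counts` are assumptions for the example.

```r
library(data.table)

set.seed(1)  # assumption: seed chosen only so the sketch is reproducible
df <- data.frame(test = factor(sample(c("A", "B", "C"), 1000, replace = TRUE)))

dt <- as.data.table(df)  # work on a copy, leaving df untouched

# grp is 1 for rows 1-5, 2 for rows 6-10, and so on
dt[, grp := (seq_len(.N) - 1L) %/% 5L + 1L]

# One row per block; blocks without any A get n_A = 0
counts <- dt[, .(n_A = sum(test == "A")), by = grp]
head(counts, 3)
```

`counts$n_A` is then the vector of 200 counts asked for in the question.
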
akrun

If you prefer dplyr, you could use

library(dplyr)
c1 <- df %>%
  # label each block of 5 consecutive rows G1, G2, ..., G200
  mutate(group = rep(paste0("G", seq(1, 200)), each = 5)) %>%
  # count each level within each group
  count(group, test)

Note that this method omits any level that does not occur in a given group, so the result contains no zero counts.
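
If the zero counts matter, one base R way to keep them is to cross-tabulate the block index against the factor, along the lines of the `table()` comment further up. A minimal sketch, with the seed and the names `group`, `tab` and `counts_a` assumed for the example:

```r
set.seed(1)  # assumption: seed chosen only so the sketch is reproducible
df <- data.frame(test = factor(sample(c("A", "B", "C"), 1000, replace = TRUE)))

# Block index: 1 for rows 1-5, 2 for rows 6-10, and so on
group <- rep(seq_len(nrow(df) / 5), each = 5)

# table() reports every group/level combination, including empty ones
tab <- table(group, df$test)
dim(tab)

# The "A" column is the per-block count of A, zeros included
counts_a <- as.vector(tab[, "A"])
```

Every row of `tab` sums to 5, so a level that is absent from a block shows up as an explicit 0 rather than a missing row.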

JohnSG