1

I have a large table in R and need to count the occurrences of the values in one of the columns, plus put the count in a column of it's own. Without aggregating.

so if I have:

var
23456
34567
45678
23456
etc

my output should be:

var      count
23456     2
34567     1
45678     1
23456     2
etc

I have really read all the threads but can't seem to find a way to do this without aggregating. What I need is the equivalent of excel's count(A1:A4;A1).

lmo
  • 37,904
  • 9
  • 56
  • 69
oceanfront
  • 169
  • 1
  • 4
  • 12

3 Answers3

7

Use ave. Say that your starting df is defined as follows:

df<-structure(list(var = c(23456L, 34567L, 45678L, 23456L)), .Names = "var", class = "data.frame", row.names = c(NA, 
-4L))
df
#    var
#1 23456
#2 34567
#3 45678
#4 23456

Then:

df$count<-ave(df$var,df$var,FUN=length)
#    var count
#1 23456     2
#2 34567     1
#3 45678     1
#4 23456     2
nicola
  • 24,005
  • 3
  • 35
  • 56
4

We can use data.table

library(data.table)
setDT(dat)[, n := .N, var]
dat
#     var n
#1: 23456 2
#2: 34567 1
#3: 45678 1
#4: 23456 2

Or use add_count from dplyr (only available in the dev version for now):

library(dplyr)
dat %>%
     add_count(var)
# A tibble: 4 x 2
#    var     n
#  <int> <int>
#1 23456     2
#2 34567     1
#3 45678     1
#4 23456     2
Axeman
  • 32,068
  • 8
  • 81
  • 94
akrun
  • 874,273
  • 37
  • 540
  • 662
3

The functions group_by and n() from dplyr will do the trick.

library(dplyr)

dat <- data.frame(var = c(23456, 34567, 45678, 23456))

dat %>%
  group_by(var) %>%
  mutate(n())

# Source: local data frame [4 x 2]
# Groups: var [3]
# 
#     var `n()`
#   <dbl> <int>
# 1 23456     2
# 2 34567     1
# 3 45678     1
# 4 23456     2
Peter
  • 7,460
  • 2
  • 47
  • 68