0

Let's say my data looks like this:

df
ID  Location  
 1   54
 2   35 
 3   54
 4   35
 5   71

I'm interested in finding the frequency of visits to a given location, and then assigning that frequency (i.e. sum) to a new column based on the value in the Location column.

To begin, I've tried using the table function:

count<-as.data.frame(table(df))
count
var1  freq
54    2
35    2
71    1

From here, I'd like to create a new column in df, called count, which assigns freq=2 for each ID which corresponds to Location=54, for example. I.e., df would now look something like this:

df
ID  Location count 
 1   54      2
 2   35      2
 3   54      2
 4   35      2
 5   71      1

My real data contains too many Location values for me to feasibly write an ifelse statement to conditionally assign these count values. I'm not sure how to accomplish in an efficient manner (I could also create a null column and use the replace function in dplyr, but that would be similarly laborious. Any tips?

Thanks!

lecreprays
  • 77
  • 1
  • 13

4 Answers4

3

We can use add_count from dplyr (in the devel version - soon to be released 0.6.0)

library(dplyr)
df %>% 
   add_count(Location)
# A tibble: 5 × 3
#     ID Location     n
#   <int>    <int> <int>
#1     1       54     2
#2     2       35     2
#3     3       54     2
#4     4       35     2
#5     5       71     1

But if we want to do this from the table output, we can use merge

merge(df, as.data.frame(table(df$Location)), by.x= "Location", by.y = "Var1")
akrun
  • 874,273
  • 37
  • 540
  • 662
3
library(dplyr)
df %>% 
  group_by(Location) %>%
  mutate(n = n())

#      ID Location     n
#   <int>    <int> <int>
# 1     1       54     2
# 2     2       35     2
# 3     3       54     2
# 4     4       35     2
# 5     5       71     1
Andrew Lavers
  • 4,328
  • 1
  • 12
  • 19
2

You could use ave to count the length of the data corresponding to each Location

ave(1:NROW(df), df$Location, FUN = length)
#[1] 2 2 2 2 1
d.b
  • 32,245
  • 6
  • 36
  • 77
2

It's also possible to do this in data.table as well:

library(data.table)
dt[,count := .N, by = Location]

dt
#   ID Location count
#1:  1       54     2
#2:  2       35     2
#3:  3       54     2
#4:  4       35     2
#5:  5       71     1

Data:

dt <- fread("ID  Location  
              1   54
              2   35 
              3   54
              4   35
              5   71")
Mike H.
  • 13,960
  • 2
  • 29
  • 39