Quick way to conditionally assign values based on a column with a lot of values?

Question

Let's say my data looks like this:

df
ID  Location  
 1   54
 2   35 
 3   54
 4   35
 5   71

I'm interested in finding the frequency of visits to a given location, and then assigning that frequency (i.e. sum) to a new column based on the value in the Location column.

To begin, I've tried using the table function:

count <- as.data.frame(table(df$Location))
count
Var1  Freq
54    2
35    2
71    1

From here, I'd like to create a new column in df, called count, which assigns freq=2 for each ID which corresponds to Location=54, for example. I.e., df would now look something like this:

df
ID  Location count 
 1   54      2
 2   35      2
 3   54      2
 4   35      2
 5   71      1

My real data contains too many Location values for me to feasibly write an ifelse statement to conditionally assign these count values, and I'm not sure how to accomplish this efficiently. (I could also create a null column and use the replace function in dplyr, but that would be similarly laborious.) Any tips?

Thanks!

akrun · Answer 1 · 2017-04-24T21:02:28.140

3

We can use add_count from dplyr (currently only in the devel version, soon to be released as 0.6.0):

library(dplyr)
df %>% 
   add_count(Location)
# A tibble: 5 × 3
#     ID Location     n
#   <int>    <int> <int>
#1     1       54     2
#2     2       35     2
#3     3       54     2
#4     4       35     2
#5     5       71     1

But if we want to do this from the table output, we can use merge. Note that merge sorts the result by Location, and the count column is named Freq:

merge(df, as.data.frame(table(df$Location)), by.x= "Location", by.y = "Var1")
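
As a rough sketch of the difference (assuming the question's data are rebuilt as below; `merged` and `counts` are made-up names for illustration), the table can also be used as a lookup indexed by Location. Unlike merge, this keeps the original row order:

```r
# Rebuild the example data from the question (assumed structure)
df <- data.frame(ID = 1:5, Location = c(54, 35, 54, 35, 71))

# merge approach: rows come back sorted by Location, count column is named Freq
merged <- merge(df, as.data.frame(table(df$Location)),
                by.x = "Location", by.y = "Var1")

# Lookup approach: index the table by Location (as character) to keep row order
counts <- table(df$Location)
df$count <- as.vector(counts[as.character(df$Location)])
df
```

The lookup works because table() names its entries by the Location values, so indexing with as.character(df$Location) returns one count per row of df.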

edited Apr 24 '17 at 21:02

answered Apr 24 '17 at 20:47

akrun


oddly I get this error message: Error in function_list[[k]](value) : could not find function "add_count" – lecreprays Apr 24 '17 at 20:53
I believe that add_count is newly added to dplyr 0.6 and not in the main release yet. See https://github.com/tidyverse/dplyr/releases – Andrew Lavers Apr 24 '17 at 20:58
@epi99 Yes, you are right, updated the post – akrun Apr 24 '17 at 21:02

Andrew Lavers · Answer 2 · 2017-04-24T21:06:13.003

3

library(dplyr)
df %>% 
  group_by(Location) %>%
  mutate(n = n())

#      ID Location     n
#   <int>    <int> <int>
# 1     1       54     2
# 2     2       35     2
# 3     3       54     2
# 4     4       35     2
# 5     5       71     1
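
One thing to watch for: the result of group_by() followed by mutate() stays grouped by Location. A minimal sketch, assuming the example data from the question and that later steps should not operate per group (`res` is a made-up name), adds ungroup() at the end:

```r
library(dplyr)

# Example data from the question (assumed structure)
df <- data.frame(ID = 1:5, Location = c(54L, 35L, 54L, 35L, 71L))

res <- df %>%
  group_by(Location) %>%   # one group per distinct Location
  mutate(n = n()) %>%      # n() is the size of the current group
  ungroup()                # drop the grouping so later verbs act on all rows
```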

edited Apr 24 '17 at 21:06

answered Apr 24 '17 at 20:57

Andrew Lavers


score 2 · Answer 3 · answered Apr 24 '17 at 20:50

2

You could use ave to count the number of rows corresponding to each Location:

ave(1:NROW(df), df$Location, FUN = length)
#[1] 2 2 2 2 1
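
To store the result as the new column the question asks for, the same call can be assigned back to df. This is only a sketch, assuming the example data from the question; seq_len(nrow(df)) is swapped in for 1:NROW(df) because it also behaves sensibly for a data frame with zero rows:

```r
# Example data from the question (assumed structure)
df <- data.frame(ID = 1:5, Location = c(54, 35, 54, 35, 71))

# ave applies FUN within each Location group and returns one value per row
df$count <- ave(seq_len(nrow(df)), df$Location, FUN = length)
df
```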

d.b


score 2 · Answer 4 · answered Apr 24 '17 at 20:52

It's also possible to do this with data.table (dt is created in the Data section below):

library(data.table)
dt[, count := .N, by = Location]

dt
#   ID Location count
#1:  1       54     2
#2:  2       35     2
#3:  3       54     2
#4:  4       35     2
#5:  5       71     1

Data:

dt <- fread("ID  Location  
              1   54
              2   35 
              3   54
              4   35
              5   71")
