I have a data frame with an id column, and I would like to count how often each id appears (the column is sorted). I found a way using two for loops, but that is surely inefficient. Can someone suggest a better solution?

id <- c(15580, 16144, 16144, 16144, 16144, 16144, 17985, 17985, 17985, 17985)
df <- data.frame(id)
df <- cbind(df, tmp=1)

for(i in 2:nrow(df)) {
   if (df[i,1] == df[i-1,1]) {
      df[i,2] <- df[i-1,2] + 1
   }
}

df$cnt <- df$tmp

for(i in seq(nrow(df)-1,1,-1)){
   if (df[i,1] == df[i+1,1]) {
      df[i,3] <- df[i+1,3]
   }
}

This is the output of my code. The cnt column contains the count I want.

      id tmp cnt
1  15580   1   1
2  16144   1   5
3  16144   2   5
4  16144   3   5
5  16144   4   5
6  16144   5   5
7  17985   1   4
8  17985   2   4
9  17985   3   4
10 17985   4   4

In a second step it would be great to just get this output (unique ids only and the count):

    id cnt
 15580   1
 16144   5
 17985   4
Gecko

1 Answer

We can use count to go directly to the second step:

library(dplyr)
count(df, id)
# A tibble: 3 x 2
#     id     n
#  <dbl> <int>
#1 15580     1
#2 16144     5
#3 17985     4

Or with table from base R, which returns a table object rather than a data frame:

table(df$id)
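
If a data frame shaped like the desired output is needed, the table can be converted. This is a minimal sketch, assuming the df from the question; the names tbl and res are only illustrative. The ids are taken from the table's names and converted back to numeric.

```r
# Rebuild the example data from the question
id <- c(15580, 16144, 16144, 16144, 16144, 16144, 17985, 17985, 17985, 17985)
df <- data.frame(id)

# Count occurrences of each id
tbl <- table(df$id)

# The names of the table hold the ids (as character), the values hold the counts
res <- data.frame(id = as.numeric(names(tbl)), cnt = as.vector(tbl))
res
```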

If step 1 is needed first, use transmute after grouping by 'id':

stp1 <- df %>% 
         group_by(id) %>%
         transmute(cnt = n())

Then apply distinct to 'stp1' to keep one row per id:

distinct(stp1)
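
For comparison, the same two steps can probably be done in base R without dplyr. This is only a sketch, assuming the df from the question; the names step1 and step2 are illustrative. ave applies length within each id group and returns a vector aligned to the original rows, and unique then drops the repeated rows.

```r
# Rebuild the example data from the question
id <- c(15580, 16144, 16144, 16144, 16144, 16144, 17985, 17985, 17985, 17985)
df <- data.frame(id)

# Step 1: attach the group size to every row
step1 <- df
step1$cnt <- ave(step1$id, step1$id, FUN = length)

# Step 2: keep one row per id
step2 <- unique(step1)
step2
```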
akrun