
I have a data frame similar to the one below, and I need to count how many times each combination of start_id, end_id, and type repeats in it.

start_id | end_id | type | id
1        | 2      | a    | 1
2        | 5      | a    | 2
1        | 3      | b    | 3
2        | 5      | a    | 4
1        | 3      | b    | 5

The result I want is this:

start_id | end_id | type | n
1        | 2      | a    | 1
2        | 5      | a    | 2
1        | 3      | b    | 2

I tried the following code, but it does not merge the duplicate records. It returns the same rows unchanged and only adds a new column with the counter, which does not work for my analysis:

Sumary <- clear_filt_trip  %>%
    group_by(start_id, end_id, type) %>% 
    add_count(across(everything()))

I tried using summarize but it's just repeating the columns.

What can I do about it?

    Look at the dupe-link, and in your head replace all mentions of "mean by group" with something like "length" or "number of rows" or similar, leading to the same set of possible solutions (within `dplyr` or not). – r2evans Dec 30 '21 at 17:22

2 Answers


dplyr

library(dplyr)
dat %>%
  group_by(start_id, end_id, type) %>%
  tally() %>%
  ungroup()
# # A tibble: 3 x 4
#   start_id end_id type      n
#      <dbl>  <dbl> <chr> <int>
# 1        1      2 a         1
# 2        1      3 b         2
# 3        2      5 a         2
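
As a possible shorthand, dplyr also provides `count()`, which groups by the listed columns, collapses each group to one row, and ungroups in a single call. The sketch below also runs `add_count()` on the same data to show why the attempt in the question kept every row. The object names `collapsed` and `kept` are only illustrative.

```r
library(dplyr)

# Same sample data as in the question
dat <- data.frame(
  start_id = c(1, 2, 1, 2, 1),
  end_id   = c(2, 5, 3, 5, 3),
  type     = c("a", "a", "b", "a", "b"),
  id       = 1:5
)

# count() returns one row per distinct combination plus a count column n
collapsed <- count(dat, start_id, end_id, type)

# add_count() keeps all original rows and only appends the count column n
kept <- add_count(dat, start_id, end_id, type)

print(collapsed)
print(nrow(kept))
```

Here `collapsed` has three rows and `kept` still has five, which matches the behaviour described in the question.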

base R

aggregate(. ~ start_id + end_id + type, data = dat, FUN = length)
#   start_id end_id type id
# 1        1      2    a  1
# 2        2      5    a  2
# 3        1      3    b  2
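
Note that in the `aggregate()` output the counts end up in the column named `id`, because `length` is applied to the remaining column. A minimal sketch of renaming it to `n`, assuming that is the name you want (the object name `counts` is only illustrative):

```r
# Same sample data as in the question
dat <- data.frame(
  start_id = c(1, 2, 1, 2, 1),
  end_id   = c(2, 5, 3, 5, 3),
  type     = c("a", "a", "b", "a", "b"),
  id       = 1:5
)

# length() of id within each group gives the number of rows in that group
counts <- aggregate(id ~ start_id + end_id + type, data = dat, FUN = length)

# Rename the aggregated column so it is not mistaken for an identifier
names(counts)[names(counts) == "id"] <- "n"

print(counts)
```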

Data

dat <- structure(list(start_id = c(1, 2, 1, 2, 1), end_id = c(2, 5, 3, 5, 3), type = c("a", "a", "b", "a", "b"), id = 1:5), row.names = c(NA, -5L), class = "data.frame")
r2evans

In addition to r2evans's answer:

data.table

library(data.table)
   
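# Drop the id column by reference so that only the grouping columns remain
df[, id := NULL]

# Count the rows (.N) for each distinct combination of the remaining columns
df[, .N, by = names(df)]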

   start_id end_id type N
1:        1      2    a 1
2:        2      5    a 2
3:        1      3    b 2
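
If you would rather keep `id` in the original table, one option is to name the grouping columns explicitly instead of deleting `id` first. This is a sketch under that assumption. The names `dt` and `res` are only illustrative, and the table is built with `data.table()` rather than `structure()`.

```r
library(data.table)

# Same sample data as in the question
dt <- data.table(
  start_id = c(1L, 2L, 1L, 2L, 1L),
  end_id   = c(2L, 5L, 3L, 5L, 3L),
  type     = c("a", "a", "b", "a", "b"),
  id       = 1:5
)

# .N is the number of rows in each group; dt itself is left unchanged
res <- dt[, .N, by = .(start_id, end_id, type)]

print(res)
```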

data:

df = structure(list(start_id = c(1L, 2L, 1L, 2L, 1L), end_id = c(2L, 
5L, 3L, 5L, 3L), type = c("a", "a", "b", "a", "b"), id = 1:5), row.names = c(NA, 
-5L), class = c("data.table", "data.frame"))
Marco_CH