2

I have a dataset that records the changes in a group from a certain ID, in a given month. In the example, in july, the ID 5 changed from group 2 to group 1, then from group 1 to 2, and so on. I need to get only the first and the last changes made in this ID/month.

ID  groupTO groupFROM   MONTH
5   2   1   6
5   1   2   7
5   2   1   7
5   3   2   7
5   1   3   7
5   2   1   8
5   1   2   8
5   2   1   8
6   1   2   6
6   3   1   6
6   2   1   7
6   3   2   8
6   1   3   8

In this case, i need the results to be:

ID  groupTO groupFROM   MONTH
5   2   1   6
5   1   2   7
5   1   3   7
5   2   1   8
5   2   1   8
6   1   2   6
6   3   1   6
6   2   1   7
6   3   2   8
6   1   3   8

If i remove the duplicates (ID/MONTH), i can get the first occurence, but how do i get the last one?

Ronak Shah
  • 377,200
  • 20
  • 156
  • 213

4 Answers4

1

Here's an easy way you can do with dplyr;

library(dplyr)

# Create data
dt <- 
  data.frame(Id = c(rep(5, 8), rep(6, 5)), 
             groupTO = c(2, 1, 2, 3, 1, 2, 1, 2, 1, 3, 2, 3, 1),
             groupFROM = c(1, 2, 1, 2, 3, 1, 2, 1, 2, 1, 1, 2, 3),
             MONTH = c(6, 7, 7, 7, 7, 8, 8, 8, 6, 6, 7, 8, 8))

dt %>%
  # Group by ID and month
  group_by(Id, MONTH) %>%
  # Get first and last row
  slice(c(1, n())) %>%
  # To remove cases where first is same as last
  distinct()

# # A tibble: 9 x 4
# # Groups:   Id, MONTH [6]
# Id groupTO groupFROM MONTH
# <dbl>   <dbl>     <dbl> <dbl>
# 5       2         1     6
# 5       1         2     7
# 5       1         3     7
# 5       2         1     8
# 6       1         2     6
# 6       3         1     6
# 6       2         1     7
# 6       3         2     8
# 6       1         3     8
Nareman Darwish
  • 1,251
  • 7
  • 14
0

Using data.table

library(data.table)
unique(setDT(df1)[, .SD[c(1, .N)], .(ID, MONTH)])
#    ID MONTH groupTO groupFROM
#1:  5     6       2         1
#2:  5     7       1         2
#3:  5     7       1         3
#4:  5     8       2         1
#5:  6     6       1         2
#6:  6     6       3         1
#7:  6     7       2         1
#8:  6     8       3         2
#9:  6     8       1         3

data

df1 <- structure(list(ID = c(5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 
6L, 6L, 6L), groupTO = c(2L, 1L, 2L, 3L, 1L, 2L, 1L, 2L, 1L, 
3L, 2L, 3L, 1L), groupFROM = c(1L, 2L, 1L, 2L, 3L, 1L, 2L, 1L, 
2L, 1L, 1L, 2L, 3L), MONTH = c(6L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 
6L, 6L, 7L, 8L, 8L)), class = "data.frame", row.names = c(NA, 
-13L))
akrun
  • 874,273
  • 37
  • 540
  • 662
0

Here is a base R solution using split

dfout <- do.call(rbind,c(make.row.names = F,
                lapply(split(df,df[c("Id","MONTH")],lex.order = T), 
                       function(v) if (nrow(v)==1) v[1,] else v[c(1,nrow(v)),])))

such that

> dfout
   Id groupTO groupFROM MONTH
1   5       2         1     6
2   5       1         2     7
3   5       1         3     7
4   5       2         1     8
5   5       2         1     8
6   6       1         2     6
7   6       3         1     6
8   6       2         1     7
9   6       3         2     8
10  6       1         3     8```
ThomasIsCoding
  • 96,636
  • 9
  • 24
  • 81
0

A base R way using ave where we select 1st and last row for each ID and MONTH and select the unique rows in the dataframe.

unique(subset(df, ave(groupTO == 1, ID, MONTH, FUN = function(x) 
              seq_along(x) %in% c(1, length(x)))))

#   ID groupTO groupFROM MONTH
#1   5       2         1     6
#2   5       1         2     7
#5   5       1         3     7
#6   5       2         1     8
#9   6       1         2     6
#10  6       3         1     6
#11  6       2         1     7
#12  6       3         2     8
#13  6       1         3     8
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213