How to delete rows with a unique ID from a panel data frame in R?

Question

I have a data table containing thousands of firms that can be identified by a unique ID. It is long format data and each firm is supposed to appear twice in different years (cross-sectional time-series over two years).

However, not all firms appear in both years and I am trying to create a balanced long format panel in which only firms remain that appear in both years. How do I accomplish this?

This is an example data table to illustrate the issue:

example <- matrix(c(1,1,2,3,3,2013,2016,2013,2013,2016), ncol=2)
colnames(example) <- c('id', 'year')
example.table <- data.table(example)
example.table

   id year
1:  1 2013
2:  1 2016
3:  2 2013
4:  3 2013
5:  3 2016

In the example, I need a code/function that lets me exclude the row of the firm with the id "2", because it has no match in 2016. In other words: I need a code/function that compares each row with the previous & subsequent row and excludes it, if there is no match in the id column.

I have invested many hours, but appear to have reached the limits of my R knowledge and would appreciate any support. Thanks!

score 3 · Accepted Answer · answered Apr 11 '19 at 17:02

3

Using dplyr as below:

library(dplyr)
example.table %>%
  group_by(id) %>%
  filter(n() > 1)
# A tibble: 4 x 2
# Groups:   id [2]
     id  year
  <dbl> <dbl>
1     1  2013
2     1  2016
3     3  2013
4     3  2016

answered Apr 11 '19 at 17:02

Sonny

3,083
1
11
19

1

That's awesome, I didn't realize you could use `group_by` for `filter` in addition to `mutate` and `summarise` – Ben G Apr 11 '19 at 17:13

akrun · Answer 2 · 2019-04-11T17:07:20.750

We create a vector of unique 'year' from the whole dataset, then check if all the values in 'nm1' are %in% the 'year' grouped by 'id' and subset the data.table

un1 <- unique(example.table$year)
example.table[, .SD[all(un1 %in% year)], id]
#   id year
#1:  1 2013
#2:  1 2016
#3:  3 2013
#4:  3 2016

NOTE: The OP's dataset is data.table and the method used is data.table here. Initially, thought about using .SD[uniqueN(year) > 1], but that is wrong and may not work for all cases

score 0 · Answer 3 · answered Apr 11 '19 at 17:24

0

data.table equivalent solution to @Sonny's dplyr solution

example.table[, if(.N > 1) .SD, id]

   id year
1:  1 2013
2:  1 2016
3:  3 2013
4:  3 2016

answered Apr 11 '19 at 17:24

cropgen

1,920
15
24

How to delete rows with a unique ID from a panel data frame in R?

3 Answers3