1

Suppose this dataset:

    date      exdate         flag       V1            V2
 1996-01-04  1996-01-20       P       400000      -0.001181
 1996-01-04  1996-01-20       C       400000      -0.004897
 .............
 1996-01-04  1996-01-20       P       530000      -0.005147
 .............
 1996-01-04  1996-01-20       P       535000      -0.005423
 .............
 1996-01-04  1996-01-20       C       545000      -0.007922
 1996-01-04  1996-01-20       P       545000      -0.008389

I have ordered the data in the desired format, and I want to only extract the rows that are in the same group of date, exdate, and V1, but remove the groups with a single observation. Note that there are groups that contain only one observation, and not both C and P (in the flag variable) as in the first and last group.

It is possible through a package in R like dplyr::filter?

I have tried this:

data %>% group_by(date,exdate,V1) %>% filter(V1[flag=="P"]==V1[flag=="C"])

with a returning error:

Error in filter_impl(.data, dots) : incorrect length (0), expecting: 1

The goal is to get a dataset that is like this:

    date      exdate         flag       V1            V2
 1996-01-04  1996-01-20       P       400000      -0.001181
 1996-01-04  1996-01-20       C       400000      -0.004897
 .............
 1996-01-04  1996-01-20       C       545000      -0.007922
 1996-01-04  1996-01-20       P       545000      -0.008389
Hercules Apergis
  • 423
  • 6
  • 20
  • Do you have same number of elements that are 'P' and 'C' for each group – akrun May 15 '17 at 17:16
  • 1
    V1[FALSE] will have a length of zero, explaining that error... – Frank May 15 '17 at 17:17
  • Did you mean to write: `filter( length(V1[flag=="P"]) == length(V1[flag=="C"]) )` ? – IRTFM May 15 '17 at 17:19
  • Possibly relevant: [Count number of observations/rows per group and add result to data frame](http://stackoverflow.com/questions/7450600/count-number-of-observations-rows-per-group-and-add-result-to-data-frame) and [Subset data frame based on number of rows per group](http://stackoverflow.com/questions/20204257/subset-data-frame-based-on-number-of-rows-per-group) – Henrik May 15 '17 at 17:24
  • @akrun The p and c elements are not of equal number. @42 I hadn't thought of using the `length` in the `filter`. Will give it a shot! – Hercules Apergis May 15 '17 at 18:10
  • @Frank It could explain it...yeah, so basically whenever that happens, that group will have either C or P, but not both! Thus that observation must be removed. – Hercules Apergis May 15 '17 at 18:18

1 Answers1

1
data <- read.table(header = T, text = '
date      exdate         flag       V1            V2
1996-01-04  1996-01-20       P       400000      -0.001181
1996-01-04  1996-01-20       C       400000      -0.004897
1996-01-04  1996-01-20       P       530000      -0.005147
1996-01-04  1996-01-20       P       535000      -0.005423
1996-01-04  1996-01-20       C       545000      -0.007922
1996-01-04  1996-01-20       P       545000      -0.008389
')

library(dplyr)
data %>% 
  group_by(date, exdate, V1) %>% 
  filter(n() == 2)
CJ Yetman
  • 8,373
  • 2
  • 24
  • 56