I have a dataframe
like the one below:
x <- data.table(Tickers=c("A","A","A","B","B","B","B","D","D","D","D"),
Type=c("put","call","put","call","call","put","call","put","call","put","call"),
Strike=c(35,37.5,37.5,10,11,11,12,40,40,42,42),
Other=sample(20,11))
Tickers Type Strike Other
1: A put 35.0 6
2: A call 37.5 5
3: A put 37.5 13
4: B call 10.0 15
5: B call 11.0 12
6: B put 11.0 4
7: B call 12.0 20
8: D put 40.0 7
9: D call 40.0 11
10: D put 42.0 10
11: D call 42.0 1
I am trying to analyze a subset of the data. The subset I would like to take is data where the ticker
and strike
are the same. But I also only want to grab this data if both a put
and a call
exists under type
. With the data above for example, I would like to return the following result:
x[c(2,3,5,6,8:11),]
Tickers Type Strike Other
1: A call 37.5 5
2: A put 37.5 13
3: B call 11.0 12
4: B put 11.0 4
5: D put 40.0 7
6: D call 40.0 11
7: D put 42.0 10
8: D call 42.0 1
I'm not sure what the best way to go about doing this. My thought process is that I should create another column vector like
x$id <- paste(x$Tickers,x$Strike,sep="_")
Then use this vector to only pull values where there are multiple ids.
x[x$id %in% x$id[duplicated(x$id)],]
Tickers Type Strike Other id
1: A call 37.5 5 A_37.5
2: A put 37.5 13 A_37.5
3: B call 11.0 12 B_11
4: B put 11.0 4 B_11
5: D put 40.0 7 D_40
6: D call 40.0 11 D_40
7: D put 42.0 10 D_42
8: D call 42.0 1 D_42
I'm not sure how efficient this is, as my actual data consists of a lot more rows.
Also, this solution does not check for the type
condition of there being one put
and one call
.
also the wording of the title could be a lot better, I apologize
EDIT::: having checked out this post Finding ALL duplicate rows, including "elements with smaller subscripts"
I could also use this solution:
x$id <- paste(x$Tickers,x$Strike,sep="_")
x[duplicated(x$id) | duplicated(x$id,fromLast=T),]