Subset rows that occur two times based on 2 columns

Question

I am trying to subset the rows of my data frame (YearlyDataTMEANPre) whose combination of DATE and UNIQUEID occurs exactly twice. Here is what my data frame looks like:

            ID     DATE TYPE VALUE COL2 NA. NA.1 NA.2 UNIQUEID
6  ASN00015643 20170101 TMAX 81.32             a <NA>      330
7  ASN00015643 20170101 TMIN 71.24             a <NA>      330
9  ASN00085296 20170101 TMAX 71.06             a <NA>      733
10 ASN00085296 20170101 TMIN 54.86             a <NA>      733
13 ASN00085280 20170101 TMIN 60.08             a <NA>      730
15 ASN00040209 20170101 TMAX 84.74             a <NA>      492
16 ASN00040209 20170101 TMIN 77.00             a <NA>      492
40 CA005030984 20170101 TMAX 12.38             C <NA>     1623
41 CA005030984 20170101 TMIN -2.56             C <NA>     1623

Note that there is normally a TMAX and a TMIN value on each DATE for each station (ID or UNIQUEID). I want to make sure I only keep rows where a station has both TMAX and TMIN on a given date. For example, there may be a day where a station recorded only the TMAX and not the TMIN. I created a numerical UNIQUEID to help with this.

The line of code I tried is:

YearlyDataTMEAN <- subset(YearlyDataTMEANPre, UNIQUEID & DATE == 2)

However, this returns zero rows. I must be missing something obvious, but I am new to R. I would like the output to look like this:

            ID     DATE TYPE VALUE COL2 NA. NA.1 NA.2 UNIQUEID
6  ASN00015643 20170101 TMAX 81.32             a <NA>      330
7  ASN00015643 20170101 TMIN 71.24             a <NA>      330
9  ASN00085296 20170101 TMAX 71.06             a <NA>      733
10 ASN00085296 20170101 TMIN 54.86             a <NA>      733
15 ASN00040209 20170101 TMAX 84.74             a <NA>      492
16 ASN00040209 20170101 TMIN 77.00             a <NA>      492
40 CA005030984 20170101 TMAX 12.38             C <NA>     1623
41 CA005030984 20170101 TMIN -2.56             C <NA>     1623

Note that row 13 in the first table (station 730, which has a TMIN but no TMAX on 20170101) is gone in the output.

Thanks!
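For reference, the zero-row result has a simple cause: in R, == binds more tightly than &, so the condition is read as UNIQUEID & (DATE == 2). No DATE equals 2, so the condition is FALSE for every row, and nothing in it counts rows per UNIQUEID/DATE pair. The sketch below rebuilds only the relevant columns of the sample table (the blank and flag columns are omitted, and row names become 1 to 9) to show this:

```r
# Relevant columns of the question's sample data (row names become 1 to 9).
YearlyDataTMEANPre <- read.table(header = TRUE, text = "
ID          DATE     TYPE VALUE UNIQUEID
ASN00015643 20170101 TMAX 81.32 330
ASN00015643 20170101 TMIN 71.24 330
ASN00085296 20170101 TMAX 71.06 733
ASN00085296 20170101 TMIN 54.86 733
ASN00085280 20170101 TMIN 60.08 730
ASN00040209 20170101 TMAX 84.74 492
ASN00040209 20170101 TMIN 77.00 492
CA005030984 20170101 TMAX 12.38 1623
CA005030984 20170101 TMIN -2.56 1623
")

# == is evaluated before &, so this is UNIQUEID & (DATE == 2).
# A nonzero UNIQUEID counts as TRUE, but DATE == 2 is FALSE for every row.
cond <- with(YearlyDataTMEANPre, UNIQUEID & DATE == 2)
print(cond)

# Hence subset() keeps no rows at all.
res <- subset(YearlyDataTMEANPre, UNIQUEID & DATE == 2)
print(nrow(res))
```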

score 1 · Answer 1 · answered Jun 26 '17 at 20:44

You can use the duplicated function to get what you want:

YearlyDataTMEANPre[duplicated(YearlyDataTMEANPre[,c('UNIQUEID', 'DATE')]),]

M--

Hello. I may have been unclear in my original post, and I have since amended it with an ideal output table. The duplicated function gives me only one value per day when I would like two values per day (TMAX and TMIN). – Zacharyj7 Jun 27 '17 at 12:30
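As the comment points out, duplicated() on its own flags only the second and later occurrences of each UNIQUEID/DATE pair, so the first row of each pair (the TMAX rows in this sample) is lost. A common fix, sketched below on the question's sample data, is to OR it with duplicated(..., fromLast = TRUE), which flags the earlier occurrences. Note that this keeps every pair that occurs two or more times, not exactly two.

```r
YearlyDataTMEANPre <- read.table(header = TRUE, text = "
ID          DATE     TYPE VALUE UNIQUEID
ASN00015643 20170101 TMAX 81.32 330
ASN00015643 20170101 TMIN 71.24 330
ASN00085296 20170101 TMAX 71.06 733
ASN00085296 20170101 TMIN 54.86 733
ASN00085280 20170101 TMIN 60.08 730
ASN00040209 20170101 TMAX 84.74 492
ASN00040209 20170101 TMIN 77.00 492
CA005030984 20170101 TMAX 12.38 1623
CA005030984 20170101 TMIN -2.56 1623
")

key <- YearlyDataTMEANPre[, c("UNIQUEID", "DATE")]

# duplicated() marks the 2nd, 3rd, ... occurrence of each key;
# fromLast = TRUE scans from the bottom and marks the earlier ones.
# OR-ing the two marks every row whose key appears more than once.
keep <- duplicated(key) | duplicated(key, fromLast = TRUE)

YearlyDataTMEAN <- YearlyDataTMEANPre[keep, ]
print(YearlyDataTMEAN)
```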

score 0 · Answer 2 · answered Jun 27 '17 at 12:52

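I actually figured it out using subset and table. Miracles do happen. The table has one cell per UNIQUEID/DATE combination rather than one value per row, so each row's count has to be looked up in it before comparing with 2:

counts <- table(YearlyDataTMEANPre$UNIQUEID, YearlyDataTMEANPre$DATE)
subset(YearlyDataTMEANPre, counts[cbind(as.character(UNIQUEID), as.character(DATE))] == 2)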
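Another base R way to get one count per row, sketched below without building a table, is ave(): it applies a function within each UNIQUEID/DATE group and returns a vector aligned with the original rows. The sample data is rebuilt from the question's table; the name n_per_key is only illustrative.

```r
YearlyDataTMEANPre <- read.table(header = TRUE, text = "
ID          DATE     TYPE VALUE UNIQUEID
ASN00015643 20170101 TMAX 81.32 330
ASN00015643 20170101 TMIN 71.24 330
ASN00085296 20170101 TMAX 71.06 733
ASN00085296 20170101 TMIN 54.86 733
ASN00085280 20170101 TMIN 60.08 730
ASN00040209 20170101 TMAX 84.74 492
ASN00040209 20170101 TMIN 77.00 492
CA005030984 20170101 TMAX 12.38 1623
CA005030984 20170101 TMIN -2.56 1623
")

# ave() splits the first argument by the UNIQUEID/DATE groups, applies FUN
# within each group and puts the results back in the original row order,
# so every row receives the size of its own group.
n_per_key <- ave(seq_len(nrow(YearlyDataTMEANPre)),
                 YearlyDataTMEANPre$UNIQUEID, YearlyDataTMEANPre$DATE,
                 FUN = length)

# Keep rows whose station/date pair occurs exactly twice.
YearlyDataTMEAN <- subset(YearlyDataTMEANPre, n_per_key == 2)
print(YearlyDataTMEAN)
```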

Zacharyj7

https://stackoverflow.com/questions/16905425/find-duplicate-values-in-r – M-- Jun 27 '17 at 14:00
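Counting rows treats two TMAX readings on the same day the same as one TMAX plus one TMIN. If that can happen in the data, a stricter sketch is to check the TYPE column directly and keep a station/date only when both values are present. This again uses ave() on the rebuilt sample data; has_both is a hypothetical name.

```r
YearlyDataTMEANPre <- read.table(header = TRUE, text = "
ID          DATE     TYPE VALUE UNIQUEID
ASN00015643 20170101 TMAX 81.32 330
ASN00015643 20170101 TMIN 71.24 330
ASN00085296 20170101 TMAX 71.06 733
ASN00085296 20170101 TMIN 54.86 733
ASN00085280 20170101 TMIN 60.08 730
ASN00040209 20170101 TMAX 84.74 492
ASN00040209 20170101 TMIN 77.00 492
CA005030984 20170101 TMAX 12.38 1623
CA005030984 20170101 TMIN -2.56 1623
")

# For each UNIQUEID/DATE group, ave(..., FUN = any) spreads "does the group
# contain a TMAX row?" (and likewise TMIN) back to every row of the group.
has_both <- with(YearlyDataTMEANPre,
                 ave(TYPE == "TMAX", UNIQUEID, DATE, FUN = any) &
                 ave(TYPE == "TMIN", UNIQUEID, DATE, FUN = any))

YearlyDataTMEAN <- YearlyDataTMEANPre[has_both, ]
print(YearlyDataTMEAN)
```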
