Keeping rows for identical columns in a data.table in R

Question

I have a table that looks like this:

library(data.table)
DT <- data.table(ID=c(1:4), AREA=c("a","b","c","d"), PARTNER=c("f","b","g","d"), OBS_VALUE=c(10,5,13,0))

As a result, I would like to keep the rows where AREA and PARTNER are equal and OBS_VALUE is neither 0 nor NA.

The identical() function only tests whether the two columns are identical as a whole, not row by row.

setkeyv(DT,c("AREA","PARTNER"))
identical(DT['AREA'],DT['PARTNER'])

The result is obviously FALSE.

I do not know how to get to the desired result. Thanks for your help.

The suggested answer, applied to my real column names,

DT[REF_AREA==COUNTERPART_AREA & !is.na(OBS_VALUE) & OBS_VALUE!=0]

gives an error message:

Error in Ops.factor(REF_AREA, COUNTERPART_AREA) : 
  level sets of factors are different

Indeed, my actual data.table is more complex:

dput(head(diagonal))
structure(list(TIME_PERIOD = c(2010L, 2010L, 2010L, 2010L, 2010L, 
2010L), REF_AREA = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("AT", 
"BE", "BG", "CY", "CZ", "DE", "DK", "EE", "ES", "FI", "FR", "GB", 
"GR", "HR", "HU", "IE", "IT", "LT", "LU", "LV", "MT", "NL", "PL", 
"PT", "RO", "SE", "SI", "SK"), class = "factor"), COUNTERPART_AREA =     structure(c(20L, 20L, 20L, 20L, 20L, 20L), .Label = c("4A", "4F", "4S", "9A", 
"A1", "A2", "A5", ... "AT", ..., "W1"), class = "factor"), 
, .Names = c("TIME_PERIOD", "REF_AREA", "COUNTERPART_AREA", "UNIT_MEASURE", "INT_ACC_ITEM", "ACCOUNTING_ENTRY", "OBS_VALUE", "OBS_COMMENT", "DECIMALS", "UNIT_MULT", "i.CONF_STATUS", "i.OBS_STATUS"), sorted = c("REF_AREA", "COUNTERPART_AREA"), class = c("data.table", "data.frame"), row.names = c(NA, -6L), .internal.selfref = <pointer: 0x07d424a0>)

Any idea?

Please read [How to make a great reproducible example in R?](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). They're all unique. — M--, Jun 21 '17 at 13:31

score 0 · Answer 1 · answered Jun 21 '17 at 13:33

DT[AREA==PARTNER & !is.na(OBS_VALUE) & OBS_VALUE!=0]
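
A minimal sketch of how this filter behaves on the sample table from the question. The fifth row (ID 5, with a missing OBS_VALUE) is not in the original data; it is a hypothetical addition to show the NA condition at work.

```r
library(data.table)

# Sample table from the question, plus one hypothetical row (ID 5) with a missing value
DT <- data.table(
  ID        = 1:5,
  AREA      = c("a", "b", "c", "d", "e"),
  PARTNER   = c("f", "b", "g", "d", "e"),
  OBS_VALUE = c(10, 5, 13, 0, NA)
)

# == compares the two columns row by row (unlike identical(), which compares them as a whole);
# the remaining conditions drop rows whose OBS_VALUE is NA or 0
res <- DT[AREA == PARTNER & !is.na(OBS_VALUE) & OBS_VALUE != 0]
print(res)
```

Only the row with ID 2 passes all three conditions: rows 4 and 5 have matching AREA and PARTNER but are dropped for having an OBS_VALUE of 0 and NA respectively.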

simone

Thanks. I updated the question. Your line gives an error because of a factor issue. – IRT Jun 21 '17 at 13:50
So the data.table is not the one you posted? Can you provide a data.table that reproduces the error you are getting? – simone Jun 21 '17 at 14:01
Sorry, I am not skilled enough yet. I solved it by converting the factors to characters. Then it works. – IRT Jun 21 '17 at 15:08
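
For reference, a minimal sketch of the fix IRT describes. Comparing two factor columns with `==` fails with "level sets of factors are different" when their levels differ, so the comparison is done on the character labels instead. The three-row table below is a hypothetical miniature of the real data, not the actual `diagonal` table.

```r
library(data.table)

# Hypothetical miniature of the real data: two factor columns with different level sets
diagonal <- data.table(
  REF_AREA         = factor(c("AT", "BE", "DE")),
  COUNTERPART_AREA = factor(c("AT", "W1", "DE")),
  OBS_VALUE        = c(3, 7, 0)
)

# Compare the labels as character strings instead of the factors themselves
res <- diagonal[as.character(REF_AREA) == as.character(COUNTERPART_AREA) &
                  !is.na(OBS_VALUE) & OBS_VALUE != 0]
print(res)
```

Here only the AT/AT row is kept; the DE/DE row matches but has an OBS_VALUE of 0. To convert a column permanently rather than at comparison time, something like `diagonal[, REF_AREA := as.character(REF_AREA)]` should work as well.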

roarkz · Answer 2 · 2017-06-21T13:41:32.790

0

Using dplyr:

library(dplyr)

DT <- data.frame(ID=c(1:4), AREA=c("a","b","c","d"), PARTNER=c("f","b","g","d"), OBS_VALUE=c(10,5,13,0), stringsAsFactors = FALSE)

DT %>% 
  filter(AREA == PARTNER & OBS_VALUE != 0 & !is.na(OBS_VALUE))

edited Jun 21 '17 at 13:41

answered Jun 21 '17 at 13:36
