EDIT: Revamped reflecting comments below
I've got some data on workers across time. They may work in more than one position in any given year; I want to subset the data to get workers who have at least one position with certain characteristics.
Here's my toy data:
set.seed(1643)
dt<-data.table(id=rep(1:1000,10),
area=sample(letters,1e4,replace=T),
position=sample(10,1e4,replace=T),
firm_type=sample(5,1e4,replace=T),
year=rep(2001:2010,each=1000),key="id")
I only want workers in area
d
,o
,w
,l
,e
at position
7
.
Unfortunately, the way that firm_type
was coded changed from 2005 onwards; prior to 2005, the relevant workers were all at firms with firm_type==1
. Thereafter, types 1
and 2
are acceptable.
I tried this look-up but it doesn't work:
dt[.(dt[firm_type %in% ifelse(year<2005,1,1:2)
&area %in% c("d","o","w","l","e")
&position==7,unique(id)])]
Specifically, the %in%
operator, as mentioned in the comments below, can't operate row-by-row, so we get (intermediate) output like:
> dt[firm_type %in% ifelse(year<2005,1,1:2)
+ &area %in% c("d","o","w","l","e")
+ &position==7,table(firm_type,year)]
year
firm_type 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010
1 4 2 5 2 3 7 1 0 4 1
2 2 4 4 6 4 5 9 8 1 2
@Frank inspired this workaround:
dt[.(dt[ifelse(year<2005,firm_type==1,
firm_type %in% 1:2)
&area %in% c("d","o","w","l","e")
&position==7,unique(id)])]
I'm content with this, but I was encouraged to see if there's a better approach to this problem because ifelse
is not optimized.