Good morning everyone, I'm looking for more efficient code to replace mine, because my computer is old.

My code assigns an ID based on a date range, and it works fine on a small sample.

However, my data has 7,000,000 observations and my date-range table has 70,000 observations, and my old computer cannot finish the job with my inefficient code. How can I make the code more efficient? Or is there a package better suited to this? Thank you.


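library(dplyr)  # assumption: between() comes from dplyr (data.table also exports one)

test <- portnodaterange  # date-range table
testdata <- data         # observations
emptyc <- rep(NA, nrow(testdata))

for (i in seq_len(nrow(test))) {
  for (j in seq_len(nrow(testdata))) {
    # assign the portfolio number when the fund matches and the date falls in the range
    if (testdata$crsp_fundno[j] == test$crsp_fundno[i] &&
        between(testdata$caldt[j], test$begdt[i], test$enddt[i])) {
      emptyc[j] <- test$crsp_portno[i]
    }
  }
}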
  • 2
    Please don't post code/data as images, give it to us as text instead. See https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example for some tips. – AkselA Sep 15 '19 at 20:39
  • Reasons we often discourage an image of code/data/errors: it cannot be copied or searched (SEO), it breaks screen-readers, and it may not fit well on some mobile devices. Ref: https://meta.stackoverflow.com/a/285557/3358272 (and https://xkcd.com/2116/). Please just include the code or data (e.g., `dput(head(x))` or `data.frame(...)`) directly. – r2evans Sep 15 '19 at 20:52
  • 1
    This sounds like a "non-equi join" which can be performed using the `data.table` or `fuzzyjoin` packages. Lots of examples on this site. – Jon Spring Sep 15 '19 at 20:54

1 Answer

Assuming a one-to-many relationship between testdata and test by crsp_fundno, consider merge and subset and avoid nested loops:

mrg_df <- merge(testdata, test, by="crsp_fundno")
sub_df <- subset(mrg_df, caldt >= begdt & caldt <= enddt)  # both bounds must hold, so use & rather than |

emptyc <- sub_df$crsp_portno
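
For data of the size described in the question, the non-equi join suggested in the comments may be lighter on memory, since it avoids materialising the full merge on crsp_fundno. The following is only a minimal sketch using data.table's update join. The toy tables are hypothetical stand-ins that reuse the question's column names, and it assumes the date ranges for a given fund do not overlap.

```r
library(data.table)

# Hypothetical toy data reusing the column names from the question
test <- data.table(
  crsp_fundno = c(1, 1, 2),
  crsp_portno = c(100, 101, 200),
  begdt = as.Date(c("2010-01-01", "2011-01-01", "2010-01-01")),
  enddt = as.Date(c("2010-12-31", "2011-12-31", "2010-12-31"))
)
testdata <- data.table(
  crsp_fundno = c(1, 1, 2, 2),
  caldt = as.Date(c("2010-06-30", "2011-06-30", "2010-03-31", "2012-03-31"))
)

# Non-equi update join: each testdata row whose fund matches and whose
# caldt lies within [begdt, enddt] receives that range's crsp_portno.
# Rows with no matching range keep NA.
testdata[test,
         on = .(crsp_fundno, caldt >= begdt, caldt <= enddt),
         crsp_portno := i.crsp_portno]

# Same length and row order as testdata, like emptyc in the question
emptyc <- testdata$crsp_portno
```

Because the join updates testdata by reference, emptyc stays aligned with the original rows, including unmatched ones, which subsetting a merged table does not guarantee.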
Parfait