I am trying to loop through a large address data set (300,000+ lines) based on a factor common to each observation, ID2. The data set contains addresses from two different sources (a building data set and a property data set), and I am trying to find matches between them. To determine a match, I want to loop through each ID2 as a factor and check whether there is a line from each of the two sources.

Here is a picture of my desired output: [picture of desired output]

Here is sample code showing what I have tried:

    PROPERTYNAME=c("Vista 1","Vista 1","Vista 1","Chesnut Street","Apple Street","Apple Street")
    CITY=c("Pittsburgh","Pittsburgh","Pittsburgh","Boston","New York","New York")
    STATE=c("PA","PA","PA","MA","NY","NY")
    ID2=c(1,1,1,2,3,3)
    IsBuild=c(1,0,0,0,1,1)
    IsProp=c(0,1,1,1,0,0)

    df=data.frame(PROPERTYNAME,CITY,STATE,ID2,IsBuild,IsProp)

    for(i in levels(as.factor(df$ID2))){
      for(row in 1:nrow(df)){
        df$Any_Build[row][i]<-ifelse(as.numeric(df$IsBuild[row][i])==1)
        df$Any_Prop[row][i]<-ifelse(as.numeric(df$IsProp[row][i])==1)
      }
    }

I've tried nested for loops but have had no luck, and I am struggling with the apply functions in R. I would appreciate any help. Thank you!

Jackie
  • Data and code need to be available to us; none of us wants to retype code from a picture and also generate sample data to solve your “problem”. – n1tk Jul 19 '18 at 15:42
  • As stated above, we can't give you working code to solve your problem unless you give us an example with code that we can copy, paste, and run on our own machines. For some pointers on how to do this, please see this [question](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example?rq=1). From the sounds of things, you may want to look into a join based on ID2. – see24 Jul 19 '18 at 15:56
  • Thank you for the tips! I am new to the site and will try to be more courteous with my questions. Thank you again, and I am sorry for the trouble. – Jackie Jul 19 '18 at 16:40

2 Answers

How does ID2 affect the output? If it has no effect, you can use the same logic as in your example code, without the loop. ifelse() is vectorized, so you don't have to run it row by row:

    LIHTCComp1$AnyBuild <- ifelse(LIHTCComp1$IsBuild ==1,TRUE,FALSE)
    LIHTCComp1$AnyProp <- ifelse(LIHTCComp1$IsProp ==1,TRUE,FALSE)
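
If the flags are instead meant to be computed per ID2 group (this is an assumption about the desired output, since the picture is not reproduced here), a minimal sketch with base R's ave() might look like the following. It uses a trimmed copy of the sample df from the question; the Any_Build and Any_Prop names come from the question's loop, and the Match column is a hypothetical addition for illustration.

```r
# Trimmed copy of the sample data from the question
df <- data.frame(
  PROPERTYNAME = c("Vista 1", "Vista 1", "Vista 1",
                   "Chesnut Street", "Apple Street", "Apple Street"),
  ID2     = c(1, 1, 1, 2, 3, 3),
  IsBuild = c(1, 0, 0, 0, 1, 1),
  IsProp  = c(0, 1, 1, 1, 0, 0)
)

# ave() applies FUN within each ID2 group and returns a vector
# aligned with the original rows. The max() of a 0/1 flag is 1
# when at least one row in the group has the flag set.
df$Any_Build <- ave(df$IsBuild, df$ID2, FUN = max) == 1
df$Any_Prop  <- ave(df$IsProp,  df$ID2, FUN = max) == 1

# Assumed definition of a match: the ID2 appears in both sources.
# "Match" is a hypothetical column name.
df$Match <- df$Any_Build & df$Any_Prop
```

With this sketch, only ID2 1 would be flagged as a match, because it is the only group containing both a building row and a property row. No explicit loop is needed, which should matter on 300,000+ lines.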

Hope this helps.

Bhargav Rao

If your main data set is called D, the building data set B, and the property data set P, you can do the following:

    D$inB <- D$ID2 %in% B$ID2
    D$inP <- D$ID2 %in% P$ID2
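
As a small illustration of what these two lines produce, here is a sketch on toy data. The contents of D, B, and P below are hypothetical and only serve to show the behaviour of %in%.

```r
# Hypothetical toy data: D is the main data set, B and P are the two sources
D <- data.frame(ID2 = c(1, 2, 3, 4))
B <- data.frame(ID2 = c(1, 3, 3))   # repeated IDs in B are harmless for %in%
P <- data.frame(ID2 = c(1, 2))

# %in% returns one TRUE/FALSE per element of the left-hand side
D$inB <- D$ID2 %in% B$ID2
D$inP <- D$ID2 %in% P$ID2

# IDs that appear in both sources
both <- D$ID2[D$inB & D$inP]
```

Because %in% is vectorized, it checks every ID2 in D in a single call, so no loop over rows or factor levels is required.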

If you want some data from B, say an address, you can use merge:

    D <- merge(D, B[c("ID2", "address")], by = "ID2", all.x = TRUE, all.y = FALSE)

If every row in B has an address, then the NAs in the new address column in D should coincide with the FALSEs in D$inB.
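
That correspondence can be checked directly. The sketch below uses hypothetical toy data (the addresses are made up) and assumes each ID2 appears at most once in B.

```r
# Hypothetical toy data; each ID2 appears at most once in B
D <- data.frame(ID2 = c(1, 2, 3))
B <- data.frame(ID2 = c(1, 3),
                address = c("10 Main St", "5 Oak Ave"),
                stringsAsFactors = FALSE)

D$inB <- D$ID2 %in% B$ID2

# Left join: keep every row of D, fill address with NA where B has no match
D <- merge(D, B[c("ID2", "address")], by = "ID2", all.x = TRUE)

# TRUE when the missing addresses line up exactly with inB == FALSE
na_matches_inB <- identical(is.na(D$address), !D$inB)
```

Note that if an ID2 occurs several times in B, merge returns one row per matching pair, so D can gain rows. Deduplicating B on ID2 first is one way to avoid that.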

Noah