Looping through a data set based on a factor r

Question

I am trying to loop through a large address data set (300,000+ lines) based on a common factor for each observation, ID2. This data set contains addresses from two different sources, and I am trying to find matches between them. To determine a match, I want to loop through each ID2 as a factor and search for a line from each of the two data sets (the building and property data sets).

Here is a picture of my desired output: [picture of desired output]

Here is sample code showing what I have tried:

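    # Sample data: one row per address, ID2 groups rows that refer to the same place
    PROPERTYNAME = c("Vista 1", "Vista 1", "Vista 1", "Chesnut Street",
                     "Apple Street", "Apple Street")
    CITY = c("Pittsburgh", "Pittsburgh", "Pittsburgh", "Boston", "New York",
             "New York")
    STATE = c("PA", "PA", "PA", "MA", "NY", "NY")
    ID2 = c(1, 1, 1, 2, 3, 3)
    IsBuild = c(1, 0, 0, 0, 1, 1)   # 1 if the row comes from the building data set
    IsProp = c(0, 1, 1, 1, 0, 0)    # 1 if the row comes from the property data set

    df = data.frame(PROPERTYNAME, CITY, STATE, ID2, IsBuild, IsProp)

    # Attempt: nested loops over ID2 levels and rows (does not give the desired output)
    for(i in levels(as.factor(df$ID2))){
      for(row in 1:nrow(df)){
        df$Any_Build[row][i] <- ifelse(as.numeric(df$IsBuild[row][i]) == 1)
        df$Any_Prop[row][i] <- ifelse(as.numeric(df$IsProp[row][i]) == 1)
      }
    }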

I've tried nested for loops but have had no luck, and I am struggling with the apply functions in R. I would appreciate any help. Thank you!

Data and code need to be available to us; none of us wants to retype the code from your picture and generate sample data to solve your “problem”. – n1tk, Jul 19 '18 at 15:42
As stated above, we can't give you working code to solve your problem unless you give us an example with code that we can copy and paste to run on our own machines. For some pointers on how to do this, please see this [question](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example?rq=1). From the sounds of things, you may want to look into a join based on ID2. – see24, Jul 19 '18 at 15:56
Thank you for the tips! I am new to the site and will try to be more courteous with my questions. Thank you again, and I am sorry for the trouble. – Jackie, Jul 19 '18 at 16:40

score 0 · Answer 1 · edited Jul 19 '18 at 17:20

How does ID2 affect the output? If it has no effect, you can use the same logic as in your example code without the loop. ifelse() is vectorized, so you don't have to run it per row:

LIHTCComp1$AnyBuild <- ifelse(LIHTCComp1$IsBuild ==1,TRUE,FALSE)
LIHTCComp1$AnyProp <- ifelse(LIHTCComp1$IsProp ==1,TRUE,FALSE)

Hope this helps.
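If the flags need to be evaluated within each ID2 group rather than row by row, a minimal sketch using base R's ave() on the question's sample data might look like the following. The group-wise reading and the Match column are assumptions about the intended output, since the picture of the desired result is not available. The names Any_Build and Any_Prop are taken from the question's attempt.

```r
# Sample data from the question (only the columns needed here)
df <- data.frame(
  PROPERTYNAME = c("Vista 1", "Vista 1", "Vista 1",
                   "Chesnut Street", "Apple Street", "Apple Street"),
  ID2     = c(1, 1, 1, 2, 3, 3),
  IsBuild = c(1, 0, 0, 0, 1, 1),
  IsProp  = c(0, 1, 1, 1, 0, 0)
)

# Row-wise flag: the comparison is already vectorized, no loop or ifelse() needed
df$RowBuild <- df$IsBuild == 1

# Group-wise flags (assumed intent): does any row with the same ID2 have the flag set?
# ave() applies FUN within each ID2 group and returns a vector aligned with the rows
df$Any_Build <- ave(df$IsBuild, df$ID2, FUN = max) == 1
df$Any_Prop  <- ave(df$IsProp,  df$ID2, FUN = max) == 1

# Assumed definition of a match: the ID2 group has both a building and a property row
df$Match <- df$Any_Build & df$Any_Prop
```

With this sample, only the rows with ID2 equal to 1 end up with Match set to TRUE, because that is the only group containing rows from both sources.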

answered Jul 19 '18 at 15:47 by Sai Tarun Yadalam · edited Jul 19 '18 at 17:20 by Bhargav Rao

Thank you! I want to be able to reset for each ID2, though, for example. – Jackie, Jul 19 '18 at 16:36

score 0 · Answer 2 · answered Jul 19 '18 at 16:03

If your main dataset is called D and the building data set is called B and the property dataset is called P, you can do the following:

D$inB <- D$ID2 %in% B$ID2
D$inP <- D$ID2 %in% P$ID2
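
As a minimal sketch of this membership test, the following uses small made-up data frames for D, B and P. The values are hypothetical and only loosely based on the question's sample.

```r
# Hypothetical stand-ins for the main, building and property data sets
D <- data.frame(ID2 = c(1, 1, 1, 2, 3, 3))
B <- data.frame(ID2 = c(1, 3))   # ID2 values present in the building data
P <- data.frame(ID2 = c(1, 2))   # ID2 values present in the property data

# %in% is vectorized: it returns one TRUE/FALSE per row of D
D$inB <- D$ID2 %in% B$ID2
D$inP <- D$ID2 %in% P$ID2

# ID2 values found in both sources
both <- unique(D$ID2[D$inB & D$inP])
```

No loop over ID2 is needed, because each %in% call checks every row of D in one step.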

If you want to pull some data from B into D, say an address, you can use merge:

D <- merge(D, B[c("ID2", "address")], by = "ID2", all.x = TRUE, all.y = FALSE)

If every row in B has an address, then the NAs in the new address column in D should coincide with the FALSEs in D$inB.
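
A small sketch of that check, using hypothetical data in which B has exactly one row per ID2 and every row has an address (both are assumptions made for the illustration):

```r
# Hypothetical data: B has one row per ID2, each with an address
D <- data.frame(ID2 = c(1, 1, 2, 3))
B <- data.frame(ID2 = c(1, 3),
                address = c("10 Main St", "5 Oak Ave"),
                stringsAsFactors = FALSE)

D$inB <- D$ID2 %in% B$ID2

# Left join: keep every row of D and add the address where ID2 is found in B
D <- merge(D, B[c("ID2", "address")], by = "ID2", all.x = TRUE, all.y = FALSE)

# TRUE when the missing addresses line up with the rows where inB is FALSE
na_matches_inB <- all(is.na(D$address) == (!D$inB))
```

Note that if B contains several rows with the same ID2, merge returns one output row per matching pair, so D can grow; deduplicating B on ID2 first avoids that.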
