I have a dataframe df with about 1000 rows with the following columns:
df$ID<- c("ab11", "ab12" ...) #about 1000 rows
df$ID1<-numbers ranging from 1 to 20k # for all intents and purposes this can be treated as class 'factor'
df$Acol<- #numbers ranging from 1 to 1000
df$Bcol<- #numbers ranging from 0 to 1
The following vectors are meant to give me 12 values each:
A<- seq(50,600,by=50)
B<- seq(0.2,1,by=0.75)
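It may be worth checking the lengths of both vectors before building anything on them. As written, `seq(0.2, 1, by = 0.75)` produces only two values (0.2 and 0.95), so A and B together give 24 combinations rather than 144. The sketch below shows the check; `B12` and the use of `length.out` are assumptions for illustration, not the step size originally intended.

```r
A <- seq(50, 600, by = 50)
B <- seq(0.2, 1, by = 0.75)

length(A)  # 12
length(B)  # 2, because 0.2 + 2 * 0.75 is already past 1

# Hypothetical alternative (an assumption): ask seq() for
# 12 evenly spaced values between 0.2 and 1 directly.
B12 <- seq(0.2, 1, length.out = 12)
length(B12)  # 12
```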
I am trying to do 2 things:
- I would like to create dataframes by filtering the original dataset with various combinations of lists A and B. So 144 dataframes.
- Once I have those dataframes, I would like to combine 3 dataframes at a time and check whether the frequency of the IDs matches that of a master dataframe x and, if it does, get the combination information for the matching dataframes.
So for 1, this is my approach:
df_50_0.2<-subset(df, df$Acol>=50 & df$Bcol>=0.2)
I can't write that out 144 times, so I need a loop. I tried a nested loop, but that didn't give me every combination of A and B, so I tried a while loop.
Here is my code:
i<-50
while (i<550) {
  for (j in B) {
    assign(paste("df","_",as.character(i),"_",as.character(j)), df %>%
      filter(Acol>=i) %>%
      filter(Bcol>=j),envir=.GlobalEnv)
    i<-i+50
  }
}
That gives me the desired result, except it doesn't split the dataframe according to B: the output is similar to what I would get if I had filtered the data by the values of A alone.
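For reference, one possible way to get every A/B pair is to build the grid of combinations with `expand.grid()` and keep the filtered dataframes in a named list rather than creating them with `assign()`. This is only a sketch in base R: the toy `df` is made-up data so the example runs on its own, and the names `grid` and `dfs` are assumptions.

```r
# Toy stand-in for df (made-up data, only so the sketch is runnable)
set.seed(1)
df <- data.frame(
  ID   = sprintf("ab%02d", 1:20),
  Acol = sample(1:1000, 20),
  Bcol = runif(20)
)

A <- seq(50, 600, by = 50)
B <- seq(0.2, 1, by = 0.75)

# Every (A, B) pair, one row per combination
grid <- expand.grid(a = A, b = B)

# One filtered dataframe per row of the grid, stored in a list
dfs <- Map(function(a, b) df[df$Acol >= a & df$Bcol >= b, ],
           grid$a, grid$b)

# Names follow the df_<A>_<B> pattern used above, e.g. "df_50_0.2"
names(dfs) <- paste("df", grid$a, grid$b, sep = "_")

length(dfs)  # one entry per row of grid
```

A single element is then available as `dfs[["df_50_0.2"]]`, and the whole set can be looped over with `lapply()` without touching the global environment.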
For the second part I need to loop through all possible combinations of three data frames at a time. Here is my code:
df.final<-rbind (df_50_0.2,df_100_0.25,df_150_0.5)
tmp<-subset(table(df.final$ID),!(table(df.final$ID) %in% table(x$ID)))
I would like the above to be in a loop. If tmp has any values, I don't want it in the output. If it is empty, that is, the combination perfectly matches the frequency of IDs in the master dataframe x, I want that combination to be written out. So something like the following in a loop? I want all possible combinations checked iteratively to find the ones that match the ID frequencies of the master dataframe x perfectly:
if (length(tmp) == 0) {
  tmp
} else {
  rm(tmp)
}
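The check over all triples could be sketched with `combn()`, assuming the filtered dataframes are stored in a named list. Everything here is illustrative: the toy list `dfs`, the master `x`, and the helper `same_id_freq` are made-up names and data. The helper compares counts ID by ID, which is a deliberate change from the `%in%` test above, since `%in%` only asks whether a count value appears somewhere in the master table, regardless of which ID it belongs to.

```r
# Made-up toy data (assumptions): four small dataframes and a master x
dfs <- list(
  d1 = data.frame(ID = c("ab11", "ab12")),
  d2 = data.frame(ID = c("ab11")),
  d3 = data.frame(ID = c("ab13")),
  d4 = data.frame(ID = c("ab12", "ab13"))
)
x <- data.frame(ID = c("ab11", "ab11", "ab12", "ab13"))

# Hypothetical helper: TRUE when both dataframes have exactly the
# same count for every ID (IDs missing on one side count as 0)
same_id_freq <- function(d, master) {
  ids <- union(d$ID, master$ID)
  all(table(factor(d$ID, levels = ids)) ==
      table(factor(master$ID, levels = ids)))
}

# All ways of choosing 3 dataframes; each column is one combination
combos <- combn(names(dfs), 3)

# Stack each triple and keep the ones matching the master frequencies
keep <- apply(combos, 2, function(nm) {
  same_id_freq(do.call(rbind, unname(dfs[nm])), x)
})
matches <- combos[, keep, drop = FALSE]
matches  # each remaining column names one matching combination
```

With 144 dataframes there are choose(144, 3) = 487,344 triples, so the loop is feasible but not instant; comparing precomputed ID count tables instead of calling `rbind()` on full dataframes would likely be faster.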
Any help is much appreciated. A Python solution is also welcome!
A solution like the one in the linked question "Filter loop to create multiple data frames", but modified for two columns, could be helpful.