subset a data frame based on other data frame which can contain NAs?

Question

I have a data frame df which look like the following .

Now with my r shiny App i have to upload an excel file und save that data which has same column names .but some colums are totally empty ,som other contains one values and some others have more than one values .As said after saving this data .i want to filder the first data based on the values appearing on teh second (of course there will be no filtering by a certain column if its corresponding columns is empty).

Do not get surprised if you see that not all columns in the first image of the data frame are present in the second image of the data frame because the first one is big and i did not snipped every thing but be sure all columns by the second image are available by the first and not vice versa since the second one is designed only for filtration.

my code was like teh following. Although it was effecient locally but when deployed on the server does not work anymore:

     files<-list.files(paste0(getwd(),"/","Melexist"),pattern=".xlsx")
    df2<-list()
dat<- mergeddata
      df1<-dat[dat$M_Datum >="2021-01-01" & dat$M_Datum <= "2021-01-03",]
    for (i in 1:length(files)){
      
      
      df2[[i]]<-readxl::read_excel(paste0(getwd(),"/","Melexist","/",files[i]))
   
    
      for(f in names(df2[[i]])){
        if(!is.na(df2[[i]][[f]]))
          df1 <- df1[df1[[f]] %in% df2[[i]][[f]],]
        
      }}

@moodymudskipper tje below existing df1 and df2 from the solution of No future are perfect example.But the proposed solution is is false.I want to filter every column in df1 based on the values available in the corresponding column in df2.i was using the following but it did not work one my r shiny deployed in server — samir, Feb 23 '23 at 18:36
Thanks, the reprex should be part of the question instead of screen capture. This might help: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example — moodymudskipper, Feb 23 '23 at 18:40

score 0 · Answer 1 · answered Feb 23 '23 at 17:03

If I am understanding correctly you want a fuzzy match between the datasets, and this might not be the most elegant solution but should work:

library(fuzzyjoin)
library(stringr)
# construct some data frame
df1 <- data.frame("col1" = c("carrots", "beets", "tomatoes"), "col2" =c("frogs", 
                   "fishies", "sharks") , "col3" = c(100, 10, 84)) 
# generate some ID based on concatenating rest of columns
df1$ID <- do.call(paste0, df1)

# same for second data frame
df2 <- data.frame("col1" = "beets", "col2" = "", "col3" = "10")
df2$ID <- do.call(paste0, df2)

# fuzzy match based on ID
df2 %>% fuzzy_inner_join(df1, by = "ID", match_fun = str_detect)

I can't test it right now but you might need to flip df2 and df1 in the last line. Essentially we are generating some ID to do a fuzzy match to link up those 2 data sets.

unfortunnaly your solution did not work – samir Feb 23 '23 at 19:34 — samir, Feb 23 '23 at 19:34

subset a data frame based on other data frame which can contain NAs?

1 Answers1