I am facing some difficulties with filtering a dataframe in a loop.
I am copying data from an Excel file, putting everything in a list of data frames. I am trying to do some cleanup and the 8th column (.periodenddate) contains some dates. But there is some string starting by "Unable...". So I was trying to take out those rows to be able to convert the numbers into dates. Being from Excel, they look like "43890" at the moment.
I did find a way to make it work at the end of the code when I taking the data frame out of the list but ideally I would keep it inside the list to loop through it.
Also, I can not share the file unfortunately but it is simply Excel file with 15 sheets with the following columns: ISIN, MAIN, .Date, .Calcdate, .brokername, .fperiod, .rfperiod, .periodenddate, .analystcode, .analystname
library(tidyverse)
library(readxl)
myPath <- "MyFile.xlsm"
sheetsName <- excel_sheets(myPath)
for(a in 1:length(sheetsName)){
if (!(sheetsName[[a]] %in% c("Isin&Date", "Parameters", "TempSheet","Unique ISIN"))){
assign(paste0("DT_", sheetsName[[a]]), read_excel(myPath,a))
}
}
FY <- list() #Was not sure how to make this part cleaner
FY[[1]] <- rbind(`DT_Output FY-1(1)`,`DT_Output FY-1(2)`)
FY[[2]] <- rbind(`DT_Output FY-2(1)`,`DT_Output FY-2(prev)`)
FY[[3]] <- rbind(`DT_Output FY-3(1)`,`DT_Output FY-3(2)`)
FY[[4]] <- rbind(`DT_Output FY-4(1)`,`DT_Output FY-4(2)`,`DT_Output FY-4(prev)`)
FY[[5]] <- rbind(`DT_Output FY-5(1)`,`DT_Output FY-5(2)`)
FY[[6]] <- rbind(`DT_Output FY-6(1)`,`DT_Output FY-6(prev)`)
FY[[7]] <- rbind(`DT_Output FY-7(1)`,`DT_Output FY-7(2)`)
FY[[8]] <- rbind(`DT_Output FY-8(1)`,`DT_Output FY-8(prev)`)
rm(list=setdiff(ls(),"FY"))
for (a in 1:length(FY)){
FY[[a]] <- subset(FY[[a]],FY[[a]]$MAIN!="NULL")
FY[[a]] <- distinct(FY[[a]])
FY[[a]] <- FY[[a]] %>% #part that does not work
filter(!grepl('able', FY[[a]][8]))
}
#Thing that works:
FYTest <- FY[[1]]
FYTest <- FYTest %>%
filter(!grepl('able', FYTest$.periodenddate))
Also class(FY[[1]][8]) gives the following results: [1] "tbl_df" "tbl" "data.frame"
EDIT:
Code that worked:
FY <- excel_sheets(myPath) %>%
setdiff(c("Isin&Date", "Parameters", "TempSheet","Unique ISIN")) %>%
map(read_excel, path = myPath) %>%
bind_rows()
colnames(FY)<- c("ISIN","estEPS","date","calcDate","broker","fPeriod","rfPeriod","periodEnd","analystCode","anlystName")
FY <- FY %>%
subset(.,estEPS!="NULL" & periodEnd!="NULL" & !str_detect(periodEnd, "field")) %>%
distinct() %>%
mutate(., periodEnd = as.Date(as.numeric(periodEnd), origin = "1899-12-30"))