0

I am facing some difficulties with filtering a dataframe in a loop.

I am copying data from an Excel file, putting everything in a list of data frames. I am trying to do some cleanup and the 8th column (.periodenddate) contains some dates. But there is some string starting by "Unable...". So I was trying to take out those rows to be able to convert the numbers into dates. Being from Excel, they look like "43890" at the moment.

I did find a way to make it work at the end of the code when I taking the data frame out of the list but ideally I would keep it inside the list to loop through it.

Also, I can not share the file unfortunately but it is simply Excel file with 15 sheets with the following columns: ISIN, MAIN, .Date, .Calcdate, .brokername, .fperiod, .rfperiod, .periodenddate, .analystcode, .analystname

library(tidyverse)
library(readxl)

myPath <- "MyFile.xlsm"
sheetsName <- excel_sheets(myPath)

for(a in 1:length(sheetsName)){
  if (!(sheetsName[[a]] %in% c("Isin&Date", "Parameters", "TempSheet","Unique ISIN"))){
    assign(paste0("DT_", sheetsName[[a]]), read_excel(myPath,a))
  }
}

FY <- list() #Was not sure how to make this part cleaner
FY[[1]] <- rbind(`DT_Output FY-1(1)`,`DT_Output FY-1(2)`)
FY[[2]] <- rbind(`DT_Output FY-2(1)`,`DT_Output FY-2(prev)`)
FY[[3]] <- rbind(`DT_Output FY-3(1)`,`DT_Output FY-3(2)`)
FY[[4]] <- rbind(`DT_Output FY-4(1)`,`DT_Output FY-4(2)`,`DT_Output FY-4(prev)`)
FY[[5]] <- rbind(`DT_Output FY-5(1)`,`DT_Output FY-5(2)`)
FY[[6]] <- rbind(`DT_Output FY-6(1)`,`DT_Output FY-6(prev)`)
FY[[7]] <- rbind(`DT_Output FY-7(1)`,`DT_Output FY-7(2)`)
FY[[8]] <- rbind(`DT_Output FY-8(1)`,`DT_Output FY-8(prev)`)

rm(list=setdiff(ls(),"FY"))

for (a in 1:length(FY)){
  FY[[a]] <- subset(FY[[a]],FY[[a]]$MAIN!="NULL")
  FY[[a]] <- distinct(FY[[a]])
  FY[[a]] <- FY[[a]] %>% #part that does not work
    filter(!grepl('able', FY[[a]][8]))
}

#Thing that works:
FYTest <- FY[[1]]
FYTest <- FYTest %>% 
  filter(!grepl('able', FYTest$.periodenddate))

Also class(FY[[1]][8]) gives the following results: [1] "tbl_df" "tbl" "data.frame"

EDIT:

Code that worked:

FY <- excel_sheets(myPath) %>% 
  setdiff(c("Isin&Date", "Parameters", "TempSheet","Unique ISIN")) %>% 
  map(read_excel, path = myPath) %>% 
  bind_rows()

colnames(FY)<- c("ISIN","estEPS","date","calcDate","broker","fPeriod","rfPeriod","periodEnd","analystCode","anlystName")

FY <- FY %>% 
  subset(.,estEPS!="NULL" & periodEnd!="NULL" & !str_detect(periodEnd, "field")) %>% 
  distinct() %>% 
  mutate(., periodEnd = as.Date(as.numeric(periodEnd), origin = "1899-12-30"))
lprunier
  • 3
  • 2

1 Answers1

0

I don't really understand what your excel sheets are like or what you're trying to do, but it might be easier to load them all like this:

FY <- excel_sheets(myPath) %>% 
    setdiff(c("Isin&Date", "Parameters", "TempSheet","Unique ISIN")) %>% 
    map(read_excel, path = myPath) %>% 
    bind_rows()

This will give you one tibble with all of the data. Then you can filter out or fix the offending rows. After you've done that, you can mutate to convert the dates. As explained in this answer, you can use as.Date(43890, origin = "1899-12-30") to convert the dates.

Michael Dewar
  • 2,553
  • 1
  • 6
  • 22
  • I guess coming from the sheets in Excel I wanted to keep a similar structure and create one data frame per sheet which is a mistake. I will update my initial question to show the code I ended up using, thank you! – lprunier Dec 20 '21 at 16:00