0

Full disclosure- I inherited this code and tried to Frankenstein it enough to make it work. It isn't perfect.

I have a series of Excel workbooks I'm iterating through to extract financial data for a group of medical practices. The workbooks have a tab for each month. I used lapply() to iterate over the sheets to pull only the months in each quarter. One of the practices only has data from January and February of 2022 so I wouldn't expect that to show up for the 4th quarter update we just ran. However, that data is there.

library(tidyverse)
library(readxl)
library(openxlsx)

df1 <- data.frame("Medication" = seq(1:50),
              "Total WAC" = seq(51:100))

df2 <- data.frame("Medication" = seq(1:50),
              "Total WAC" = seq(51:100))

list_of_datasets <- list("January" = df1, "February" = df2)
write.xlsx(list_of_datasets, file = "C:/MC_report.xlsx")

current_month <- lubridate::month(as.Date(Sys.Date(), format = "%Y/%m/%d"))
current_year <- lubridate::year(as.Date(Sys.Date(), format = "%Y/%m/%d"))
Q1 <- c("January", "February", "March")
Q2 <- c("April", "May", "June")
Q3 <- c("July", "August", "September")
Q4 <- c("October", "November", "December")

quarter <- switch(current_month, 
              "1" = Q4, "2" = Q4, "3" = Q4,
              "4" = Q1, "5" = Q1, "6" = Q1,
              "7" = Q2, "8" = Q2, "9" = Q2,
              "10" = Q3, "11" = Q3, "12" = Q3)

year <- ifelse(current_month %in% c(1, 2, 3), current_year - 1, current_year)

names = c("Medication", "Total WAC")
MCPath22 = "C:/MC_report.xlsx" 
MClist22 = lapply(quarter, function(x){ # this function is repeated for each practice. I won't paste it over and over
      dat = read_excel(MCPath22, sheet = x, skip = 1)[c(1,2)] # 1 is 'Medication' 2 is "Total WAC'
      names(dat) = names
      dat$Month = x
      dat$Year = year
      dat$Location = "Medical Center"
      return(dat)
      })

MC_newdata = do.call(rbind, MClist22) %>% 
  select( Medication, `Total WAC`, Month, Year, Location) %>%
  mutate(Date.Added = Sys.time())

data = rbind(MC_newdata, DHP_newdata, Lex_newdata, Derm_newdata, Onc_newdata, oldvalues) %>% 
   filter(!is.na(Medication)) #includes all the practices

write_csv(data,"PAP Data.csv")

I just ran this again and all facilities save for the one with only January and February tabs are running correctly. It throws an error that 'October" not found, which is expected. I can stop that piece in R Studio and the script completes. And then Jan and Feb are in the output. Any idea why it's outputting the wrong data?

JKA99
  • 9
  • 2
  • Can you make your post [reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example)? We can't access `MCPath22`. – jrcalabrese Feb 15 '23 at 18:58
  • I can add a random path in the code. MCPath22 is just the path to a specific file. – JKA99 Feb 15 '23 at 19:06
  • We can't access that path or the files in that path either. It will be difficult to figure why your code is outputting the wrong data when we can't access the data itself. Can you create a [minimal reproducible example](https://stackoverflow.com/help/minimal-reproducible-example) within your post, possibly by using built-in datasets (e.g., `iris`, `mtcars`) or by generating random data if you can't provide all your data using `dput()`? – jrcalabrese Feb 15 '23 at 19:10
  • I don't know how to do that with a multi-tab xlsx workbook. The sheets are my problem. the data coming from each of them is fine. I'm getting data from sheets I expect to be ignored. I couldn't find a workable example of a conditional for checking the sheets prior to reading them. – JKA99 Feb 15 '23 at 19:15
  • You can [create multi-tab xlsx workbooks by using the `openxlsx` package](https://stackoverflow.com/a/36435337/14992857) in R. – jrcalabrese Feb 15 '23 at 19:24
  • "quarter" is which fiscal quarter to place the run in. The calculated quarter has the corresponding 3 months in a vector. And the results are reported out as quarterly numbers. Since this script only needs to be run 4 times a year, i wanted it to calculate the previous quarter from today. Not much gets automated. That is another problem for another day. – JKA99 Feb 15 '23 at 20:16
  • The error message that you keep getting is `Error: Sheet 'October' not found`, it's doing that because there is no October sheet in `list_of_datasets`. It doesn't sound like your goal is to resolve that error message though. However, you say that the output is incorrect because it contains January and February; when you say output here, are you referring to `MC_report.xlsx`? It only has that because `list_of_datasets` only contains January and February. Am I misunderstanding anything? – jrcalabrese Feb 15 '23 at 20:25
  • As far as I can tell, because of the `switch` you do for `quarter`, your function is looking for things (`"October" "November" "December"`) that aren't present in `MC_report.xlsx`. Because this function fails, no action is taken to change `MC_report.xlsx`, so it continues to contain the sheets January and February, which you assigned it when making `list_of_datasets`. – jrcalabrese Feb 15 '23 at 20:30
  • You are correct that I'm not trying to resolve the error, per se. My hope was to find a conditional statement that would skip that file entirely if those sheets didn't exist. I tried some if statements prior to the function, but couldn't get that to work. I don't understand why it's writing anything when those months aren't part of the vector sent to the read_excel() function. – JKA99 Feb 15 '23 at 20:36
  • If I understand your code correctly, your code stops entirely at `MClist22 = lapply...`, so it's not that your code is including January and February erroneously, it's that January and February are the *only* months in `MC_report.xlsx` and your code stops working at `MClist22 = lapply...` so it does not change `MC_report.xlsx` at all; thus, `MC_report.xlsx` continues to include January and February. If you *did* include sheets for October, November, and December, your code would work perfectly. – jrcalabrese Feb 15 '23 at 21:13
  • I'm not doing a great job explaining this. Maybe I just I don't know how to reduce the code enough to fully display the issue w/o being a mess. There are 5 total files, 4 of which all have 12 months of data. My problem file only has 2. My hope is it wouldn't output anything, yet it does. I understand why there is an error. The file is being asked for information that doesn't exist. My hope with this question was to find a way to bypass these odd files and allow the script to be run without updating it each time with either a custom time frame for the odd man out or when someone changes a file. – JKA99 Feb 16 '23 at 14:52

0 Answers0