-3
 # Load-anje paketa
library(tidyverse)
library(tabulizer)
library(readxl)
library(lubridate)
library(patchwork)
library(dplyr)
library(XLConnect)
library(tidyverse)
library(lubridate)
library(xlsx)
library(openxlsx)
library(tools)  
    putanja <- "Z:/Radne Skupine/Reports/AK-ID"
        direktoriji <- list.dirs(putanja, full.names = TRUE,recursive = FALSE)
        direktoriji
        for (d in 1:length(direktoriji)){
          temp_dir = direktoriji[d]
          fileovi <- list.files(path=temp_dir,pattern = "\\.xlsx$|\\.xls$")
          if (file_ext(fileovi) == "xlsx") {
        
          #fileovi = fileovi %>% filter(file_ext(fileovi[fileovi!="\\.xls"]))
          for (f in 1:length(fileovi)){
            
            temp_file = paste(direktoriji[d],fileovi[f],sep="/")
            # IF 01.00
            temp1 <- read_excel(path = temp_file,sheet = "IF 01.00",range = "A4:C56")
            # IF 02.01
            temp2 <- read_excel(path = temp_file,sheet = "IF 02.00",range = "A4:C19")
            # IF 02.02
            temp3 <- read_excel(path = temp_file,sheet = "IF 02.00",range = "A28:C35")
            # IF 03.00
            temp4 <- read_excel(path = temp_file,sheet = "IF 03.00",range = "A4:C25")
            # IF 04.00
            temp5 <- read_excel(path = temp_file,sheet = "IF 04.00",range = "A3:C20")
            # IF 05.00
            temp6 <- read_excel(path = temp_file,sheet = "IF 05.00",range = "A4:C33")
            # IF 06.01
            temp7 <- read_excel(path = temp_file,sheet = "IF 06.00",range = "A6:E12")
            # IF 06.02
            temp8 <- read_excel(path = temp_file,sheet = "IF 06.00",range = "A16:P22")
            # IF 06.03
            temp9 <- read_excel(path = temp_file,sheet = "IF 06.00",range = "A26:E30")
            # IF 06.04
            temp10 <- read_excel(path = temp_file,sheet = "IF 06.00",range = "A34:J38")
            # IF 06.05
            temp11 <- read_excel(path = temp_file,sheet = "IF 06.00",range = "A43:E50")
            # IF 06.06
            temp12 <- read_excel(path = temp_file,sheet = "IF 06.00",range = "A55:J62")
            # IF 06.07
            temp13 <- read_excel(path = temp_file,sheet = "IF 06.00",range = "A67:E75")
            # IF 06.08
            temp14 <- read_excel(path = temp_file,sheet = "IF 06.00",range = "A79:G87")
            # IF 06.09
            temp15 <- read_excel(path = temp_file,sheet = "IF 06.00",range = "A92:C102")
            # IF 06.10
            temp16 <- read_excel(path = temp_file,sheet = "IF 06.00",range = "B107:G110")
            # IF 06.11
            temp17 <- read_excel(path = temp_file,sheet = "IF 06.00",range = "A115:G129")
            # IF 06.12
            temp18 <- read_excel(path = temp_file,sheet = "IF 06.00",range = "A134:E138")
            # IF 06.13
            temp19 <- read_excel(path = temp_file,sheet = "IF 06.00",range = "A143:J147")
            
            # sklapanje
            temp <- rbind(temp1,temp2,temp3,temp4,temp5,temp6,temp7,temp8,temp9,temp10,temp11,temp12,temp13,temp14,temp15,temp16,temp17,temp18,temp19) #%>% mutate(datum=,subjekt=) 
            
            }
          
          }
        }

Hi, I am using this piece of code to make a procedure that goes through directories and reads .xlsx files. In directories there are also .xls files that I want to avoid. This piece of code do not get an error just warnings and they are all the same "the condition has length > 1 and only the first element will be used". Can you please help out?

I am trying to get the data from just .xlsx files and if there is such .xlsx file where there is no Sheet 'IF 01.00' I would like to skip it in the loop. So I would like to skip the .xls files and .xlsx files where is no Sheet 'IF 01.00'.

motoko
  • 1
  • 3
  • 1
    Without having access to any of what you're using (e.g. the vector you're iterating over), I'm guessing `fileovi` is a vector of more than 1 value. `if` calls aren't vectorized; they only make a single comparison, as the warning message says. `ifelse` is the vectorized version. Beyond that, without a [reproducible example](https://stackoverflow.com/q/5963269/5325862), it's unclear what the comparison is that you're trying to do – camille Jan 14 '22 at 15:20
  • Consider using `dplyr::if_else` instead of `if` and a `for`-loop – Baraliuh Jan 14 '22 at 15:26
  • @Baraliuh could you elaborate on the advantage here of `dplyr::if_else` over `base::ifelse`? – camille Jan 14 '22 at 15:35
  • You should be covered by posts [here](https://stackoverflow.com/q/48781138/5325862), [here](https://stackoverflow.com/q/14170778/5325862), and [here](https://stackoverflow.com/q/47034933/5325862), plus [lots of other questions](https://stackoverflow.com/search?q=%5Br%5D+is%3Aquestion+condition+has+length+%3E+1) with the same title as yours, but like I said, unclear what you're trying to compare – camille Jan 14 '22 at 15:41
  • @camille There are two main differences. (1) The dplyr version is faster than the base version. (2) The dplyr version is more stringent. That is, the type (e.g., integer, double, etc) returned by true has to be the same as false. This makes it more robust to weird/buggy behaviors (if you expect the same output that is). – Baraliuh Jan 14 '22 at 17:30
  • @camille The data that I am going through are directories with .xls and .xlsx files. What I am trying to do is to get the data inside only the .xlsx files. `fileovi` are files names and `direktoriji` are directories. The problem is, that I have both .xls and .xlsx files in directories. Do you maybe know how could I make this work? – motoko Jan 14 '22 at 17:48
  • If you're trying to check each file's extension, put that inside the loop that iterates over files – camille Jan 14 '22 at 17:51
  • @camille could you please be so kind and help me how could I do that? – motoko Jan 14 '22 at 17:55
  • Swap the order of `if (file_ext(fileovi) == "xlsx")` and `for (f in 1:length(fileovi))` – camille Jan 14 '22 at 18:06
  • @camille I still get the same warnings(), no error but no result. I do not get any data that is in those .xlsx files – motoko Jan 15 '22 at 15:52

1 Answers1

1

I tried to reproduce your example. I believe that the warning is raised whenever you have more than one .xlsx file in temp_dir. If so, then fileovi is a vector of strings, with as many elements as .xlsx files in temp_dir.

Thus, file_ext(fileovi) will be another vector of strings with the same number of elements. Passing it in the if condition raises the warning, since only the first element will be used by R to check the statement == "xlsx".

EDIT

In the comments, I suggestedd dropping |\\.xls$ in the definition of fileovi, so to look for .xlsx files only. However, it has been correctly pointed out that the loop would crash when operating over a folder with no such files.

To solve this, I suggest to modify the if condition before the inner loop, using if (!identical(fileovi, character(0))) in such a way that the code below runs only if we found at least one .xlsx file. Following, my proposal:

library(readxl)

putanja = "Z:/Radne Skupine/Reports/AK-ID"

direktoriji <- list.dirs(putanja, full.names = TRUE,recursive = FALSE)

for (d in 1:length(direktoriji))
{
  temp_dir = direktoriji[d]
  fileovi <- list.files(path=temp_dir,pattern = "\\.xlsx$") # LOOKING FOR .XLSX ONLY!
  if (!identical(fileovi, character(0))) # IF NO .XLSX FILE, SKIP THE FOLDER!
  { 
    for (f in 1:length(fileovi))
    {
      temp_file = paste(direktoriji[d],fileovi[f],sep="/")
      # IF 01.00
      temp1 <- read_excel(path = temp_file,sheet = "IF 01.00",range = "A4:C56")
      # IF 02.01
      temp2 <- read_excel(path = temp_file,sheet = "IF 02.00",range = "A4:C19")
      # IF 02.02
      temp3 <- read_excel(path = temp_file,sheet = "IF 02.00",range = "A28:C35")
      # IF 03.00
      temp4 <- read_excel(path = temp_file,sheet = "IF 03.00",range = "A4:C25")
      # IF 04.00
      temp5 <- read_excel(path = temp_file,sheet = "IF 04.00",range = "A3:C20")
      # IF 05.00
      temp6 <- read_excel(path = temp_file,sheet = "IF 05.00",range = "A4:C33")
      # IF 06.01
      temp7 <- read_excel(path = temp_file,sheet = "IF 06.00",range = "A6:E12")
      # IF 06.02
      temp8 <- read_excel(path = temp_file,sheet = "IF 06.00",range = "A16:P22")
      # IF 06.03
      temp9 <- read_excel(path = temp_file,sheet = "IF 06.00",range = "A26:E30")
      # IF 06.04
      temp10 <- read_excel(path = temp_file,sheet = "IF 06.00",range = "A34:J38")
      # IF 06.05
      temp11 <- read_excel(path = temp_file,sheet = "IF 06.00",range = "A43:E50")
      # IF 06.06
      temp12 <- read_excel(path = temp_file,sheet = "IF 06.00",range = "A55:J62")
      # IF 06.07
      temp13 <- read_excel(path = temp_file,sheet = "IF 06.00",range = "A67:E75")
      # IF 06.08
      temp14 <- read_excel(path = temp_file,sheet = "IF 06.00",range = "A79:G87")
      # IF 06.09
      temp15 <- read_excel(path = temp_file,sheet = "IF 06.00",range = "A92:C102")
      # IF 06.10
      temp16 <- read_excel(path = temp_file,sheet = "IF 06.00",range = "B107:G110")
      # IF 06.11
      temp17 <- read_excel(path = temp_file,sheet = "IF 06.00",range = "A115:G129")
      # IF 06.12
      temp18 <- read_excel(path = temp_file,sheet = "IF 06.00",range = "A134:E138")
      # IF 06.13
      temp19 <- read_excel(path = temp_file,sheet = "IF 06.00",range = "A143:J147")
      
      # sklapanje
      temp <- rbind(temp1,temp2,temp3,temp4,temp5,temp6,temp7,temp8,temp9,temp10,temp11,temp12,temp13,temp14,temp15,temp16,temp17,temp18,temp19)
    }
  }
}
riccardo-df
  • 512
  • 4
  • 9
  • Thank you for trying out my code. Do you maybe know how to overcome this issue so I can just use .xlsx files in the `for`-loop? The problem is, that I have both .xls and .xlsx files in directories. – motoko Jan 14 '22 at 17:51
  • I see that in `fileovi <- list.files(path=temp_dir,pattern = "\\.xlsx$|\\.xls$")`, you are looking for both .xlsx and .xls files. Can't you just drop `|\\.xls$`? – riccardo-df Jan 14 '22 at 17:56
  • If I remove `|\\.xls$` then I get an error because some directories do not contain any .xlsx files – motoko Jan 15 '22 at 15:57
  • I edited my own answer, now everything should work (unluckily, I do not have your folders, so I couldn't check for unexpected bugs). – riccardo-df Jan 15 '22 at 18:02
  • Thank you! There are still some .xlsx files with different layout from what I am looking for, they do not contain `sheet = "IF 06.00",range = "A16:P22"`. So the next step would be to avoid those files. So to avoid .xlsx files that do not contain `sheet = "IF 06.00"` and so on. Would that be possible to do? Otherwise I get an error `Error: Sheet 'IF 01.00' not found` – motoko Jan 15 '22 at 18:43
  • Would it be possible to make an `if` condition if there is no `Sheet 'IF 01.00'` then skip this file? – motoko Jan 15 '22 at 18:50
  • Well, I would just use another `if` condition to check for the existence of the sheet you mentioned, in a spirit similar to the `if` I proposed in the answer. I do not have your folders, so I cannot proceed with coding, but you can try now! Ps consider editing the question to add these new details. – riccardo-df Jan 15 '22 at 19:20
  • Could you please give me an example how could I use the `if` and where in the code to look if there is a `Sheet 'IF 01.00'`? Thank you! – motoko Jan 17 '22 at 09:08
  • What do you mean? Do you need to skip the code if no such sheet is found? I am really sorry, but if you need more code please consider closing this question (by upvoting and accepting my answer) and opening a new one, providing some fake folder with all the characteristics you need to solve :) – riccardo-df Jan 17 '22 at 09:14
  • There are .xlsx files that I want to skip if there is no `Sheet 'IF 01.00'` in them and if there is such sheet in the .xlsx file then it is okay. I do not know how and where to put the `if` statement in the loop. – motoko Jan 17 '22 at 10:05
  • I edited the question so it is clear what I need. Thank you! – motoko Jan 17 '22 at 10:24
  • Immediately after `for (f in 1:length(fileovi)) {`, and wrap within that all the code in the loop. In this way, the code first selects the directory, then selects .xlsx files, if at least one is found, it loops over such files, and for each file skips the iteration if no sheet is found. Be sure that the condition returns `FALSE` if no sheet is found, and `TRUE` otherwise, in similar spirit to the `if` I proposed. – riccardo-df Jan 17 '22 at 10:25
  • Could you please help how would that `if` statement look like? How would it check for the existence of `Sheet 'IF 01.00'`? – motoko Jan 17 '22 at 10:32
  • 2
    I already told you. I am sorry, you really should at least try yourself, before asking. I already provided you with an example of how you should use your `if` condition, and where to put it. As I already pointed out, I do not have your folders, so I do not know how they look like, and you still did not provide a reproducible example. I also tried to fake some folders in my pc to understand how to help you, but now I am stuck since apparently I do not have the packages to run all of your code (which you should specify at the beginning). Last but not least, I did not even get an upvote from you. – riccardo-df Jan 17 '22 at 10:52
  • I am sorry, I gave you the upvote (I am new to this). Also, I have edited the question with the all the packages needed. I have tried to make the `if` statement but I am not sure how to code it so it only looks for `Sheet 'IF 01.00'`. – motoko Jan 17 '22 at 11:34
  • You have imported some packages twice, be careful. Anyway, I still cannot run `read_excel(path = temp_file,sheet = "IF 01.00",range = "A4:C56")`. It says that there is no `fansi` package. – riccardo-df Jan 17 '22 at 11:40
  • I have checked the `read_excel` is from `readxl` package. You think I should add the `fansi` package as well? – motoko Jan 17 '22 at 11:49
  • I do not know, I tried to install the `fansy` package but unsuccessfully. – riccardo-df Jan 17 '22 at 11:58
  • Google search showed that there is no `fansy` but `fansi` package. I will add it to the question. – motoko Jan 17 '22 at 12:53
  • Sorry, that was a typo, I meant `fansi`. Anyway, the issue is still there, I cannot install the package. – riccardo-df Jan 17 '22 at 13:27
  • I would like to use another way to make an `if` condition without looking at the existence of `Sheet 'IF 01.00'`. How could I rewrite this `if (substr(fileovi,8,13)>="2021-3")` so I do not get the warning `the condition has length > 1 and only the first element will be used`? – motoko Jan 17 '22 at 15:11
  • That really depends on where you are putting this condition. If it lies within the inner `for` loop, just index `fileovi` by using `fileovi[f]`, so to scan single .xlsx files. – riccardo-df Jan 17 '22 at 15:18