I know similar questions have been asked before, but I cannot understand how I can solve this error or find any solution in previous posts.

CTCF_intersect_files<-list.files(paste(intersect_bed_path,"CTCF/",sep=""))
length(CTCF_intersect_files) #762

CTCF_intersection=lapply(CTCF_intersect_files, 
function(x) {

#Load in "x" file skipping empty files
t_CTCF=if (!file.size(x) == 0) {
read.table(x, header=FALSE)
}
})

Error in if (!file.size(x) == 0) { :
  missing value where TRUE/FALSE needed
VLG
  • `x` as a file does not exist. I suggest you test with `file.exists(x)`. Note that `file.size("file_does_not_exist.txt")` is `NA`, then see https://stackoverflow.com/q/7355187/3358272. – r2evans Jan 11 '21 at 16:38
  • If you created `CTCF_intersect_files` with `list.files`, perhaps you should add `list.files(..., full.names=TRUE)` so that the path is included as well (relative or absolute, depending on how you called it). – r2evans Jan 11 '21 at 16:39
  • Third option: `if (isTRUE(file.size(x) > 0)) read.table(...)`, though this is not going to resolve the underlying problem that you are expecting a file to exist and it does not. – r2evans Jan 11 '21 at 16:40
  • I believe the error comes from `!file.size(x) == 0`. This negates `file.size`, not equality to zero. – Rui Barradas Jan 11 '21 at 16:44
  • I see your point, Rui, but I don't think that that mis-logic is producing the error. See `!1==0` and `!0==0` ... then see `!NA==0` (when the file does not exist or some other file-based error, such as permissions). – r2evans Jan 11 '21 at 16:47
  • @r2evans OK, I tested the logic with a vector output by `list.files`, so a 0 bytes file didn't throw an error... – Rui Barradas Jan 11 '21 at 16:49
  • @r2evans. I have edited my answer to include the part of the script where I import all my files and I get the full names of all my files but the error remains – VLG Jan 11 '21 at 17:00
  • VLG, and you missed my recommendation to use `full.names=TRUE`. – r2evans Jan 11 '21 at 18:04
  • If you look at `CTCF_intersect_files`, you'll see file names. In my first comment, I suggested checking for file existence. What does `file.exists(CTCF_intersect_files[1])` return? If you look for one of those file names, you'll see that **they do not exist in the current directory**. R is not going to *infer* (frankly, no language will) that when you say *"read file `quux.txt`"*, you reammy mean *"read the `quux.txt` that was found in a subdirectory I provided to a different function call one or more expressions ago"*. – r2evans Jan 11 '21 at 18:08
  • @r2evans. OK I see, Problem solved! Thank you – VLG Jan 11 '21 at 18:20
  • `s/reammy/really/g` in my previous comment ¯\_(ツ)_/¯ – r2evans Jan 11 '21 at 18:21

1 Answer

Here are two fixes for your code.

  1. You are looking in a subdirectory for files, but the default action (as bad as it is) is to return just the file names, not the full path required to actually access those files. For instance,

    list.files(path = "dirname")
    # [1] "file1"   "file2"
    list.files(path = "dirname", full.names = TRUE)
    # [1] "dirname/file1"   "dirname/file2"
    

    So however you are calling `list.files`, just add `full.names = TRUE`; without it, none of the returned names point to a file that exists in your working directory. This is a problem you did not yet know you had, and it is the real cause of the error you see.

  2. Your test on `file.size` is flawed because `file.size` returns `NA` for a file that is not found. When I have a problem with code, my troubleshooting technique is to run each of the sub-components to see which one is broken. In your case, since the error occurs in the `if` statement, I would run each piece of its condition.

    I'm going to guess that if you did that, it would look something like:

    (!file.size(x) == 0)
    # [1] NA
    file.size(x) == 0
    # [1] NA
    file.size(x)
    # [1] NA
    

    and now that you know of the function file.exists, you might then try

    x
    # [1] "somefile.txt"
    read.table(x)
    # Warning in file(file, "rt") :
    #   cannot open file 'somefile.txt': No such file or directory
    # Error in file(file, "rt") : cannot open the connection
    file.exists(x)
    # [1] FALSE
    

Taking those notes and adjusting your code, I suggest:

CTCF_intersect_files <- list.files(paste(intersect_bed_path, "CTCF/", sep = ""),
                                   full.names = TRUE)
CTCF_intersection <- lapply(
  CTCF_intersect_files, 
  function(x) {
    if (isTRUE(file.size(x) > 0)) {
      read.table(x, header=FALSE)
    }
  })

We don't need to actually call file.exists(x) in this case, since isTRUE(file.size(x) > 0) will correctly handle the times when x does not exist.
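
To make that claim concrete, here is a minimal sketch of how `file.size` and `isTRUE` behave for a missing file and for an empty one. The file names are hypothetical and live in a temporary directory; they are not taken from the question's data.

```r
# Hypothetical paths for illustration only.
missing_file <- file.path(tempdir(), "no_such_file.bed")  # never created
empty_file   <- tempfile(fileext = ".bed")
file.create(empty_file)                                   # exists, 0 bytes

size_missing <- file.size(missing_file)  # NA, because the file does not exist
size_empty   <- file.size(empty_file)    # 0

# A bare comparison propagates the NA, which is what makes `if` fail.
is.na(size_missing > 0)

# isTRUE() turns both NA and FALSE into FALSE, so either file is skipped safely.
isTRUE(size_missing > 0)
isTRUE(size_empty > 0)
```

The trade-off is that a missing file and an empty file become indistinguishable, which is why a separate existence check can still be worthwhile.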

However, I would find it annoying to run this and get no indication that files were silently skipped because they do not exist. This quick check will stop with an error unless every listed file exists:

stopifnot(all(file.exists(CTCF_intersect_files)))
CTCF_intersection <- lapply( ... )
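
If you also want to know which files were skipped for being empty, one possible approach (a sketch, not part of the original question) is to look for the `NULL` elements that `lapply` returns for them. The directory `CTCF_demo` and the names `demo_files`, `demo_tables`, and `skipped` below are made up for illustration.

```r
# Hypothetical setup: a temporary directory standing in for the CTCF folder.
demo_dir <- file.path(tempdir(), "CTCF_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines("chr1\t100\t200", file.path(demo_dir, "a.bed"))  # one data row
file.create(file.path(demo_dir, "b.bed"))                   # empty file

demo_files <- list.files(demo_dir, full.names = TRUE)
stopifnot(all(file.exists(demo_files)))

demo_tables <- lapply(demo_files, function(x) {
  if (isTRUE(file.size(x) > 0)) read.table(x, header = FALSE)
})
names(demo_tables) <- basename(demo_files)

# Files that were skipped come back as NULL; report them, then drop them.
skipped <- vapply(demo_tables, is.null, logical(1))
if (any(skipped)) {
  message("Skipped empty files: ",
          paste(names(demo_tables)[skipped], collapse = ", "))
}
demo_tables <- demo_tables[!skipped]
```

Naming the list by file name keeps each table traceable to its source after the empty entries are removed.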
r2evans