I have thousands of txt files (1.txt, 2.txt, 3.txt, ...) to be used as input (predictions), and another file called "labels". I need to run a few commands on each input file to create its output (an AUC value). I am following the suggestion in a previous post (Looping through all files in directory in R, applying multiple commands).

But I am having trouble writing the function to use inside this loop.

My original code (for a single predictions file):

library(ROCR)
labels <- read.table(file="/data/labels/labels", header=F, sep="\t")
predictions <- read.table(file="/data/input/3.txt", header=F)
pred <- prediction(predictions, labels)
perf <- performance(pred,"tpr","fpr")
auc <- attr(performance(pred ,"auc"), "y.values")
auc
write.table(auc, "/data/out/AUC3.txt",sep="\t")

My code so far (not working):

library(ROCR)
labels <- read.table(file="/data/labels/labels", header=F, sep="\t")

files <- list.files(path="/data/input/", pattern="*.txt", full.names=TRUE, recursive=FALSE)

auc <- function(r) {
    pred <- prediction(files, labels)
    perf <- performance(pred,"tpr","fpr")
    auc <- attr(performance(pred ,"auc"), "y.values")
}

lapply(files, function(x) {
    t <- read.table(x, header=F) # load file
    out <- auc(t)
    write.table(out, "/data/out/", sep="\t")
})

Error message:

Error in prediction(files, labels) :
Number of predictions in each run must be equal to the number of labels for each run.
Calls: lapply -> FUN -> auc -> prediction
Execution halted
Alex
  • Just a few quick comments: 1. Your `auc` function does not return anything. 2. The `auc` function is not passed all the objects it uses; it picks them up from other environments, which is really questionable style unless there are very good reasons for it. 3. If you ask me, try it in a loop. Using an `apply` function doesn't actually help you here (performance-wise), and a loop is much easier to debug: you can see where the error occurs (maybe it's one particular file). – Georgery Jun 02 '20 at 15:19

1 Answer

The problem is the expression `files$V1`. `files` is created by `list.files`, which returns an atomic (character) vector (see `?list.files`), and you cannot use `$` with atomic vectors. You have to address an element by numeric index, `files[i]`, where `i` is the position of the file you want.
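
To illustrate indexing into `files` and passing the loaded data on to the function, here is a minimal sketch rather than a definitive solution. It assumes every predictions file has a single column with the same number of rows as the labels file. The temporary directories, the toy data, and the helper name `compute_auc` are made up for the example and stand in for the paths in the question.

```r
library(ROCR)

# Hypothetical toy data in a temporary directory, standing in for
# /data/input/ and /data/labels/labels from the question.
in_dir  <- file.path(tempdir(), "input")
out_dir <- file.path(tempdir(), "out")
dir.create(in_dir,  showWarnings = FALSE)
dir.create(out_dir, showWarnings = FALSE)

labels <- data.frame(V1 = c(0, 0, 1, 1))
write.table(data.frame(c(0.1, 0.4, 0.35, 0.8)), file.path(in_dir, "1.txt"),
            row.names = FALSE, col.names = FALSE)
write.table(data.frame(c(0.2, 0.3, 0.6, 0.9)), file.path(in_dir, "2.txt"),
            row.names = FALSE, col.names = FALSE)

# list.files() returns a character vector of paths, not the file contents.
# Note that `pattern` is a regular expression, not a glob.
files <- list.files(path = in_dir, pattern = "\\.txt$", full.names = TRUE)

# Hypothetical helper: receives the data it needs as arguments
# and returns the AUC as a single number.
compute_auc <- function(predictions, labels) {
  pred <- prediction(predictions[[1]], labels[[1]])
  performance(pred, "auc")@y.values[[1]]
}

aucs <- numeric(length(files))
for (i in seq_along(files)) {
  # Address one path with a numeric index, then read that file.
  predictions <- read.table(files[i], header = FALSE)
  aucs[i] <- compute_auc(predictions, labels)
  # One output file per input, e.g. AUC1.txt for 1.txt.
  out_file <- file.path(out_dir, paste0("AUC", basename(files[i])))
  write.table(aucs[i], out_file, sep = "\t")
}
names(aucs) <- basename(files)
```

A plain `for` loop over `seq_along(files)` keeps the index `i` visible, so if one file fails (for example because its row count differs from the labels), it is easy to see which one.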

Jan
  • Thanks for the suggestion. Can you please give me an example? Assuming that I have 300 files, do I need to use [c(1:300)]? Where should I place the index? – Alex Jun 01 '20 at 21:34
  • With `files$V1` you address one specific position (most likely the first). I am beginning to sense that it is not what you intend. Your actual intention is not clear enough to me that I can give you a specific example. I can only speculate that you shouldn't call `$V1` but pass the file on to the `auc` function as a second argument. – Jan Jun 01 '20 at 22:23
  • Actually I do not need $V1 in my code. My files have different rows and just 1 column. Sorry that I was not very clear. – Alex Jun 02 '20 at 14:56