In my global environment, I have variables that have the following names:
- filtered_A
- unfiltered_A
- filtered_B
- unfiltered_B
and so on...
filtered_A is a subset of unfiltered_A; each is a data frame with a single column of gene names. I am trying to add a new column to unfiltered_A that holds one of two strings: "Passed" for genes that also exist in filtered_A, and "notPassed" for genes that do not. So basically I am matching the two data frames and labelling each gene according to whether it was found.
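To make the goal concrete, here is a minimal sketch of the result I am after for a single pair. The gene names are made-up toy data, and `V1` is assumed as the column name because that is what `read.table()` assigns when there is no header. It uses `%in%` rather than my `apply()` approach, only to illustrate the intended output:

```r
# Toy data standing in for unfiltered_A / filtered_A (gene names are made up).
unfiltered_A <- data.frame(V1 = c("BRCA1", "TP53", "EGFR", "KRAS"))
filtered_A   <- data.frame(V1 = c("TP53", "KRAS"))

# Desired result: a second column flagging genes that also occur in filtered_A.
unfiltered_A$Situation <- ifelse(unfiltered_A$V1 %in% filtered_A$V1,
                                 "Passed", "notPassed")
print(unfiltered_A)
```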
I have written the following code:
```{r unfiltered}
setwd("/home/alaa/Documents/Analysis/genes/WES/unfiltered")
# character vector of file names, one per sample
vcfFiles <- list.files(getwd(), recursive = TRUE)
for (i in vcfFiles) {
  print(i)
  assign(paste0("unfiltered_", i), read.table(i))
}
```
```{r filtered}
setwd("/home/alaa/Documents/Analysis/genes/WES/filtered")
for (i in vcfFiles) {
  print(i)
  assign(paste0("filtered_", i), read.table(i))
}
```
```{r matching}
for (i in vcfFiles){
  y <- grep(i, ls())
  filterd <- get(ls()[y[1]])
  unfilterd <- get(ls()[y[2]])
  name_filterd <- ls()[y[1]]
  name_unfilterd <- ls()[y[2]]
  assign(name_unfilterd, cbind(unfilterd, apply(unfilterd, 1, function(x) ifelse(any(x[1] == filterd), 'Passed','notPassed'))))
}
for (i in ls()){
  if (is.data.frame(get(i)) && ncol(get(i)) == 2 && grepl(pattern = "unfiltered_", x = i)) {
    print(i)
    j <- get(i)
    colnames(j)[2] <- "Situation"
    assign(i, j)
  }
}
#rm(i, j, filterd, unfilterd, name_filterd, name_unfilterd, y)
```
When I run this code for the first time, it fails with:
Error in apply(unfilterd, 1, function(x) ifelse(any(x[1] == filterd), :
dim(X) must have a positive length
I understand that this happens because unfilterd has no dimensions when apply() is called. However, if I rerun the code, it works without any problem.
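One thing that might be related, although this is an assumption on my part and not a confirmed diagnosis: `ls()` returns the names sorted, and it is evaluated again on every call, so positions stored in `y` could point at different objects once new variables have been created. The sketch below uses a separate, hypothetical environment `env` and toy data to show the effect in isolation:

```r
# Hypothetical stand-in for the global environment, so the demo is isolated.
env <- new.env()
assign("filtered_A", data.frame(V1 = "TP53"), envir = env)
assign("unfiltered_A", data.frame(V1 = c("TP53", "EGFR")), envir = env)
assign("i", "A", envir = env)

before <- ls(env)        # names are returned sorted
y <- grep("A", before)   # positions of the two data frames

# Creating a new variable changes what ls() returns from now on.
assign("filterd", get(before[y[1]], envir = env), envir = env)
after <- ls(env)

print(before)
print(after)
print(before[y[2]])  # name at position y[2] before 'filterd' existed
print(after[y[2]])   # name at the same position afterwards
```

If the last two printed names differ, the second `get()` in my loop would fetch a different object than intended on a first run, and the positions would stay stable on a rerun because the helper variables already exist.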
Can someone please explain what is wrong and why it fails on the first attempt?
In case you are wondering why I am working with the global environment and ls(), it's because I have many data frames to match.
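For comparison, here is a rough sketch of how I imagine the same matching would look with two named lists instead of global variables. The list names `unfiltered`, `filtered` and `annotated`, the sample names and the gene names are all hypothetical; in practice the data frames would come from `read.table()`:

```r
# Hypothetical toy inputs; in practice these would come from read.table().
unfiltered <- list(A = data.frame(V1 = c("BRCA1", "TP53")),
                   B = data.frame(V1 = c("EGFR", "KRAS")))
filtered   <- list(A = data.frame(V1 = "TP53"),
                   B = data.frame(V1 = character(0)))

# Pair the data frames by list name instead of by position in ls().
annotated <- lapply(names(unfiltered), function(s) {
  u <- unfiltered[[s]]
  u$Situation <- ifelse(u$V1 %in% filtered[[s]]$V1, "Passed", "notPassed")
  u
})
names(annotated) <- names(unfiltered)
print(annotated$A)
```

I am open to this kind of approach if it avoids the problem, but I would still like to understand why my original code behaves differently on the first run.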
Thanks in advance.