I'm a beginner in R, and right now I'm working with data.tables. Here is my problem. I have a data.table (ls) that looks like the one below. In the future it can contain any number (x) of subfolders; so far I have written code that handles up to 3 subfolders:
   V1                                   nsubfolders
1: CCData/EQHazGIS/Eqattcc.dat          2
2: TWData/HUVuln/ea/modifsWI5.tw        3
3: TWData/HUVul/ea/pa/modifsWI8.tw      4
4: TWData/HUVul1/ea/pa/lk/modifsWI9.tw  5
The nsubfolders value can go up to x. The first column shows the path where each file is kept, and the second column tells how many folders you have to go through to reach the file. There can be any number of subfolders; in the current data the maximum is 5. For example, if nsubfolders is 2, I have to go through 2 subfolders in the directory to reach the file.

My task is to extract the name of each subfolder from the first column ($V1), i.e. the pieces separated by '/', and put them into new appended columns, as many as there are subfolders. So the first new column should hold the string before the first '/', the second new column the string between the first and second '/', the third new column the string between the second and third '/', and so on. The output should look like this:
   V1                                   nsubfolders  v2      v3        v4  v5  v6  v7
1: CCData/EQHazGIS/Eqattcc.dat          2            CCdata  EQHazGIS  NA  NA  NA  dat
2: TWData/HUVuln/ea/modifsWI7.tw        3            TWData  HUVuln    ea  NA  NA  TW
3: TWData/HUVul/ea/pa/modifsWI8.tw      4            TWData  HUVul     ea  pa  NA  TW
4: TWData/HUVul1/ea/pa/lk/modifsWI9.tw  5            TWData  HUVul1    ea  pa  lk  TW
For the file name, I only need the file's extension (e.g. dat or CN), and that has to be the last column. If a value is not there, the cell should show NA.

I have written the following code to handle up to 3 subfolders, but the result is not correct: columns V4 and V8 show 'NA', and while the output in V2 and V3 is correct for up to 2 subfolders, after that it starts repeating itself. Once the code works for up to 3 folders, I can extend it to x subfolders. Even after studying a lot, I'm not able to figure out why this happens. I know the following code is not the best or most efficient way to do it; please have a look and let me know your suggestions and a more efficient way to do it:
# Global variables
filesPath <- "//ca1ntap01/Transfer/2Anuj/Data/" # this is directory where all the folders are kept
# creating a datatable
require(data.table)
ls <- as.data.table(list.files(filesPath, recursive = T, all.files = T, full.names = F,
                               include.dirs = F))
e <- character()
f <- character()
g <- character()
j <- character()
n <- character()
o <- character()
p <- character()
for (i in 1:nrow(ls))
{
  ls$nsubfolders <- sapply(regmatches(ls$V1, gregexpr("/", ls$V1)), length) # this gives the number of subfolders for every row
  a <- ls[i, 1]
  print(a)
  b <- read.table(text = toString(a), sep = "/", as.is = TRUE)$V1
  print(b)
  c <- read.table(text = toString(a), sep = "/", as.is = TRUE)$V2
  print(c)
  e[i] <- b
  f[i] <- c
  #if (ls[nsubfolders=="2"])
  if (ls[ls$nsubfolders == 2])
  {
    d <- read.table(text = toString(a), sep = "/", as.is = TRUE)$V3
    print(d)
    g[i] <- d
    h <- read.table(text = toString(d), sep = ".", as.is = TRUE)$V2
    print(h)
    print(i)
    j[i] <- h
  } else if (ls[ls$nsubfolders == 3]) {
    k <- read.table(text = toString(a), sep = "/", as.is = TRUE)$V3
    l <- read.table(text = toString(a), sep = "/", as.is = TRUE)$V4
    m <- read.table(text = toString(l), sep = ".", as.is = TRUE)$V2
    n[i] <- k
    o[i] <- l
    p[i] <- m
  }
}
# to add columns in the datatable
ls$V2 <- e
ls$V3 <- f
ls$V4 <- n
ls$V5 <- g
ls$V6 <- j
ls$V7 <- o
ls$V8 <- p
print(ls)
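
For comparison, here is a minimal sketch of a vectorised way to get the layout described above without a per-row loop or one branch per folder depth. It is only an illustration under some assumptions: the table is named dt here (a hypothetical name, chosen so it does not mask base R's ls() function), the sample paths are the four rows shown earlier instead of the real list.files() result, and every file is assumed to sit inside at least one folder (for a top-level file, dirname() returns "." and that would appear as a folder name).

```r
library(data.table)

# Sample paths copied from the question; in practice V1 would come from list.files().
dt <- data.table(V1 = c(
  "CCData/EQHazGIS/Eqattcc.dat",
  "TWData/HUVuln/ea/modifsWI5.tw",
  "TWData/HUVul/ea/pa/modifsWI8.tw",
  "TWData/HUVul1/ea/pa/lk/modifsWI9.tw"
))

# Number of "/" separators in each path.
dt[, nsubfolders := lengths(regmatches(V1, gregexpr("/", V1, fixed = TRUE)))]

# Folder part of each path (everything before the file name), split on "/".
folders <- strsplit(dirname(dt$V1), "/", fixed = TRUE)

# The deepest path decides how many folder columns are needed.
max_depth <- max(lengths(folders))

# Indexing past the end of a character vector yields NA, which pads short rows.
folder_mat <- do.call(rbind, lapply(folders, function(x) x[seq_len(max_depth)]))

# Folder columns are named V2, V3, ...; the file extension goes in the next column.
folder_cols <- paste0("V", seq_len(max_depth) + 1L)
dt[, (folder_cols) := as.data.table(folder_mat)]
dt[, (paste0("V", max_depth + 2L)) := tools::file_ext(V1)]

print(dt)
```

With the sample rows this produces folder columns V2 to V6 and the extension in V7, and the number of folder columns grows automatically with the deepest path. The extension is returned as written in the file name (lower case here); toupper() could be applied if upper case is wanted.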