Extracting different data from one datatable column and inserting into other increasing columns in R

Question

I'm a beginner in R, and right now I'm working with data.tables. Here is my problem. I have a data.table (ls) that looks like the one below. In the future the paths can contain any number (x) of subfolders; so far I have written code that handles up to 3:

                            V1           nsubfolders
  1: CCData/EQHazGIS/Eqattcc.dat            2
  2: TWData/HUVuln/ea/modifsWI5.tw          3
  3: TWData/HUVul/ea/pa/modifsWI8.tw        4
  4: TWData/HUVul1/ea/pa/lk/modifsWI9.tw    5

nsubfolders can go up to x. The first column holds the path to each file, and the second column says how many folders you have to pass through to reach it. There can be any number of subfolders; in the current data the maximum is 5. For example, if nsubfolders is 2, I have to go through 2 folders in the directory to reach the file.

My task is to extract the name of each subfolder from the first column (V1), i.e. the strings separated by '/', and put them into new appended columns, as many as there are subfolders. The first new column should hold the string before the first '/', the second new column the string between the first and second '/', the third new column the string between the second and third '/', and so on. The output should look like this:

                                    V1 nsubfolders     v2       v3   v4   v5   v6  v7
1:         CCData/EQHazGIS/Eqattcc.dat           2 CCData EQHazGIS   NA   NA   NA dat
2:       TWData/HUVuln/ea/modifsWI5.tw           3 TWData   HUVuln   ea   NA   NA  tw
3:     TWData/HUVul/ea/pa/modifsWI8.tw           4 TWData    HUVul   ea   pa   NA  tw
4: TWData/HUVul1/ea/pa/lk/modifsWI9.tw           5 TWData   HUVul1   ea   pa   lk  tw

For the file name, I only need the file extension (e.g. dat or CN), and it has to go in the last column. Where there is no data, the cell should show NA. I have written the following code to handle up to 3 subfolders, but the result is not correct: columns V4 and V8 show only NA, and while V2 and V3 are correct for paths with 2 subfolders, after that the values start repeating. Once the code works for up to 3 folders, I can extend it to x subfolders. Even after a lot of study I can't figure out why it fails. I know this code is not the best or most efficient way to do it, so please have a look and suggest a more efficient approach:

#Global variables

filesPath <-"//ca1ntap01/Transfer/2Anuj/Data/" # this is directory where all the folders are kept

#creating a datatable
require(data.table)

ls<-as.data.table(list.files(filesPath,recursive=T,all.files = T,full.names = F
                             ,include.dirs = F))

e <- character()
f <- character()
g <- character()
j <- character()
n <- character()
o <- character()
p <- character()

for(i in 1:nrow(ls) )

{
  ls$nsubfolders<-sapply(regmatches(ls$V1, gregexpr("/", ls$V1)), length) #this gives the number of subfolders for every row
  a <- ls[i,1]
  print(a)
  b <- read.table(text = toString(a), sep = "/", as.is = TRUE)$V1
  print(b)
  c <- read.table(text = toString(a), sep = "/", as.is = TRUE)$V2
  print(c)
  e[i] <- b
  f[i] <- c
  #if (ls[nsubfolders=="2"])
  if (ls[ls$nsubfolders== 2])  
    {
    d <- read.table(text = toString(a), sep = "/", as.is = TRUE)$V3
    print(d)
    g[i] <- d
    h <- read.table(text = toString(d), sep = ".", as.is = TRUE)$V2
    print(h)
    print(i)
    j[i] <- h

  } else if (ls[ls$nsubfolders== 3]) {

    k <-read.table(text = toString(a), sep = "/", as.is = TRUE)$V3
    l <-read.table(text = toString(a), sep = "/", as.is = TRUE)$V4
    m <-read.table(text = toString(l), sep = ".", as.is = TRUE)$V2
    n[i] <- k
    o[i] <- l
    p[i] <- m
  }

}




# to add columns in the datatable
ls$V2 <- e
ls$V3 <- f
ls$V4 <- n
ls$V5 <- g
ls$V6 <- j
ls$V7 <- o
ls$V8 <- p

print(ls)

Welcome to StackOverflow! Rather than showing just a little of what it looks like, please use a [reproducible](https://stackoverflow.com/q/5963269/1422451) example per the [MCVE](https://stackoverflow.com/help/mcve) and [`r`](https://stackoverflow.com/tags/r/info) tag description, with the desired output. You can use `dput()`, `reprex::reprex()` or built-in data sets for reproducible data. – Hack-R, Jul 20 '18 at 02:38
@Hack-R, thanks! But I have not shown just a little; in fact, I have shown everything as far as the code is concerned. I cannot show the entire data.table because it contains a huge amount of data. I have given the input, the desired output, what I have done, and what is required. Do let me know what is missing. – NewCoder, Jul 20 '18 at 03:35
The important thing is to provide it in a **reproducible** format. Don't just copy and paste what it looks like. – Hack-R, Jul 20 '18 at 13:22
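
To illustrate the `dput()` suggestion, here is a minimal sketch of how the sample could be shared in a reproducible form. The four paths are copied from the table in the question; the name `dat` is hypothetical, and a plain data.frame is assumed here so that nothing beyond base R is needed.

```r
# Rows copied from the sample shown in the question; `dat` is a hypothetical name.
dat <- data.frame(
  V1 = c("CCData/EQHazGIS/Eqattcc.dat",
         "TWData/HUVuln/ea/modifsWI5.tw",
         "TWData/HUVul/ea/pa/modifsWI8.tw",
         "TWData/HUVul1/ea/pa/lk/modifsWI9.tw"),
  stringsAsFactors = FALSE
)

# Count the "/" separators in each path, as the question's own code does
dat$nsubfolders <- sapply(regmatches(dat$V1, gregexpr("/", dat$V1)), length)

# dput() prints an R expression that recreates `dat` when pasted into a console
dput(dat)
```

Pasting the printed `structure(...)` expression into the question lets others rebuild the same object without access to the original directory. For a large table, `dput(head(dat, 10))` would share only the first rows.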

Onyambu · Answer 1 · 2018-07-23T03:52:41.647

0

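If you have to count the nsubfolders yourself:

# dat is the data from the question; column V1 holds the paths
# file extension: everything after the last "."
ext <- sub(".*[.]", "", dat$V1)
# drop the file name, split the remaining folders on "/", and turn blank fields into NA
dat1 <- read.table(text = sub("[^/]*$", "", dat$V1), sep = "/", fill = TRUE, na.strings = "")
# number of non-NA folder columns in each row
nsubfolders <- rowSums(!is.na(dat1))
# dat[-2] drops the existing nsubfolders column before binding the new ones
cbind(dat[-2], nsubfolders, dat1, ext)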
                                   V1 nsubfolders     V1       V2   V3   V4   V5 ext
1         CCData/EQHazGIS/Eqattcc.dat           2 CCData EQHazGIS <NA> <NA> <NA> dat
2         CCData/EQHazGIS/eqcrhaz2.CN           2 CCData EQHazGIS <NA> <NA> <NA>  CN
3       TWData/HUVuln/ea/modifsWI5.tw           3 TWData   HUVuln   ea <NA> <NA>  tw
4     TWData/HUVul/ea/pa/modifsWI8.tw           4 TWData    HUVul   ea   pa <NA>  tw
5         CCData/EQHazGIS/eqcrhaz2.CN           2 CCData EQHazGIS <NA> <NA> <NA>  CN
6 TWData/HUVul1/ea/pa/lk/modifsWI9.tw           5 TWData   HUVul1   ea   pa   lk  tw
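
As a possible alternative to the `read.table()` call above, here is a minimal sketch that splits the paths with `strsplit()`, so the number of folder columns follows the deepest path in the data. This may be worth trying on a long file list, since `read.table()` decides the number of columns from the first five lines of input. The names `paths` and `split_paths` are made up for illustration, the sample rows are copied from the question, and the sketch assumes every path contains at least one folder.

```r
# Sample paths copied from the question; `paths` and `split_paths` are hypothetical names.
paths <- c("CCData/EQHazGIS/Eqattcc.dat",
           "TWData/HUVuln/ea/modifsWI5.tw",
           "TWData/HUVul/ea/pa/modifsWI8.tw",
           "TWData/HUVul1/ea/pa/lk/modifsWI9.tw")

split_paths <- function(paths) {
  # Folder components of each path, without the file name
  # (assumption: each path has at least one folder, otherwise dirname() returns ".")
  parts <- strsplit(dirname(paths), "/", fixed = TRUE)
  depth <- lengths(parts)
  max_depth <- max(depth)

  # Indexing past the end of a vector yields NA, which pads shallow paths
  folders <- do.call(rbind, lapply(parts, function(p) p[seq_len(max_depth)]))
  colnames(folders) <- paste0("V", seq_len(max_depth) + 1L)

  # The extension goes last, as requested in the question
  data.frame(V1 = paths, nsubfolders = depth, folders,
             ext = tools::file_ext(paths), stringsAsFactors = FALSE)
}

res <- split_paths(paths)
print(res)
```

With the real data, `paths` would be the `V1` column of the file listing. The result is a plain data.frame; `data.table::as.data.table(res)` should convert it if a data.table is needed.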

edited Jul 23 '18 at 03:52

answered Jul 20 '18 at 04:04


@NewCoder which answer? – Onyambu Jul 23 '18 at 01:18
@NewCoder I have run the code on the data and it produces the results you want. Try running again. – Onyambu Jul 23 '18 at 03:52
@Onyambu, is there a way I can share the data with you? It's not running on my data. – NewCoder Jul 25 '18 at 04:20
