
@r2evans
```
[1] Tembec Inc. Tembec Inc. Tembec Inc. 197 Levels: National Bank of Canada Fonds de placement du Barreau du Québec ... Irving Resources Inc.
```

```
> dest
[1] "C:\\Sedar_data\\2016\\2016_02\\balance-sheets"
> myfiles
 [1] "C:\\Sedar_data\\2016\\2016_02\\balance-sheets/02427367-00000007-00038001-i@#SLH#Sedar2#Western#FINALPROSPECTUS-PDF-REGFILE.xml"
 [2] "C:\\Sedar_data\\2016\\2016_02\\balance-sheets/02439236-00000002-00026641-C@#SEDAR#2016#First_Quarter_Report-PDF-REGFILE.xml"
C@#SEDAR#FILINGS#151231Ironside_Q2FS_cm-PDF-REGFILE.xml"
 [9] "C:\\Sedar_data\\2016\\2016_02\\balance-sheets/02440535-00000001-00002159-s@#SEDAR#Firan#Annual#AnnualReport-2015-FTG-PDF-REGFILE.xml"
[10] "C:\\Sedar_data\\2016\\2016_02\\balance-sheets/02440536-00000001-00002159-s@#SEDAR#Firan#Annual#MD-A-2015-FTG-PDF-REGFILE.xml"
```

Suppose these are the input files. If I filter for Firan, the output I want is the path of each matching file, for example:

```
[11] "C:\Sedar_data\2016\2016_02\balance-sheets/02440538-00000001-00002159-s@#SEDAR#Firan#Annual#AFS-2015-FTG-PDF-REGFILE.xml"
```

I am reading multiple XML files from my computer via:

```r
dest <- "C:\\my_data\\2016\\2016_02"
```
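`list.files()` with `full.names = TRUE` (as used in the full code further down) already returns each file's complete path, so the paths exist before any XML parsing happens. A minimal sketch, assuming a temporary folder and two hypothetical file names in place of the real `dest` directory:

```r
# Throwaway folder with two hypothetical XML files standing in for dest
demo_dest <- file.path(tempdir(), "balance-sheets-demo")
dir.create(demo_dest, showWarnings = FALSE)
invisible(file.create(file.path(demo_dest, c("a-Firan-AFS.xml", "b-Other-Q2FS.xml"))))

# full.names = TRUE keeps the directory part, so every element is a usable path
myfiles <- list.files(path = demo_dest, pattern = "\\.xml$",
                      recursive = TRUE, full.names = TRUE)
print(myfiles)
```

The anchored pattern `"\\.xml$"` only matches names ending in `.xml`, whereas `"xml"` matches the text anywhere in the name.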

From those XML files I extract the tags, and I filter the files by company name:

```r
look.for <- c("Technology Group Corporation")
name_filter <- filesList_df[filesList_df$`names_try[1, 1]` %in% look.for, ]
name_filter
```
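This filter can only return company names, because `filesList_df` holds nothing but names. If each row also carried the file's path, the same `%in%` test could return the matching paths instead. A minimal sketch, assuming a data frame with hypothetical columns `path` and `company`; the paths are taken from the listing above, while the company assigned to the `02439236` file is a made-up placeholder:

```r
files_df <- data.frame(
  path = c(
    "C:\\Sedar_data\\2016\\2016_02\\balance-sheets/02440535-00000001-00002159-s@#SEDAR#Firan#Annual#AnnualReport-2015-FTG-PDF-REGFILE.xml",
    "C:\\Sedar_data\\2016\\2016_02\\balance-sheets/02440536-00000001-00002159-s@#SEDAR#Firan#Annual#MD-A-2015-FTG-PDF-REGFILE.xml",
    "C:\\Sedar_data\\2016\\2016_02\\balance-sheets/02439236-00000002-00026641-C@#SEDAR#2016#First_Quarter_Report-PDF-REGFILE.xml"
  ),
  company = c("Firan Technology Group Corporation",
              "Firan Technology Group Corporation",
              "Other Company Inc."),  # hypothetical placeholder name
  stringsAsFactors = FALSE
)

look.for <- c("Firan Technology Group Corporation")
# Keep the rows whose company matches, then return only their paths
firan_paths <- files_df$path[files_df$company %in% look.for]
print(firan_paths)
```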

The output is:

```
> name_filter
[1]Technology Group Corporation Firan Technology Group Corporation Firan Technology Group Corporation
[4] Technology Group Corporation Firan Technology Group Corporation Firan Technology Group Corporation
[7] Technology Group Corporation Firan Technology Group Corporation
197 Levels:Bank  ...  Resources Inc.
```

However, what I actually want is the path of each of these files. How can I do that? Thanks in advance.

The full code is:

```r
library(XML)
library(methods)
library(plyr)
library(stringr)

# Parsing a single file first is useful for inspecting its structure:
# d1 <- "C:\\Users\\DSLGuest\\Desktop\\Data\\2016/2016_03/2016-03-16/02455279-00000001-00001297-C@#Temp#BORALEX#2016#aNNUALfILINGS#MDA#MDAeng-PDF-REGFILE.xml"
# doc1 <- xmlParse(d1)
# doc1

dest <- "C:\\Sedar_data\\2016\\2016_02"
# Full paths of every file under dest whose name contains "xml"
myfiles <- list.files(path = dest, recursive = TRUE, pattern = "xml", full.names = TRUE)

filesList_df <- data.frame(File = character(), stringsAsFactors = FALSE)
for (i in myfiles) {
  result <- xmlParse(i)
  # print(result)
  rootnode <- xmlRoot(result)
  # print(rootnode)
  rootsize <- xmlSize(rootnode)

  # rootnode[[15]][[1]][[2]] gives the name of the company
  names_try <- ldply(xmlToList(rootnode[[15]][[1]][[2]]), data.frame)

  # Only the company name is appended; the file path `i` is not stored
  filesList_df <- rbind(filesList_df, as.data.frame(names_try[1, 1]))
}

# Bare expressions inside a for loop are not auto-printed,
# so the filtering and printing happen after the loop
filesList_df
look.for <- c("Firan Technology Group Corporation")
name_filter <- filesList_df[filesList_df$`names_try[1, 1]` %in% look.for, ]
name_filter
```
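One way to keep each file's path is to record it in the same row as the company name while looping over `myfiles`. A minimal sketch, assuming a hypothetical helper `get_company()` that takes a path and returns that file's company name; a stub replaces the XML extraction here so the sketch runs without the real files, and the demo paths are hypothetical:

```r
# Build one row per file: its full path and the company name found in it.
# get_company is a hypothetical function: path -> company name.
collect_companies <- function(paths, get_company) {
  companies <- vapply(paths, function(p) as.character(get_company(p)),
                      character(1), USE.NAMES = FALSE)
  data.frame(path = paths, company = companies, stringsAsFactors = FALSE)
}

# Stub extractor for demonstration only: guesses the company from the file name
stub_company <- function(p) {
  if (grepl("Firan", p)) "Firan Technology Group Corporation" else "Other Company Inc."
}

myfiles <- c("C:/demo/02440535-Firan-AnnualReport.xml",
             "C:/demo/02439236-First_Quarter_Report.xml",
             "C:/demo/02440536-Firan-MD-A.xml")

files_df <- collect_companies(myfiles, stub_company)
look.for <- c("Firan Technology Group Corporation")
name_filter_paths <- files_df$path[files_df$company %in% look.for]
print(name_filter_paths)
```

For the real files, `get_company` would wrap the question's own extraction steps (`xmlParse()`, `xmlRoot()`, `xmlToList()` on `rootnode[[15]][[1]][[2]]`, then `names_try[1, 1]`); the `as.character()` call guards against that value being a factor. Building the data frame once at the end also avoids growing it with `rbind()` on every iteration.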
r2evans
Chrisxx
  • Please provide a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). In this case, *reproducible* might include *actual code*, not just these few lines; it might include a portion of `filesList_df`, though based on these few lines of code, I have no idea if it will be useful. I hope the code provides significantly more context, otherwise your question is too vague and broad and will be closed shortly. – r2evans Mar 18 '17 at 08:27
  • @r2evans ok, I added the full code as well. Thx. – Chrisxx Mar 18 '17 at 09:35
  • Do you mean something like `sub("[^\\\\/]*$", "", myfiles)`? – r2evans Mar 18 '17 at 14:07
  • r2evans I want to get the file paths of the outputs. – Chrisxx Mar 18 '17 at 18:22
  • *What outputs?!?* Nothing in your code suggests where you are saving anything to an external file. Here's a suggestion: based on *sample data*, provide your *expected output*. – r2evans Mar 18 '17 at 18:25
  • @r2evans You are right, I am not saving anything. I am filtering, so the files keep the same paths. I only want to get the path of each filtered file; I do not want to save anything. – Chrisxx Mar 18 '17 at 19:26
  • I have no idea what you are trying to do. You have input files (`myfiles`), for which you know the full path and file names. You are not saving any data, yet you want to know the path to output files. I must be missing something rather big here, this all appears contradictory. – r2evans Mar 18 '17 at 21:04
  • @r2evans Every input file in (myfiles) has its own path; C:// 02440414-00000001-00013815-C@##FILINGS is one such path. I am filtering (myfiles) by a given company name. For example, if (myfiles) contains 450 files and filtering by a specific company name leaves 23 of them, I want those 23 files and their individual paths as the output... – Chrisxx Mar 19 '17 at 03:05
  • Check out [`dirname`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/basename.html) and [`sub`/`gsub`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/grep.html). If you try those and they provide something other than you are seeking, then I suggest again that you provide *sample data* and *expected output* based on that data. – r2evans Mar 19 '17 at 05:03
  • @r2evans I committed an edit to the question. – Chrisxx Mar 19 '17 at 05:24
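For reference, the functions suggested in the comments split a path in different ways: `dirname()` drops the file name, `basename()` keeps only the file name, and the `sub("[^\\\\/]*$", "", ...)` pattern deletes everything after the last slash or backslash. A small sketch on one hypothetical path, written with forward slashes so that `dirname()` gives the same result on Windows and Unix-alikes:

```r
p <- "C:/Sedar_data/2016/2016_02/balance-sheets/02440538-Firan-AFS-2015.xml"

dir_part  <- dirname(p)                # folder only, no trailing slash
file_part <- basename(p)               # file name only
trimmed   <- sub("[^\\\\/]*$", "", p)  # folder, keeping the trailing "/"

print(c(dir_part, file_part, trimmed))
```

None of these select files by company; they only reshape paths that `myfiles` already holds.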
