I'm trying to run a command on every file in a directory. The files are named with numbers in ascending order:

1815837.xml
1815838.xml
1815839.xml
1815840.xml

Would it be possible to write some kind of script to take all the files in the directory and one by one feed them through the following command (the Stanford NER):

java -mx600m -cp /home/matthias/Workbench/SUTD/nytimes_corpus/NER/stanford-ner-2015-01-30/stanford-ner-3.5.1.jar edu.stanford.nlp.ie.crf.CRFClassifier -loadClassifier /home/matthias/Workbench/SUTD/nytimes_corpus/NER/stanford-ner-2015-01-30/classifiers/english.all.3class.distsim.crf.ser.gz -textFile 1815838.xml -outputFormat inlineXML >> 1815838_output.xml

The command I'm invoking prints its result to the console, so I'm redirecting it to a specially named file, i.e. >> 1815838_output.xml. It's important that I maintain that naming convention.

Is it feasible to run that code on every file in a directory and save the output accordingly with a short java program or a bash script? What would it look like?

This question is tangentially related to a previous inquiry.

My hazy notion is something like this:

*X* = '1815838'

while(still files in directory)
{
   java -mx600m -cp stanford-ner-2015-01-30/stanford-ner-3.5.1.jar edu.stanford.nlp.ie.crf.CRFClassifier -loadClassifier english.all.3class.distsim.crf.ser.gz -textFile *X*.xml -outputFormat inlineXML >> *X* + '_output.xml'

X--

}

In my mind, that works, but I don't know whether it's a real construct or whether it would work in practice. I googled and didn't find anything like it, but maybe I didn't know exactly what to ask. Is this reasonable? Can someone show me the way?


UPDATE

-rwxr-xr-x 1 matthias matthias 3.8K Apr 10 20:35 1815851.xml*
-rw-r--r-- 1 matthias matthias 4.6K Apr 12 16:25 1815851_output.xml
-rw-r--r-- 1 matthias matthias 5.3K Apr 12 16:25 1815851_output_output.xml
-rwxr-xr-x 1 matthias matthias 3.3K Apr 10 20:35 1815852.xml*
-rw-r--r-- 1 matthias matthias 4.5K Apr 12 16:25 1815852_output.xml
-rw-r--r-- 1 matthias matthias 5.6K Apr 12 16:25 1815852_output_output.xml
-rwxr-xr-x 1 matthias matthias 2.5K Apr 10 20:35 1815853.xml*
-rw-r--r-- 1 matthias matthias 2.9K Apr 12 16:25 1815853_output.xml
-rw-r--r-- 1 matthias matthias 3.3K Apr 12 16:25 1815853_output_output.xml
-rwxr-xr-x 1 matthias matthias 2.4K Apr 10 20:35 1815854.xml*
-rw-r--r-- 1 matthias matthias 2.7K Apr 12 16:25 1815854_output.xml
-rw-r--r-- 1 matthias matthias 2.9K Apr 12 16:25 1815854_output_output.xml
-rwxr-xr-x 1 matthias matthias 2.8K Apr 10 20:35 1815855.xml*
-rw-r--r-- 1 matthias matthias 3.6K Apr 12 16:25 1815855_output.xml
-rw-r--r-- 1 matthias matthias 4.4K Apr 12 16:26 1815855_output_output.xml

This is what I ran without the loop; curiously, nothing was written to the output file:

g="$(1816001.xml $f .xml)_output.xml"
java -mx600m -cp /home/matthias/Workbench/SUTD/nytimes_corpus/NER/stanford-ner-2015-01-30/stanford-ner-3.5.1.jar edu.stanford.nlp.ie.crf.CRFClassifier -loadClassifier /home/matthias/Workbench/SUTD/nytimes_corpus/NER/stanford-ner-2015-01-30/classifiers/english.all.3class.distsim.crf.ser.gz -textFile $f -outputFormat inlineXML > $g
smatthewenglish

1 Answer

That's easily done. Assuming your current directory is where the files are:

for f in *.xml ; do
    echo "$f" | grep -q '_output\.xml$' && continue # skip output files
    g="$(basename "$f" .xml)_output.xml"            # 1815838.xml -> 1815838_output.xml
    command a_lot_of_arguments "$f" more_arguments >> "$g"
done

Though I wonder whether you want >> or > for redirection. The former will append to the output file if it already exists, for example from a previous run of the same script. The latter will overwrite it.
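
For illustration, here is a sketch of the same loop with the full command from the question filled in. The function names `run_ner` and `tag_all` and the variable `NER_DIR` are made up for this example, the install path is copied from the question, and it uses > on the assumption that a rerun should replace earlier results rather than append to them.

```shell
#!/bin/sh
# Install location copied from the question; adjust to your own setup.
NER_DIR=/home/matthias/Workbench/SUTD/nytimes_corpus/NER/stanford-ner-2015-01-30

# run_ner (hypothetical name): tag one file, writing the result to stdout.
run_ner() {
    java -mx600m -cp "$NER_DIR/stanford-ner-3.5.1.jar" \
        edu.stanford.nlp.ie.crf.CRFClassifier \
        -loadClassifier "$NER_DIR/classifiers/english.all.3class.distsim.crf.ser.gz" \
        -textFile "$1" -outputFormat inlineXML
}

# tag_all (hypothetical name): process every input file in a directory.
tag_all() {
    for f in "${1:-.}"/*.xml; do
        [ -e "$f" ] || continue                      # glob matched nothing
        case "$f" in *_output.xml) continue ;; esac  # skip earlier results
        run_ner "$f" > "${f%.xml}_output.xml"        # '>' overwrites on rerun
    done
}
```

After pasting these definitions into a shell or a script, `tag_all .` processes the current directory. The `${f%.xml}` expansion strips the extension without calling basename, so the output lands next to its input.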

Abhay
  • it seems to be calling the code properly, but the output files are all empty – smatthewenglish Apr 12 '15 at 08:28
  • Try it just for one file, replacing $f and $g by actual names, and see if the file is still empty. I mean without the loop. – Abhay Apr 12 '15 at 08:30
  • in the original question under update i posted what was happening, it was kind of like a runaway freight train, like I guess it would have just kept going in the way of 1_output_output_output_output.xml, you know what I mean? – smatthewenglish Apr 12 '15 at 08:34
  • Added a line to skip output files. Now before trying it, please remove all the superfluous files: `rm -f *_output_output.xml` – Abhay Apr 12 '15 at 08:39
  • i'll do that right away. but I've tried it with no loop, i ran what i have just posted to the update all the way at the bottom, and still there was nothing written to output, is what i posted there what you had in mind? – smatthewenglish Apr 12 '15 at 08:43
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/75045/discussion-between-abhay-and-s-matthew-english). – Abhay Apr 12 '15 at 08:43
  • is there a way to easily generalize that script so it applies to a directory substructure? For example, there's a directory `01`, and within it a directory `01` full of files, but also `02`, itself full of files, and so on. Is there a way to start the script once and have it apply to all directories in that way? – smatthewenglish Apr 12 '15 at 09:19
  • @S.Matthew_English Let's get into the chat room once more! – Abhay Apr 12 '15 at 09:31
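
The subdirectory question in the comments was taken to chat, so what follows is not the answer given there. It is only one possible sketch, which uses find to walk the tree. `tag_tree` and `run_ner` are hypothetical names, and `run_ner` is a stand-in that just copies its input; its body would be replaced by the full java command from the question, with "$1" as the -textFile argument. It assumes file names contain no newlines.

```shell
#!/bin/sh
# Stand-in for the real tagger call (assumption: replace the body with the
# java command from the question, passing "$1" to -textFile).
run_ner() {
    cat "$1"
}

# tag_tree (hypothetical name): process every input file below a directory.
tag_tree() {
    # Match *.xml at any depth, but never files that are already results.
    find "${1:-.}" -type f -name '*.xml' ! -name '*_output.xml' |
    while IFS= read -r f; do
        run_ner "$f" > "${f%.xml}_output.xml"   # result goes next to its input
    done
}
```

Calling `tag_tree 01` would then cover `01/01`, `01/02` and so on in a single run.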