I have many files (the NYTimes corpus for '05, '06, and '07) that I want to run through the Stanford NER. "Easy," you might think, "just follow the commands in the README doc." But if you thought that just now, you would be mistaken, because my situation is a bit more complicated. I don't want them all outputted into some big jumbled mess; I want to preserve the naming structure of each file. For example, one file is named 1822873.xml
and I processed it earlier using the following command:
java -mx600m -cp /home/matthias/Workbench/SUTD/nytimes_corpus/stanford-ner-2015-01-30/stanford-ner-3.5.1.jar \
    edu.stanford.nlp.ie.crf.CRFClassifier \
    -loadClassifier classifiers/english.all.3class.distsim.crf.ser.gz \
    -textFile /home/matthias/Workbench/SUTD/nytimes_corpus/1822873.xml \
    -outputFormat inlineXML >> output.curtis
If I were to follow this question, i.e. list many files in the command one after the other and then pipe the result somewhere, wouldn't it just send them all to the same file? That sounds like a disaster of the highest order.
Is there some way to send each file to a separate output file, so that, for instance, our old friend 1822873.xml would emerge from this process as, say, 1822873.output.xml, and likewise for each of the other thousand-odd files? Please keep in mind that I'm trying to achieve this expeditiously.
I guess this should be possible, but what is the best way to do it? With some kind of terminal command, or maybe a small script?
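My rough idea for such a script would be something like the following bash loop (untested; the corpus path mirrors the directory from the command above, and the `.output.xml` naming is just my guess at a sensible convention):

```shell
#!/usr/bin/env bash
# Rough sketch: run the NER classifier over every .xml file in the corpus
# directory, writing each result to a separate file named <basename>.output.xml.
corpus=/home/matthias/Workbench/SUTD/nytimes_corpus

for f in "$corpus"/*.xml; do
    [ -e "$f" ] || continue            # skip if the glob matched nothing
    base=$(basename "$f" .xml)         # e.g. 1822873.xml -> 1822873
    java -mx600m \
        -cp "$corpus"/stanford-ner-2015-01-30/stanford-ner-3.5.1.jar \
        edu.stanford.nlp.ie.crf.CRFClassifier \
        -loadClassifier classifiers/english.all.3class.distsim.crf.ser.gz \
        -textFile "$f" -outputFormat inlineXML > "${base}.output.xml"
done
```

But I don't know if restarting the JVM once per file like this is going to be horribly slow over thousands of files, which is why I'm asking whether there's a better way.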
Maybe one among you has some experience with this type of thing.
Thank you for your consideration.