I have to run the following command for hundreds of .docx files in a directory on Windows in order to convert them to .txt:

java -jar tika-app-1.3.jar -t somedocfile.doc > converted.txt

I was wondering if there is any automatic way such as writing a ".bat" file to do this.

Mohammadreza

1 Answer

Yes, you can do it in batch. I can't think of an example right now, but since you are running Java commands anyway, you can do it through Java too. Here is an example of how to run the command from Java. Hope this helps.

import java.io.BufferedReader;
import java.io.File;
import java.io.InputStream;
import java.io.InputStreamReader;

public class TikaRunner {

    public static void main(String[] args) {
        // args[0] = document to convert, args[1] = text file to write
        try {
            executeCMD(args[0], args[1]);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Runtime.exec() does not start a shell, so a ">" in the command string would be
    // handed to Tika as an argument; ProcessBuilder redirects stdout to the file instead.
    private static void executeCMD(String docFile, String txtFile) throws Exception {
        ProcessBuilder builder = new ProcessBuilder("java", "-jar", "tika-app-1.3.jar", "-t", docFile);
        builder.redirectOutput(new File(txtFile));
        Process process = builder.start();
        printMsg(docFile + " stderr:", process.getErrorStream());
        int exitValue = process.waitFor();
        if (exitValue != 0)
            System.out.println(docFile + " exited with value " + exitValue);
    }

    // Prints every line of the stream, prefixed with name.
    private static void printMsg(String name, InputStream ins) throws Exception {
        String line;
        try (BufferedReader in = new BufferedReader(new InputStreamReader(ins))) {
            while ((line = in.readLine()) != null) {
                System.out.println(name + " " + line);
            }
        }
    }
}
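
Since the question is about hundreds of files, the single-file call needs a loop over the directory around it. Here is a minimal sketch of that, not a tested solution: the class name TikaBatch and the helpers txtNameFor and tikaCommand are made up for illustration, and it assumes the Tika jar path and the directory are passed as arguments and that "java" is on the PATH.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class TikaBatch {

    // Hypothetical helper: "somedocfile.docx" -> "somedocfile.txt".
    static String txtNameFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        String base = (dot > 0) ? fileName.substring(0, dot) : fileName;
        return base + ".txt";
    }

    // Hypothetical helper: argument list for one Tika call, mirroring the command in the question.
    static List<String> tikaCommand(String tikaJar, Path doc) {
        return Arrays.asList("java", "-jar", tikaJar, "-t", doc.toString());
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length < 2) {
            System.out.println("usage: java TikaBatch <tika-app jar> <directory>");
            return;
        }
        Path dir = Paths.get(args[1]);
        // The glob matches *.doc and *.docx directly inside dir (no recursion).
        try (DirectoryStream<Path> docs = Files.newDirectoryStream(dir, "*.{doc,docx}")) {
            for (Path doc : docs) {
                File out = dir.resolve(txtNameFor(doc.getFileName().toString())).toFile();
                ProcessBuilder pb = new ProcessBuilder(tikaCommand(args[0], doc));
                pb.redirectOutput(out);                             // takes the place of the shell's ">"
                pb.redirectError(ProcessBuilder.Redirect.INHERIT);  // show Tika's errors on the console
                int exit = pb.start().waitFor();
                if (exit != 0) {
                    System.err.println(doc + " exited with value " + exit);
                }
            }
        }
    }
}
```

It could be run as "java TikaBatch tika-app-1.3.jar C:\docs" (the directory is just an example). Passing the arguments as a list means file names with spaces need no extra quoting. Note that a.doc and a.docx in the same directory would both be written to a.txt.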

UPDATE

OK, here is the much simpler batch way, which I couldn't think of yesterday :D

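for /r %%i in (*.doc) do java -jar tika-app-1.3.jar -t "%%i" > "%%~ni.txt"

It picks up all the *.doc files in the directory where it is executed and, because of /r, in its subdirectories. The input is passed as "%%i", the full path of the file. The "~n" modifier is used only for the output name: it keeps just the file name without path or extension ("somedocfile" in your example), so appending ".txt" is easy; otherwise you would get something like "C://...../somedocfile.doc" and changing the extension would be annoying. The quotes keep file names with spaces intact.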

Wald
  • Thanks. Although it was not the batch way, I think it definitely deserves a +1, because I decided to use Java this way for now. – Mohammadreza Jun 19 '15 at 04:27
  • @Mohammadreza Updated since I had a bit more time at work today :) – Wald Jun 19 '15 at 07:54
  • Thank you indeed, Wald. I just changed your command a little to handle files with spaces in their names: for /r %%f in (*.docx) DO java -jar tika-app-1.8.jar -t "%%~nf" > "%%~nf".txt. Also, I wanted to mention that the first "~ni" should be removed in the one you posted. Thanks again. – Mohammadreza Jun 21 '15 at 00:57