I am trying to get page-level ASCII text out of a series of multi-page PDFs. My current process is to batch-split all of the PDFs with Sejda (an awesome tool) and then batch-extract the text from the split PDFs, also in Sejda, to corresponding text files. Is there an easy way to bypass the splitting phase and go straight to the page-level TXT files? I would like to input a collection of multi-page PDFs and output one TXT file for each page of each PDF. Any input or insight would be appreciated.
My current process:
File.pdf --> File-001.pdf; File-002.pdf; etc. --> File-001.txt; File-002.txt; etc.
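
For illustration, here is a minimal sketch of what going straight from File.pdf to File-001.txt, File-002.txt, etc. could look like in a script instead of Sejda. It assumes the third-party pypdf library is installed and acceptable for the job. The function names and the zero-padded naming helper are hypothetical, chosen only to mirror the naming above, and extraction quality will depend on how the PDFs were produced (scanned pages without a text layer will come out empty).

```python
from pathlib import Path


def page_txt_path(pdf_path: Path, page_number: int, out_dir: Path) -> Path:
    """Build a File-001.txt style name (1-based, zero-padded to 3 digits)."""
    return out_dir / f"{pdf_path.stem}-{page_number:03d}.txt"


def write_page_texts(pdf_path: Path, page_texts, out_dir: Path) -> list:
    """Write one TXT file per page text and return the paths written."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for number, text in enumerate(page_texts, start=1):
        target = page_txt_path(pdf_path, number, out_dir)
        # Force ASCII output; characters outside ASCII are replaced with "?".
        target.write_text(text or "", encoding="ascii", errors="replace")
        written.append(target)
    return written


def extract_page_texts(pdf_path: Path):
    """Yield the extracted text of each page, in page order."""
    # Assumption: pypdf is installed (pip install pypdf).
    from pypdf import PdfReader

    reader = PdfReader(str(pdf_path))
    for page in reader.pages:
        yield page.extract_text()


def convert_folder(in_dir: Path, out_dir: Path) -> None:
    """Convert every PDF in in_dir to per-page TXT files in out_dir."""
    for pdf_path in sorted(in_dir.glob("*.pdf")):
        write_page_texts(pdf_path, extract_page_texts(pdf_path), out_dir)
```

Calling `convert_folder(Path("pdfs"), Path("txt"))` (both folder names are placeholders) would then process a whole collection in one pass, with no intermediate single-page PDFs.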