I'm looking for a way to open a PDF in Chrome, select all, and copy the contents so I can write them to a text file. I understand this is a very hacky approach, but I've already tried the pdftotext and textract libraries for reading PDF text, and manually doing select all and copy/paste in Chrome has read the text in my files most consistently.
This is what I have so far:
import os
import subprocess
import time
# pdf_f: path to the PDF file, defined elsewhere
# open file in chrome
p = subprocess.Popen(['open', '-na', 'Google Chrome', '--args', '--new-window', f'{pdf_f}'])
time.sleep(1)
# select all
cmd = """osascript -e 'tell application "System Events" to keystroke "a" using {command down}'"""
os.system(cmd)
time.sleep(1)
# copy
cmd = """osascript -e 'tell application "System Events" to keystroke "c" using {command down}'"""
os.system(cmd)
This visibly appears to work: the PDF opens in Chrome and all of the text shows as selected, but the text isn't being copied. I can't tell whether the problem is the copy command itself, or whether the new Chrome window opens with focus on the window rather than on the PDF inside it.
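To narrow down which of the two it is, one thing I could try is bringing Chrome to the front explicitly before each keystroke and then reading the clipboard back with pbpaste to see whether anything was copied. Below is a minimal sketch of that idea, not a confirmed fix. The helper names (keystroke_script, run_osascript, copy_pdf_text) and the wait times are my own assumptions, and it still does not guarantee that focus lands on the PDF viewer inside the window.

```python
import subprocess
import time


def keystroke_script(key: str) -> str:
    """Build AppleScript that activates Chrome, then sends Cmd+<key>.

    Hypothetical helper: the 0.5 s delay is a guess, not a tuned value.
    """
    return (
        'tell application "Google Chrome" to activate\n'
        "delay 0.5\n"
        f'tell application "System Events" to keystroke "{key}" using {{command down}}'
    )


def run_osascript(script: str) -> None:
    # check=True raises CalledProcessError if osascript exits with an error.
    subprocess.run(["osascript", "-e", script], check=True)


def copy_pdf_text(pdf_path: str, wait: float = 2.0) -> str:
    """Open pdf_path in Chrome, select all, copy, and return the clipboard text."""
    subprocess.run(
        ["open", "-na", "Google Chrome", "--args", "--new-window", pdf_path],
        check=True,
    )
    time.sleep(wait)  # assumed to be long enough for the PDF to render
    run_osascript(keystroke_script("a"))  # select all
    time.sleep(wait)
    run_osascript(keystroke_script("c"))  # copy
    time.sleep(wait)
    # pbpaste prints the current macOS clipboard contents to stdout.
    result = subprocess.run(["pbpaste"], capture_output=True, text=True, check=True)
    return result.stdout
```

Calling copy_pdf_text(pdf_f) and printing the result would show whether the clipboard actually changed. If it comes back empty or holds old content even with Chrome activated, that would point toward focus not being on the PDF itself rather than toward the copy command.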