Any way to open pdf in chrome, select all, copy, and paste/write to file?

Question

I'm looking for a way to open a PDF in Chrome, select all, and copy the contents so I can write them to a text file. I understand this is a very hacky approach, but I've already tried the pdftotext and textract libraries for reading PDF text, and manually doing select all and copy/paste in Chrome has read the text in my files most consistently.

This is what I have so far:

import os
import subprocess
import time

# pdf_f is the path to the PDF file (defined elsewhere)
# open file in chrome
p = subprocess.Popen(['open', '-na', 'Google Chrome', '--args', '--new-window', f'{pdf_f}'])
time.sleep(1)
# select all
cmd = """osascript -e 'tell application "System Events" to keystroke "a" using {command down}'"""
os.system(cmd)
time.sleep(1)
# copy
cmd = """osascript -e 'tell application "System Events" to keystroke "c" using {command down}'"""
os.system(cmd)

This visibly appears to work: the PDF opens in Chrome and all of the text shows as selected, but the text isn't being copied. I can't tell whether the problem is the copy command itself or that, when the new Chrome window opens, the focus is on the window rather than on the PDF inside it.

The extra hop of copying through Chrome doesn't seem very efficient. Have you evaluated other Python PDF libraries such as `PyPDF2` and its `PdfFileReader` class? https://pypi.org/project/PyPDF2/#description. Also, other helpful answers may be here: https://stackoverflow.com/questions/34837707/how-to-extract-text-from-a-pdf-file — user9074332, Jan 20 '19 at 21:38
Yeah I tried those too, but unfortunately they weren't reading the text in a consistent manner with my files. I tried opening with a few different apps and chrome copied the text in the best way for me to parse the text later with regex, so decided to go that route. — PL3, Jan 21 '19 at 02:34

score 2 · Answer 1 · answered Jan 21 '19 at 02:32

Found a way, using pyautogui to click inside the window before selecting and copying:

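import subprocess
import time

import pyautogui

# fnms (PDF file names), path (a pathlib.Path base directory) and the
# clipboard_to_txt helper are defined elsewhere and not shown here
screen_width, screen_height = pyautogui.size()

for fnm in fnms:
    pdf_f = path/'data'/'pdfs'/f'{fnm}'
    # open file in chrome
    p = subprocess.Popen(['open', '-na', 'Google Chrome', f'{pdf_f}'])
    time.sleep(1)
    # click in the middle of the screen, inside the Chrome window
    pyautogui.moveTo(screen_width//2, screen_height//2)
    pyautogui.click()
    # select all
    pyautogui.hotkey('command', 'a')
    # copy
    pyautogui.hotkey('command', 'c')
    # write txt file (swap the 'pdf' extension for 'txt')
    clipboard_to_txt(path/'data'/'txts'/(fnm[:-3]+'txt'))
    # close tab
    pyautogui.hotkey('command', 'w')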
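
The loop above calls clipboard_to_txt without showing it. Here is a minimal sketch of what such a helper might look like, assuming macOS (where the pbpaste command prints the clipboard contents). The body and the paste_cmd parameter are my own guesses for illustration, not the answerer's actual code:

```python
import subprocess
from pathlib import Path


def clipboard_to_txt(out_path, paste_cmd=("pbpaste",)):
    """Write the clipboard text to out_path and return the number of characters written.

    Hypothetical helper: paste_cmd is an assumed parameter naming the command
    that prints the clipboard to stdout (pbpaste on macOS).
    """
    # Run the paste command and capture whatever it prints
    result = subprocess.run(list(paste_cmd), capture_output=True, text=True, check=True)
    out_path = Path(out_path)
    # Create the output directory if it does not exist yet
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(result.stdout, encoding="utf-8")
    return len(result.stdout)
```

Called as in the loop, e.g. clipboard_to_txt(path/'data'/'txts'/'report.txt'), this writes whatever is on the clipboard at that moment, so a short time.sleep after the copy hotkey may be needed if the file comes out empty or stale.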
