
I would like to convert tables in PDF to Excel. I realize Adobe Acrobat Pro has this functionality - I would just like to program this because I have many files.

Subhobroto's reply in this post explains how to do this in Python (just substituting xls for Word), but it's for a Windows version of Python. What would be the analogous way to connect to Adobe Acrobat using Python 3 on a Mac?

from win32com.client import Dispatch  # Windows-only (pywin32); not available on a Mac

def acrobat_extract_text(f_path, f_path_out, f_basename, f_ext):
    avDoc = Dispatch("AcroExch.AVDoc")  # this is the line I need to sub with a method from another module

    # Open the input file (as a pdf)
    ret = avDoc.Open(f_path, f_path)
    assert ret  # FIXME: Documentation says "-1 if the file was opened successfully, 0 otherwise", but this is a bool in practice?
matsuo_basho
  • The link in your question is broken. – abarnert Jul 30 '18 at 23:59
  • But at any rate, that seems to be using COM automation, which doesn't exist on any platform but Windows. There's no way to port that; you'd need a completely different implementation instead. – abarnert Jul 31 '18 at 00:00
  • I think Mac versions of Acrobat and Excel are AppleScriptable (at least they used to be…). If so, you could write something kind of similar with [`appscript`](http://appscript.sourceforge.net/), or with `NSAppleScript` or `ScriptingBridge` via pyobjc, or with `osascript` via a subprocess, etc. But you'll probably need to write it yourself; I doubt you'll find any up-to-date code that you can copy and paste. And that means you'll need to learn the basics of `ScriptEditor` and AppleScript dictionaries and so on to figure out what the right scripting calls are, so you can translate to Python. – abarnert Jul 31 '18 at 00:04
  • If you really don't want to learn AppleScript, you can try using `appscript` or `ScriptingBridge` in the Python REPL to explore what methods the app provides. Also, I found [this blog post](https://xprepres.wordpress.com/2013/05/25/dealing-acrobat-applescript/) that may help you get started. – abarnert Jul 31 '18 at 00:06
  • @abarnert, I'm fine with learning AppleScript for this particular task - doesn't seem like it would be that difficult. However, it appears the documentation is scant. https://stackoverflow.com/questions/41714452/applescript-to-save-pdf-as-pdf-x-in-acrobat Is there really no way to just do this in Python without importing the `win32com` library? – matsuo_basho Jul 31 '18 at 05:01
  • `win32com` is Windows-specific. The way to do it on Mac is through one of the AppleScript-based mechanisms mentioned above. And yeah, the documentation is pretty scarce. Apple used to push AppleScript as a technology pretty heavily, and it had a brief resurgence from around 2008-2012, but over the last half decade they've basically ignored it, so, with a few exceptions, app developers haven't advertised their scripting features very much. (COM automation doesn't fare much better on Windows, to be honest.) – abarnert Jul 31 '18 at 05:07
  • Is it possible to just do it in Javascript on a Mac? – matsuo_basho Jul 31 '18 at 05:25
  • I think there is an ObjC bridge for Javascript, but it's not going to be any easier than PyObjC for Python. – abarnert Jul 31 '18 at 05:26
  • Would I need to learn Objective C to use PyObjC? Where would I start? Hmm, it really does seem the easiest solution is to go to a PC (which I have access to) and just use the code provided in the original link – matsuo_basho Jul 31 '18 at 05:32
  • If you're just doing `ScriptingBridge`, you really only need one line of PyObjC to get hold of the `SBApplication` and after that it's pure Python/SB code except in really complicated cases, and if you're just doing `NSAppleScript` it's just four lines of unchanging PyObjC boilerplate and after that it's pure AppleScript code. For anything beyond that, you need at least enough ObjC to be able to understand and translate simple ObjC code, but I don't think you'll need anything beyond that. – abarnert Jul 31 '18 at 05:39
  • To give you an idea of how much ObjC you need, see [this `ScriptingBridge` sample I slapped together](https://gist.github.com/abarnert/e2be24dfa4429680d30ee3c0414221c3). Basically, if you can copy and paste line 7 and replace the string literal, you know enough ObjC to hack on that script. – abarnert Jul 31 '18 at 05:45
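
For the "`osascript` via a subprocess" route mentioned in the comments, a rough starting point might look like the sketch below. It is not a working PDF-to-Excel converter: it only shows how to hand a few lines of AppleScript to the `osascript` command-line tool from Python 3. The helper names (`build_osascript_command`, `open_pdf_script`, `run_applescript`) are made up for illustration, the application name `"Adobe Acrobat"` is an assumption that may differ between Acrobat versions, and the script only uses the generic `open` command. The actual export or save call would still have to be looked up in Acrobat's scripting dictionary in Script Editor.

```python
import subprocess


def build_osascript_command(script_lines):
    """Build the argv list for osascript, passing one -e option per script line."""
    cmd = ["osascript"]
    for line in script_lines:
        cmd += ["-e", line]
    return cmd


def open_pdf_script(pdf_path, app_name="Adobe Acrobat"):
    """Return AppleScript lines asking an application to open a PDF.

    app_name is an assumption; check the exact application name on your Mac.
    """
    # Escape backslashes and double quotes so the path is a valid AppleScript string.
    escaped = pdf_path.replace("\\", "\\\\").replace('"', '\\"')
    return [
        f'tell application "{app_name}"',
        f'    open POSIX file "{escaped}"',
        "end tell",
    ]


def run_applescript(script_lines):
    """Run the script with osascript (macOS only) and return its text output."""
    result = subprocess.run(
        build_osascript_command(script_lines),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if osascript reports a failure
    )
    return result.stdout.strip()
```

On a Mac, something like `run_applescript(open_pdf_script("/path/to/file.pdf"))` would then be the entry point, looped over the many input files. Building the command separately from running it keeps the AppleScript text easy to print and debug before anything is sent to Acrobat.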
