
getData does not work for me. I can't find it mentioned anywhere in the documentation, and I keep getting the error "AttributeError: 'ArrayObject' object has no attribute 'getData'" when I try to use it on an indirect object.

from tkinter import *
from tkinter import ttk
from tkinter import messagebox
from tkinter import filedialog
import PyPDF2 
from PyPDF2 import filters
from PyPDF2 import generic
from PyPDF2 import merger
from PyPDF2 import pagerange
from PyPDF2 import utils
from PyPDF2 import xmp

root = Tk()
frm = ttk.Frame(root, padding=300)
frm.grid()
ttk.Label(frm, text="TestingTesting123").grid(column=10, row=9)
ttk.Button(frm, text="Quit", command=root.destroy).grid(column=10, row=10)
_Loader = filedialog
_File = _Loader.askopenfile()
_Reader = PyPDF2.PdfFileReader(stream=_File.name)
_Page = _Reader.getPage(0)
_Output = messagebox
_Output.showinfo("Test",_Page['/Contents'].getData())
root.mainloop()

So, everything is just fine right up until I call the getData method. If I take it out, _Page['/Contents'] returns what appears to be a two-dimensional array object {IndirectObject[71,0]}.
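To see why the call fails, it can help to model the objects involved. The sketch below does not use PyPDF2 at all: Ref, Stream, OBJECTS and resolve are hypothetical stand-ins. The only parts taken from the question are the shape of the data (a /Contents array holding one indirect reference to object number 71, generation 0) and the method name getData. In PyPDF2, ArrayObject is a subclass of Python's list, so asking the array itself for getData fails the same way it does for a plain list; the data lives on the stream that the array's element points to.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for PDF objects; these are NOT PyPDF2's classes.
@dataclass(frozen=True)
class Ref:
    """Indirect reference: (object number, generation number)."""
    num: int
    gen: int


@dataclass
class Stream:
    """Content stream holding already-decoded bytes."""
    data: bytes

    def getData(self) -> bytes:  # name mirrors the method in the question
        return self.data


# Made-up object table keyed by (object number, generation number).
OBJECTS = {(71, 0): Stream(b"BT /F1 12 Tf (TestingTesting123) Tj ET")}


def resolve(obj):
    """Follow an indirect reference; return any other object unchanged."""
    if isinstance(obj, Ref):
        return OBJECTS[(obj.num, obj.gen)]
    return obj


# Page dictionary whose /Contents is a one-element array of references.
page = {"/Contents": [Ref(71, 0)]}

contents = page["/Contents"]  # a list, not a stream
error = None
try:
    contents.getData()  # lists have no getData method
except AttributeError as exc:
    error = str(exc)

data = resolve(contents[0]).getData()  # index into the array first, then resolve
print(error)  # 'list' object has no attribute 'getData'
print(data)   # b'BT /F1 12 Tf (TestingTesting123) Tj ET'
```

The point of the model is the order of operations: pick an element out of the array, follow the reference, and only then ask for the stream's data.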

All I want is to see what's in that array, or at least one element of it. When I call the getData method, I get the error. Also, after I assign the _Page variable, PyCharm doesn't suggest anything when I type "_Page.", which it should if it's a page object, right? Have I not imported something correctly, maybe? No, I can't share the .pdf I'm working on; I wish I could. Also, is there any PyPDF2 documentation that actually mentions or covers things like getData or resolvedObjects?

    Does this answer your question? [pyPdf for IndirectObject extraction](https://stackoverflow.com/questions/436474/pypdf-for-indirectobject-extraction) – Giorgos Xou Apr 09 '22 at 13:32


I am reading https://github.com/py-pdf/PyPDF2/issues/72

page = PdfFileReader(inpdf).getPage(0)

text = page.getContents()[n].getData() # where n is an index to locate the indirectObject

You may find that changing

... .showinfo("Test", _Page['/Contents'].getData())

to

... .showinfo("Test", _Page['/Contents'][0].getData())

is winning.
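Indexing with [0] reads only the first stream. Per the PDF specification, a page's /Contents may be either a single stream or an array of streams, and an array behaves as if its streams were concatenated in order. Here is a minimal sketch that handles both cases. It again uses hypothetical stand-ins (Ref, Stream, and a resolve callback passed in by the caller) rather than PyPDF2's real classes, so treat it as an outline of the logic, not library code.

```python
from dataclasses import dataclass
from typing import Callable


# Hypothetical stand-ins for PDF objects; these are NOT PyPDF2's classes.
@dataclass(frozen=True)
class Ref:
    num: int
    gen: int


@dataclass
class Stream:
    data: bytes

    def getData(self) -> bytes:
        return self.data


def page_content_bytes(page: dict, resolve: Callable[[object], object]) -> bytes:
    """Return all of a page's content as one byte string.

    /Contents may be a single stream or an array of streams; an array
    is treated as its streams concatenated in order.
    """
    contents = resolve(page.get("/Contents"))
    if contents is None:
        return b""  # no /Contents entry: the page is empty
    if isinstance(contents, list):
        # Resolve every element, not just [0], and join with a newline so
        # tokens at the end of one stream cannot run into the next.
        return b"\n".join(resolve(item).getData() for item in contents)
    return contents.getData()  # single stream


# Usage with a made-up object table.
objects = {(71, 0): Stream(b"q"), (72, 0): Stream(b"Q")}


def resolve(obj):
    if isinstance(obj, Ref):
        return objects[(obj.num, obj.gen)]
    return obj


array_page = {"/Contents": [Ref(71, 0), Ref(72, 0)]}
single_page = {"/Contents": Ref(71, 0)}
print(page_content_bytes(array_page, resolve))   # b'q\nQ'
print(page_content_bytes(single_page, resolve))  # b'q'
print(page_content_bytes({}, resolve))           # b''
```

Joining with a newline is a design choice: the specification only allows streams to be split at token boundaries, so adding whitespace between them should not change how the content is read.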

    Yeah, this was a dumb question. I saw that last week, but had no idea what dictionaries were, and didn't see that there was a single-element array keyed to '/Contents' and you'd need to specify the index even if it was just one element. Then, I somehow fumbled my way into doing it this morning. This is the right answer, though. – Matthew LaClair Apr 11 '22 at 18:34