Area of coding: PDF Table of Contents in Python 3 using PyPDF2
Problem: I need a program that can iterate through a variable (typed as a Union) that contains multiple dictionaries, each followed by a list which itself contains multiple dictionaries.
[
{},
[{}, {}, {}],
{},
[{}, {}, {}],
{},
[{}, {}, {}]
]
This pattern repeats multiple times.
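A minimal sketch of walking a structure shaped like this, assuming plain dicts with a "/Title" key stand in for the real outline entries and that each list holds the children of the entry just before it. The function name `walk` and the sample data are hypothetical. It uses `isinstance` to tell entries from child lists, since a test such as `item is {}` compares identity with a newly created dict and is never true.

```python
def walk(outline, depth=0):
    """Yield (depth, title) for every entry, descending into child lists."""
    for item in outline:
        if isinstance(item, list):
            # A list holds the children of the previous entry: go one level deeper.
            yield from walk(item, depth + 1)
        elif isinstance(item, dict):
            # A dictionary is a single entry; .get avoids a KeyError if no title is set.
            yield depth, item.get("/Title")


# Hypothetical sample mirroring the dict / list-of-dicts pattern.
sample = [
    {"/Title": "A"},
    [{"/Title": "B"}, {"/Title": "C"}],
    {"/Title": "D"},
]

for depth, title in walk(sample):
    print("    " * depth + str(title))
```

Because the function recurses, the same code should cope with lists nested inside lists, however deep the pattern goes.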
Expected output: The output should look like this:
1 Title Goes Here
1.1 Title Goes Here
1.1.1 Title Goes Here
1.1.2 Title Goes Here
1.1.3 Title Goes Here
1.2 Title Goes Here
1.2.1 Title Goes Here
1.2.2 Title Goes Here
1.2.3 Title Goes Here
1.3 Title Goes Here
1.3.1 Title Goes Here
1.3.2 Title Goes Here
1.3.3 Title Goes Here
2 Title Goes Here
2.1 Title Goes Here
2.1.1 Title Goes Here
2.1.2 Title Goes Here
2.1.3 Title Goes Here
2.2 Title Goes Here
2.2.1 Title Goes Here
2.2.2 Title Goes Here
2.2.3 Title Goes Here
2.3 Title Goes Here
2.3.1 Title Goes Here
2.3.2 Title Goes Here
2.3.3 Title Goes Here
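One way numbering like the above could be produced is sketched here, under the same assumptions: plain dicts with a "/Title" key stand in for the outline entries, and a list always belongs to the entry that precedes it at the same level. `number_outline` and the sample data are hypothetical names for illustration, not part of PyPDF2.

```python
def number_outline(items, prefix=""):
    """Return lines such as '1 Title', '1.1 Title', '1.1.1 Title'."""
    lines = []
    index = 0  # number of the most recent entry at this level
    for item in items:
        if isinstance(item, list):
            # Children inherit the number of the entry they follow.
            lines.extend(number_outline(item, f"{prefix}{index}."))
        else:
            index += 1
            lines.append(f"{prefix}{index} {item['/Title']}")
    return lines


# Hypothetical three-level sample.
sample = [
    {"/Title": "A"},
    [{"/Title": "B"}, [{"/Title": "C"}, {"/Title": "D"}], {"/Title": "E"}],
    {"/Title": "F"},
]

print("\n".join(number_outline(sample)))
# 1 A
# 1.1 B
# 1.1.1 C
# 1.1.2 D
# 1.2 E
# 2 F
```

If a list appears before any entry at its level, this sketch numbers its children under 0, which may or may not be what is wanted for a real outline.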
Program:
import argparse as arp
from PyPDF2 import PdfFileReader

parser = arp.ArgumentParser()
parser.add_argument("-f", "--file", help="File to analyse")
arg = parser.parse_args()
filename = arg.file

def fileread():
    doc = PdfFileReader(filename)
    ToC = doc.getOutlines()
    # ToC: Union[List[Union[Destination, list]], {__eq__}] = doc.getOutlines()
    for elements in ToC:
        # `elements is {}` compares identity with a brand-new dict, so it is
        # always False and that branch was skipped; isinstance checks the type.
        if isinstance(elements, dict):  # If the element is a dictionary just find the Title
            print(elements['/Title'])
        else:  # If the element is a list go through and print out the titles
            for nest_dict in elements:
                try:
                    print(nest_dict["/Title"])
                except (KeyError, TypeError):  # Nested list, or an entry without a title
                    continue

fileread()
I'm testing this program on: Compilers - Principles, Techniques, and Tools-Pearson_Addison Wesley (2006).pdf
Any help is much appreciated.