I have a script that lists the annotations of a PDF file (see "Parse annotations from a pdf"):
import popplerqt5
import argparse


def extract(fn):
    doc = popplerqt5.Poppler.Document.load(fn)
    annotations = []
    for i in range(doc.numPages()):
        page = doc.page(i)
        for annot in page.annotations():
            contents = annot.contents()
            if contents:
                annotations.append(contents)
                print(f'page={i + 1} {contents}')
    print(f'{len(annotations)} annotation(s) found')
    return annotations


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('fn')
    args = parser.parse_args()
    extract(args.fn)
But it only works for text annotations. There are many Python libraries for working with PDFs, such as Poppler, PyPDF2, and PyMuPDF. I have searched their documentation and source code extensively, and as far as I can tell, none of them can extract the binary data of sound annotations. Do you know of any library that can do this? I need to extract the binary data of these sound annotations and convert it to MP3 files.