I was wondering if there is any way using Tika/Python to only parse the first page or extract the metadata from the first page only? Right now, when I pass the pdf, it is parsing every single page. I looked that this link: Is it possible to extract text by page for word/pdf files using Apache Tika? However, this link explains more in java, which I am not familiar with. I was hoping there could be a python solution for it? Thanks!
from tika import parser
# running: java -jar tika-server1.18.jar before executing code below.
parsedPDF = parser.from_file('C:\\path\\to\\dir\\sample.pdf')
fulltext = parsedPDF['content']
metadata_dict = parsedPDF['metadata']
title = metadata_dict['title']
author = metadata_dict['Author'] # capturing all the names from lets say 15 pages. Just want it to capture from first page
pages = metadata_dict['xmpTPg:NPages']