I'm trying to find the PDFs in a folder that contain the words in the list below. I was able to find matching PDFs when I searched for a single word, but not with a list of words. Should I change my condition below? Thanks in advance, newbie here.
    from tika import parser
    import glob

    path = glob.glob(r"C:\Users\kxdane\Desktop\TEST\OKED\*.pdf")
    for path in path:
        pdf_files = glob.glob(path)
        text = (['Disclosure','M.D.'])
        for file in pdf_files:
            raw = parser.from_file(file)
            if text in raw['content']:
                print(file)
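
For reference, the condition `text in raw['content']` tests a whole list against a string, and Python's `in` on a string only accepts a string on the left, so it works for one word but not for a list. A minimal sketch of one possible way to check each word separately is below. The names `contains_keywords`, `find_pdfs`, and `require_all` are hypothetical and not part of Tika, and the sketch assumes `parser.from_file` returns a dict whose `content` entry may be `None` when no text could be extracted.

```python
import glob


def contains_keywords(content, keywords, require_all=False):
    """Return True if the text contains any (or all) of the keywords."""
    # Assumption: content may be None when Tika extracts no text.
    content = content or ""
    # any() matches at least one keyword; all() requires every keyword.
    check = all if require_all else any
    return check(keyword in content for keyword in keywords)


def find_pdfs(pattern, keywords, require_all=False):
    """Yield the paths matching `pattern` whose text contains the keywords."""
    # Imported here so contains_keywords can be used without Tika installed.
    from tika import parser

    for file in glob.glob(pattern):
        raw = parser.from_file(file)
        if contains_keywords(raw.get("content"), keywords, require_all):
            yield file
```

Calling `find_pdfs(r"C:\Users\kxdane\Desktop\TEST\OKED\*.pdf", ['Disclosure', 'M.D.'])` and printing each result would list the PDFs that contain at least one of the words; passing `require_all=True` would keep only those that contain both. The match is case-sensitive, as in the original condition.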