
I need help with a Python script that reads a PDF file, copies every word in it into a new .txt file (one word per line), then deletes the repeated words, counts the remaining words, and prints that count on the last line.

2 Answers


Did you search Stack Overflow for answers first?

Here you can find some good answers on how to extract text from a PDF file (look at Jakobovski's answer): How to extract text from a PDF file?

Here you can find information about writing, editing, and creating .txt files: https://www.guru99.com/reading-and-writing-files-in-python.html
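Putting those two pieces together, here is a minimal sketch of the whole task. It assumes a text-based PDF and a recent PyPDF2 release that provides PdfReader (very old releases only had PdfFileReader). The helper names extract_pdf_text and write_unique_words are hypothetical, and so is the definition of a "word" (word characters, optionally with an inner apostrophe, compared case-insensitively); adjust WORD_RE if you need something different.

```python
import re
from pathlib import Path

# Hypothetical word definition: word characters, optionally with inner
# apostrophes, so "don't" stays one word.
WORD_RE = re.compile(r"\w+(?:'\w+)*")


def extract_pdf_text(pdf_path):
    """Return the text of every page of a text-based PDF, using PyPDF2."""
    # Imported lazily so write_unique_words() can be used without PyPDF2.
    from PyPDF2 import PdfReader

    reader = PdfReader(pdf_path)
    # Pages with no extractable text (e.g. scanned images) contribute "".
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def write_unique_words(text, out_path):
    """Write each distinct word on its own line, then the count on the last line."""
    # Lower-case so "The" and "the" count as the same word (an assumption).
    words = (word.lower() for word in WORD_RE.findall(text))
    # dict.fromkeys drops repeats while keeping first-occurrence order.
    unique = list(dict.fromkeys(words))
    lines = unique + [str(len(unique))]
    Path(out_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(unique)
```

For example, write_unique_words(extract_pdf_text("input.pdf"), "words.txt") writes the distinct words of input.pdf (a placeholder file name) one per line into words.txt, puts their count on the final line, and returns that count so you can print it as well.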

Dylan_w

Install these libraries:

PyPDF2 (to convert simple, text-based PDF files into text that Python can read)

textract (to convert non-trivial PDF files, such as scanned ones, into text that Python can read)

nltk (to clean the text and convert phrases into keywords)

Each of these libraries can be installed from a terminal (e.g. on macOS) with:

pip install PyPDF2 textract nltk

See this tutorial: https://medium.com/@rqaiserr/how-to-convert-pdfs-into-searchable-key-words-with-python-85aab86c544f
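For the counting part of the question, nltk's FreqDist (a subclass of collections.Counter) can tally how often each word repeats. A minimal sketch follows; it uses RegexpTokenizer and FreqDist, neither of which needs any nltk.download() data. The helper name word_frequencies and the word pattern are hypothetical, and the plain-Counter fallback exists only so the snippet still runs when nltk is not installed.

```python
import re
from collections import Counter

# Hypothetical word pattern: word characters, optionally with inner apostrophes.
WORD_PATTERN = r"\w+(?:'\w+)*"

try:
    # Neither class needs downloaded NLTK data packages.
    from nltk import FreqDist
    from nltk.tokenize import RegexpTokenizer

    tokenize = RegexpTokenizer(WORD_PATTERN).tokenize
except ImportError:
    # Fallback without nltk: FreqDist subclasses Counter, and RegexpTokenizer
    # with gaps=False (the default) behaves like re.findall on this pattern.
    FreqDist = Counter
    tokenize = re.compile(WORD_PATTERN).findall


def word_frequencies(text):
    """Count how often each lower-cased word occurs in text."""
    return FreqDist(token.lower() for token in tokenize(text))
```

On the result, len(freq) is the number of distinct words (the count the question asks for), list(freq) gives those words with repeats removed, and freq.most_common(10) lists the ten most repeated ones.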

I'd recommend textract: it supports many file types besides PDF, which makes it the more flexible option.

Follow these links:

https://github.com/deanmalmgren/textract

https://textract.readthedocs.io/en/latest/
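A minimal textract sketch, assuming textract is installed (it may also need system tools such as pdftotext or tesseract; check its installation docs). textract.process returns bytes (UTF-8 by default), so the output has to be decoded before it is split into words. The helper names words_from_bytes and pdf_words_textract are hypothetical; method is passed through to textract's PDF parser, for example "tesseract" to OCR a scanned PDF.

```python
import re

# Hypothetical word definition: word characters, optionally with inner apostrophes.
WORD_RE = re.compile(r"\w+(?:'\w+)*")


def words_from_bytes(raw, encoding="utf-8"):
    """Decode extractor output (bytes) and split it into words."""
    return WORD_RE.findall(raw.decode(encoding))


def pdf_words_textract(path, method=None):
    """Extract the words of a PDF with textract.

    method is forwarded to textract's PDF parser only when given,
    e.g. "pdfminer" or "tesseract".
    """
    import textract  # third-party; imported lazily so the helper above works without it

    kwargs = {"method": method} if method else {}
    return words_from_bytes(textract.process(path, **kwargs))
```

The resulting list can then be deduplicated, counted, and written one word per line in the same way as any other list of words.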

MIH