I need to extract text from a PDF file and put it in a new .txt file

Question

I need help with a Python script that reads a PDF file, copies every word in it, and puts the words in a new .txt file (one word per line). It should then delete the repeated words, count them, and print the count on the last line.

score 0 · Answer 1 · answered Apr 23 '19 at 11:43

Did you search Stack Overflow for answers?

Here you can find some pretty good answers on how to extract text from a PDF file (look at Jakobovski's answer): How to extract text from a PDF file?

Here you can find information about writing/editing/creating .txt files: https://www.guru99.com/reading-and-writing-files-in-python.html
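As a rough illustration of the file-writing half of the question, here is a minimal sketch. It assumes the PDF text is already available as a string, that "word" means a run of letters, digits, or underscores, that case should be ignored, and that the count wanted on the last line is the number of distinct words. The function name `write_unique_words` is made up for this example.

```python
import re
from pathlib import Path


def write_unique_words(text, out_path):
    """Write each distinct word of `text` on its own line, then the count.

    Hypothetical helper: the word definition and the lowercasing are
    assumptions, not requirements stated in the question.
    """
    # \w+ matches runs of letters, digits, and underscores.
    words = re.findall(r"\w+", text.lower())
    # dict.fromkeys drops repeated words while keeping first-seen order.
    unique = list(dict.fromkeys(words))
    # The last line holds the number of distinct words.
    lines = unique + [str(len(unique))]
    Path(out_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return unique
```

Calling `write_unique_words("The cat and the hat", "words.txt")` would create words.txt with one word per line followed by the count. If the count should instead be the number of repeats removed, use `len(words) - len(unique)`.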

Dylan_w


I didn't find what I want. If you know how to write the script, could you write it, please? – AbdulRhman Fawzy Apr 26 '19 at 01:49

score 0 · Accepted Answer · answered Apr 23 '19 at 11:48

Install these libraries:

PyPDF2 (To convert simple, text-based PDF files into text readable by Python)

textract (To convert non-trivial, scanned PDF files into text readable by Python)

nltk (To clean and convert phrases into keywords)

Each of these libraries can be installed by running the following command in a terminal (on macOS), replacing Libraryname with the name of the library:

pip install Libraryname

See this tutorial: https://medium.com/@rqaiserr/how-to-convert-pdfs-into-searchable-key-words-with-python-85aab86c544f

I recommend textract: it supports many file types, including PDF, so it is the better choice.

Follow these links:

https://github.com/deanmalmgren/textract

https://textract.readthedocs.io/en/latest/
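For a simple, text-based PDF, the extraction step could look roughly like the sketch below. It assumes PyPDF2 2.x or later, where `PdfReader` and `page.extract_text()` are available (older releases used different names). The function names `join_page_text` and `pdf_to_text` are made up for this example, and scanned PDFs would still need an OCR-capable tool such as textract.

```python
def join_page_text(pages):
    """Concatenate the text of page-like objects that expose extract_text()."""
    # Pages with no extractable text (for example scanned images) are skipped.
    return "\n".join(filter(None, (page.extract_text() for page in pages)))


def pdf_to_text(pdf_path):
    """Return the text of a text-based PDF (assumes PyPDF2 2.x or later)."""
    # Imported here so join_page_text can be used without PyPDF2 installed.
    from PyPDF2 import PdfReader

    return join_page_text(PdfReader(pdf_path).pages)
```

The string returned by `pdf_to_text("input.pdf")` can then be split into words and written to a .txt file with ordinary Python file handling.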

MIH


I can't. Could you write it yourself, please? – AbdulRhman Fawzy Apr 26 '19 at 01:48
AbdulRhman, simply open a command prompt, type cd followed by the path of your script folder, then type pip install textract and press Enter; the textract library will start installing. – MIH Apr 26 '19 at 15:07
Which Python version are you using? – MIH Apr 26 '19 at 15:08
Study the links I provided above; they will solve your problem, Inshallah. – MIH Apr 26 '19 at 15:10
