1

I Have a Folder of Many .docx Files

Folder of Word Documents

and each .docx file contains many medical terms.

I Want To index all of them to search faster.

for Example:-

I Have a Folder named examplefolder

I have a 100 Word Document Files with .docx extension

Each File contains 1000 + Words

I Want To Search a Word Across all the 100 Files !!

Is There a python Code to Do Like That ?

  • the ```python-docx``` module might help, have a look: https://pypi.org/project/python-docx/ (how to download), https://python-docx.readthedocs.io/en/latest/ (documentation = how to use it) – sputnick567 Jul 25 '21 at 10:59
  • I Think I Cannot Index the Word Files to Search Fast if I use this Library – Hacker--Rohan Raj Jul 25 '21 at 11:03
  • well i guess its gonna take a long time anyway to search these files. https://stackoverflow.com/questions/24805671/how-to-use-python-docx-to-replace-text-in-a-word-document-and-save if you still plan to use the library the accepted answer in the link above could help coding a search function for the document. – sputnick567 Jul 25 '21 at 14:32
  • It is For Find and Replace. I Want to search a tern if it exists in any word document in folder. and in single folder there are thousands of word files – Hacker--Rohan Raj Jul 26 '21 at 05:18
  • you can just use the find function for finding the word. also there are functions from the module ```os``` which allow you to see all the files in a folder – sputnick567 Jul 26 '21 at 11:18
  • it is taking so much time if i am using the os find. instead i want a program that indexes all the words in the word document ad makes my search faster – Hacker--Rohan Raj Jul 27 '21 at 10:26

1 Answers1

0

You can do this in the terminal by using Antiword to convert the doc files into a grep-able format and then grep on its output-

antiword *.docx | grep "your-medical-term" 
iamakhilverma
  • 564
  • 3
  • 9