I'm not able to apply a find-and-replace function to a Word file through Python

Question

import os
import sys
import fileinput

print ("Text to search for:")
textToSearch = input( "> " ) 

print ("Text to replace it with:")
textToReplace = input( "> " )

print ("File to perform Search-Replace on:")
fileToSearch  = input( "> " )
#fileToSearch = 'D:\dummy1.txt'

tempFile = open( fileToSearch, 'r+' , encoding="utf8")

for line in fileinput.input( fileToSearch ):
    if textToSearch in line :
        print('Match Found')
    else:
        print('Match Not Found!!')
    tempFile.write( line.replace( textToSearch, textToReplace ) )
tempFile.close()

input( '\n\n Press Enter to exit...' )

`UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 643: character maps to <undefined>` I'm getting this message as an error. — Rishabh Sharma, Mar 22 '19 at 13:11
Any particular reason for using `fileinput` and not just `open` ? — han solo, Mar 22 '19 at 13:13
Yes, I'm designing it for generic use. It's complex, though: I have to read from Excel and manipulate the data for a docx file. — Rishabh Sharma, Mar 22 '19 at 13:14
Possible duplicate of [UnicodeDecodeError: 'charmap' codec can't decode byte X in position Y: character maps to `<undefined>`](https://stackoverflow.com/questions/9233027/unicodedecodeerror-charmap-codec-cant-decode-byte-x-in-position-y-character) — srattigan, Mar 22 '19 at 13:19

user2622016 · Accepted Answer · 2019-03-22T13:19:23.620

0

It looks like you're opening a binary file (if so, save it as plain text first), or a text file whose encoding doesn't match the one you specified (utf-8).
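A quick way to check which case you are in is to test whether the file is a ZIP container, which is how .docx files are stored on disk. This is a minimal sketch using only the standard library; the helper name `is_docx` is hypothetical, and it assumes the main document part sits at `word/document.xml`, as it does in ordinary Word files.

```python
import zipfile


def is_docx(path):
    """Return True if `path` looks like a .docx package rather than plain text."""
    # A .docx file is a ZIP archive; a plain text file is not.
    if not zipfile.is_zipfile(path):
        return False
    # Assumption: the main document body is stored as word/document.xml.
    with zipfile.ZipFile(path) as archive:
        return "word/document.xml" in archive.namelist()
```

If this returns `True`, reading the file line by line as text will not give you the document's words, whatever encoding you pass.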

If you need to work with a docx document, you need a specialized library to open and read it, for example python-docx.
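As a rough sketch of what that could look like with python-docx (installed with `pip install python-docx`, imported as `docx`): the function names `replace_in_runs` and `replace_in_docx` are made up for this example. It only looks at body paragraphs, not tables, headers, or footers, and it will miss a search string that Word has split across two runs.

```python
def replace_in_runs(paragraphs, old, new):
    """Replace `old` with `new` in every run of the given paragraphs.

    Returns the number of runs that were changed. Editing run.text
    (rather than paragraph.text) keeps each run's formatting.
    """
    changed = 0
    for paragraph in paragraphs:
        for run in paragraph.runs:
            if old in run.text:
                run.text = run.text.replace(old, new)
                changed += 1
    return changed


def replace_in_docx(src_path, dst_path, old, new):
    """Open a .docx file, replace text in its body paragraphs, save a copy."""
    # Imported here so the helper above can be used without python-docx installed.
    from docx import Document

    document = Document(src_path)
    changed = replace_in_runs(document.paragraphs, old, new)
    document.save(dst_path)
    return changed
```

Saving to a separate `dst_path` keeps the original file untouched while you check the result.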

edited Mar 22 '19 at 13:19

answered Mar 22 '19 at 13:14


But I need to match the pattern against the content of a docx file. It works fine for text files but throws errors for docx. – Rishabh Sharma Mar 22 '19 at 13:16
docx is a binary file, so Python's default functions for working with text files can't read it. Use a library such as python-docx, or similar. – user2622016 Mar 22 '19 at 13:20
Actually, I'm a beginner. Please provide reference code if you can :) – Rishabh Sharma Mar 22 '19 at 13:25

Subsum44 · Answer 2 · 2019-03-22T13:30:48.630

0

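You're getting the 0x8f error because the file contains a byte that the codec used to decode it cannot map to a character. The 'charmap' in the message usually means a Windows default code page such as cp1252 was used rather than UTF-8. Check how the text file is saved in Notepad; it might be ANSI, not UTF-8.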
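For plain text files, one way to rule out the default codec is to tell `fileinput` which encoding to use. A small sketch, assuming the file really is UTF-8; the function name `read_lines_utf8` is hypothetical, while `fileinput.hook_encoded` is part of the standard library.

```python
import fileinput


def read_lines_utf8(path):
    """Read all lines of a text file through fileinput, decoding as UTF-8."""
    # Without an openhook, fileinput decodes with the platform default,
    # which on Windows is typically a 'charmap' codec such as cp1252.
    hook = fileinput.hook_encoded("utf-8")
    with fileinput.input(files=(path,), openhook=hook) as stream:
        return list(stream)
```

This does not help with .docx or .xlsx files, which are not text files in any encoding.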

Also, I would do a couple of things differently.

First, use `re.search` instead of just `in`. It gives you more control over what counts as a match, and if you later want finer granularity, such as matching whole words only, it's easy to update.
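For example, a whole-word replacement could be sketched like this. The function name `replace_whole_word` is made up for illustration; `re.escape` makes sure the search text is treated literally, and `\b` restricts matches to word boundaries.

```python
import re


def replace_whole_word(text, word, replacement):
    """Replace `word` only where it appears as a whole word.

    Returns a (new_text, number_of_replacements) tuple.
    """
    pattern = re.compile(r"\b" + re.escape(word) + r"\b")
    # A function is used as the replacement so that backslashes in
    # `replacement` are inserted literally instead of being interpreted.
    return pattern.subn(lambda match: replacement, text)


new_text, count = replace_whole_word("cat concatenate cat.", "cat", "dog")
print(new_text, count)  # dog concatenate dog. 2
```

The same compiled pattern works with `pattern.search(line)` if you only want to report whether a line matches.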

Second, use a real Excel library like openpyxl, and do the same for docx with python-docx (the package you import is named `docx`). Editors display these files to us as readable text, but on disk they are stored as zipped archives of XML. Trying to work through them with fileinput without treating them as such is going to get messy. You can choose which library to use based on the file extension, so you still have reusability, but you're now using the right tool for the job.
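Choosing the tool by file extension might look roughly like the sketch below. All function names and the `HANDLERS` table are hypothetical. The text handler uses only the standard library, the Excel handler assumes openpyxl is installed (`pip install openpyxl`), and both overwrite the file in place, so try it on a copy first.

```python
from pathlib import Path


def replace_in_text(path, old, new, encoding="utf-8"):
    """Read the whole text file, replace, and write it back."""
    file_path = Path(path)
    content = file_path.read_text(encoding=encoding)
    file_path.write_text(content.replace(old, new), encoding=encoding)


def replace_in_xlsx(path, old, new):
    """Replace text in every string cell of every worksheet."""
    from openpyxl import load_workbook  # third-party, imported only when needed

    workbook = load_workbook(path)
    for sheet in workbook.worksheets:
        for row in sheet.iter_rows():
            for cell in row:
                # Skip numbers, dates, and empty cells.
                if isinstance(cell.value, str) and old in cell.value:
                    cell.value = cell.value.replace(old, new)
    workbook.save(path)


# Maps a lowercase file extension to the function that handles it.
HANDLERS = {
    ".txt": replace_in_text,
    ".xlsx": replace_in_xlsx,
}


def find_and_replace(path, old, new):
    """Dispatch to the right handler based on the file extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in HANDLERS:
        raise ValueError(f"No handler registered for {suffix!r} files")
    HANDLERS[suffix](path, old, new)
```

A `.docx` handler built on python-docx would be registered in the same mapping, which keeps the calling code identical for every file type.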

edited Mar 22 '19 at 13:30

answered Mar 22 '19 at 13:28


I'm quite familiar with docx, so all I need to do is modify the open function, and that'll do, right? – Rishabh Sharma Mar 22 '19 at 13:30
Ok cool, definitely look at adding an Excel library though. I also just realized I missed something about how the file is saved and edited the response. It might be saved as ANSI by default, hence the 0x8f error. – Subsum44 Mar 22 '19 at 13:32
I'll end up in the same place then. – Rishabh Sharma Mar 22 '19 at 13:33
