How to edit text using the docx package

Question

Objective: I want to read the text from a Word file, increment the ASCII value of each character by some predefined number (a sort of encoding), and save the result back into the same file. For example, 'A' has an ASCII value of 65, so I need it to become 75. I'm writing the following code and am stuck at this point:

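from docx import Document

# Raw string so the backslashes in the Windows path are kept literally
data = Document(r"C:\Python27\Testing.docx")
for n in data.paragraphs:
    temp = n.text  # python-docx returns paragraph text as unicode
    try:
        temp1 = str(temp)  # fails for non-ASCII characters such as u'\u2019'
    except UnicodeEncodeError:
        # Replace every character outside ASCII with '?'
        temp1 = temp.encode('ascii', 'replace')
    print temp1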

The output I get is:

This is just a test of what I?m gonna make. Fingers crossed?

whereas the original string is:

This is just a test of what I’m gonna make. Fingers crossed…

How can I replace the Unicode characters with their corresponding ASCII characters so that I can proceed? Any suggestions would be appreciated.
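A minimal Python 3 sketch of one way to do this, not a definitive method. Note that the question uses Python 2.7; in Python 3 `str` is already Unicode, so the ASCII conversion may not be needed for the shifting step at all. The mapping table `TYPOGRAPHIC_TO_ASCII` and the function `to_ascii` are hypothetical names chosen for illustration, and the table covers only a few common typographic characters (an assumption about what the documents contain). Characters with neither a mapping nor an NFKD decomposition end up as `?`, just like `encode('ascii', 'replace')`.

```python
import unicodedata

# Hypothetical mapping of common typographic characters to ASCII stand-ins;
# extend it for whatever characters your documents actually contain.
TYPOGRAPHIC_TO_ASCII = {
    "\u2018": "'",    # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",    # RIGHT SINGLE QUOTATION MARK (the one in "I’m")
    "\u201c": '"',    # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',    # RIGHT DOUBLE QUOTATION MARK
    "\u2013": "-",    # EN DASH
    "\u2014": "--",   # EM DASH
    "\u2026": "...",  # HORIZONTAL ELLIPSIS (the one after "crossed")
}
_TABLE = str.maketrans(TYPOGRAPHIC_TO_ASCII)


def to_ascii(text: str) -> str:
    """Best-effort ASCII transliteration; unknown characters become '?'."""
    # 1. Replace the characters we have an explicit mapping for.
    text = text.translate(_TABLE)
    # 2. NFKD splits accented letters into base letter + combining mark
    #    (e.g. 'é' -> 'e' + U+0301); dropping the marks keeps the base letter.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # 3. Anything still outside ASCII is replaced with '?'.
    return text.encode("ascii", "replace").decode("ascii")


original = "This is just a test of what I\u2019m gonna make. Fingers crossed\u2026"
print(to_ascii(original))
# This is just a test of what I'm gonna make. Fingers crossed...
```

Applying the explicit table before NFKD lets you choose the replacement for characters such as ’ that have no compatibility decomposition.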

What do you mean by replacing Unicode with ASCII characters? These two are completely different ... — linusg, Nov 25 '16 at 12:35
When I cast it to a string, the Unicode characters are not converted and I get a UnicodeEncodeError. I want those characters to be converted into string characters as well. — NISHIT KHARA, Nov 25 '16 at 12:38
You can't have Unicode characters in an ASCII string if they are not in the ASCII codec! (Or I don't understand what you want to achieve...) — linusg, Nov 25 '16 at 12:42
The `temp` variable in the code is of type `unicode`, and I want to convert it to `str`. When I cast it using `str(temp)`, some characters, like the single quote in my example, cannot be converted. Is there any way to convert such characters? — NISHIT KHARA, Nov 25 '16 at 12:47
What do you get if you do `temp1 = temp.decode()` in the exception block? — themistoklik, Nov 25 '16 at 12:48
I commented out the encode statement and used `temp1 = temp.decode()` instead, which gives: `Traceback (most recent call last): File "C:\Python27\Testing_1.py", line 16, in <module> temp1 = temp.decode() UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 29: ordinal not in range(128)` — NISHIT KHARA, Nov 25 '16 at 12:50
I tried utf-8 instead of ascii and got the following result: `This is just a test of what Iâ€™m gonna make. Fingers crossedâ€¦` — NISHIT KHARA, Nov 25 '16 at 12:54
Is that result in the terminal or in a file? You may not have set the system encoding to utf-8. — themistoklik, Nov 25 '16 at 13:01
My default encoding is ascii; I checked it using `sys.getdefaultencoding()`. I have read that we should not play with `setdefaultencoding()`, and it has also been removed from `sys`. — NISHIT KHARA, Nov 25 '16 at 13:48
A docx fundamentally contains Unicode text. You should revisit/refine your requirements. A Caesar cipher is best defined to transform only a chosen set of letters, for example the [Basic Latin](http://www.unicode.org/charts/nameslist/index.html) uppercase and lowercase letters, and pass every other character through unchanged. (I don't know Python, but maybe you should explore whether Python 3 has better support for Unicode.) — Tom Blodget, Nov 25 '16 at 16:33
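The `Iâ€™m` output in the comments looks like UTF-8 bytes being displayed as Windows-1252 (cp1252) text: ’ is U+2019, whose UTF-8 encoding is the three bytes E2 80 99, and cp1252 shows those bytes as `â`, `€` and `™`. The Python 3 sketch below reproduces the effect, assuming a cp1252 display (the asker's console encoding is not stated). If that assumption holds, it suggests the text was encoded correctly and only its display was wrong.

```python
original = "what I\u2019m gonna make. Fingers crossed\u2026"

# Encode to UTF-8, presumably what swapping 'ascii' for 'utf-8' in the
# question's encode call did.
utf8_bytes = original.encode("utf-8")
print(utf8_bytes)  # b'what I\xe2\x80\x99m gonna make. Fingers crossed\xe2\x80\xa6'

# Interpret those bytes as cp1252, as a cp1252 console or editor would.
garbled = utf8_bytes.decode("cp1252")  # 'what Iâ€™m gonna make. Fingers crossedâ€¦'

# Undoing the wrong step recovers the original text.
repaired = garbled.encode("cp1252").decode("utf-8")
```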
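Following the Caesar-cipher suggestion above, here is a minimal Python 3 sketch that shifts only the Basic Latin letters (wrapping within A-Z and a-z) and passes every other character, including ’ and …, through unchanged, so no ASCII conversion is required. With an offset of 10, 'A' (65) becomes 'K' (75), as in the question. The names `shift_char`, `shift_text` and `shift_docx` are hypothetical; `shift_docx` assumes the python-docx package is installed and only processes body paragraphs, not tables, headers or footers.

```python
def shift_char(ch: str, offset: int) -> str:
    """Caesar-shift one Basic Latin letter; return any other character as-is."""
    for first, last in (("A", "Z"), ("a", "z")):
        if first <= ch <= last:
            # Rotate within the 26-letter range so 'Z' + 1 wraps to 'A'.
            return chr((ord(ch) - ord(first) + offset) % 26 + ord(first))
    return ch  # digits, spaces, punctuation, ’, …, accented letters, etc.


def shift_text(text: str, offset: int) -> str:
    """Apply shift_char to every character of text."""
    return "".join(shift_char(ch, offset) for ch in text)


def shift_docx(path: str, offset: int) -> None:
    """Shift the text of every run in the document and save it in place.

    Assumes python-docx (imported as ``docx``) is installed. Only body
    paragraphs are processed; tables, headers and footers are skipped.
    """
    from docx import Document  # imported here so shift_text works without it

    doc = Document(path)
    for paragraph in doc.paragraphs:
        # Editing run.text keeps each run's formatting (bold, italics, ...),
        # whereas assigning paragraph.text would replace all runs with one.
        for run in paragraph.runs:
            run.text = shift_text(run.text, offset)
    doc.save(path)  # overwrites the original file
```

For example, `shift_docx(r"C:\Python27\Testing.docx", 10)` would encode the file in place, and calling it again with `-10` would decode it. A blind `chr(ord(ch) + offset)` over every character also works on Python 3 strings, but a negative offset can land on control characters, which XML 1.0 (and hence a .docx) does not allow; restricting the shift to letters avoids that.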
