Deleting a specific word in a docx file using Python

Question

I have to delete a particular word in a docx file, so I am replacing it with "" (an empty string). Here is my code:

import re

from docx import Document  # provided by the python-docx package


def docx_replace_regex(doc_obj, regex, replace):
    # Replace every regex match in the paragraphs of doc_obj (a document or a table cell)
    for p in doc_obj.paragraphs:
        if regex.search(p.text):
            inline = p.runs
            # Loop added to work with runs (strings with same style)
            for i in range(len(inline)):
                if regex.search(inline[i].text):
                    text = regex.sub(replace, inline[i].text)
                    inline[i].text = text
    # Recurse into table cells, which hold their own paragraphs and tables
    for table in doc_obj.tables:
        for row in table.rows:
            for cell in row.cells:
                docx_replace_regex(cell, regex, replace)

regex1 = re.compile(r"sign")
replace1 = r""
filename = r"C:\...\sample01.docx"
doc = Document(filename)
docx_replace_regex(doc, regex1, replace1)
doc.save('sample01.docx')

The issue I am facing is this: suppose the docx file contains the word "Design" and I pass "sign" as the word to remove. "Design" then becomes "De", which is not what I want. Can anyone help me out?

You can use `\b` in a Python regex to express a "word boundary", i.e. a zero-width position between a word character and a non-word character, or at the start or end of the string. So try wrapping your regex in `\b`s (`\bsign\b`) to only match whole words. — MatsLindh, Jun 24 '20 at 06:14
Does this answer your question? [Regex match entire words only](https://stackoverflow.com/questions/1751301/regex-match-entire-words-only) — Melebius, Jun 24 '20 at 06:15
@Melebius Yes, it worked. Thank you very much. I have one more small query. If I have a sentence like this: Design>Design!Design# Design$Design% Design^Design&Design*Design(Design) Design_ Design- Design+ Design= Design[Design] Design{ Design} Design\ Design| Design; Design: Design’ Design” > Design, Design. Design/ Design? Design` Design~ how can I put them all in a list like ["Design","Design",...........,"Design","Design"]? — Anuj, Jun 24 '20 at 06:36
@Anuj Thanks for your feedback. Please [mark the link as the solution by clicking “That solved my problem!”](https://meta.stackexchange.com/a/250930/217657) to help others, too. And don’t add follow-up questions to your question or a comment, you can [ask a new question](https://stackoverflow.com/questions/ask) instead (after you do the research whether the solution has not been on the internet already). — Melebius, Jun 24 '20 at 06:46
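
On the follow-up raised in the comments above, here is a minimal sketch, assuming the goal is to collect every standalone "Design" regardless of the punctuation around it. The sample string is a shortened, hypothetical stand-in for the sentence in the comment, and the names `strict` and `loose` are illustrative only. One detail worth noting is that `\b` treats `_` as a word character, so `\bDesign\b` skips "Design_"; lookarounds that only forbid adjacent letters are one possible way around that.

```python
import re

# Shortened, hypothetical sample in the spirit of the follow-up comment.
text = "Design>Design!Design# Design_ Design- Designer Design."

# \b sees "_" as a word character, so "Design_" is not matched here.
strict = re.findall(r"\bDesign\b", text)

# Only forbid a letter directly before or after, so "Design_" is matched too,
# while "Designer" is still rejected.
loose = re.findall(r"(?<![A-Za-z])Design(?![A-Za-z])", text)

print(len(strict), len(loose))  # prints: 5 6
```

Which of the two patterns is appropriate depends on whether "Design_" should count as the word "Design" in your documents.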

score 0 · Answer 1 · answered Jun 24 '20 at 06:39

It worked! This is the code:

import re

from docx import Document  # provided by the python-docx package


def docx_replace_regex(doc_obj, regex, replace):
    # Replace every regex match in the paragraphs of doc_obj (a document or a table cell)
    for p in doc_obj.paragraphs:
        if regex.search(p.text):
            inline = p.runs
            # Loop added to work with runs (strings with same style)
            for i in range(len(inline)):
                if regex.search(inline[i].text):
                    text = regex.sub(replace, inline[i].text)
                    inline[i].text = text
    # Recurse into table cells, which hold their own paragraphs and tables
    for table in doc_obj.tables:
        for row in table.rows:
            for cell in row.cells:
                docx_replace_regex(cell, regex, replace)


# \b on both sides restricts the match to the standalone word "sign"
regex1 = re.compile(r"\bsign\b")
replace1 = r""
filename = r"C:\Users.........\sample01.docx"
doc = Document(filename)
docx_replace_regex(doc, regex1, replace1)
doc.save('sample01.docx')

Do I see correctly that you only changed one line (`regex1 = re.compile(r"\bsign\b")`)? A [good answer](https://codeblog.jonskeet.uk/2010/08/29/writing-the-perfect-question/) should only contain the changed line along with the explanation. It is superfluous to repeat the code posted already. — Melebius, Jun 24 '20 at 06:59
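
To make the effect of that one changed line concrete, here is a small sketch using only the standard `re` module, with no docx file involved. The sample sentence and the variable names are made up for illustration; the two patterns are the ones from the question and the answer.

```python
import re

# Hypothetical sample sentence containing both "sign" and "Design".
sample = "Please sign the Design document."

plain = re.compile(r"sign")       # pattern from the question
whole = re.compile(r"\bsign\b")   # pattern from the answer

plain_result = plain.sub("", sample)
whole_result = whole.sub("", sample)

print(plain_result)  # Please  the De document.
print(whole_result)  # Please  the Design document.
```

Note that deleting a word this way leaves the surrounding spaces in place, so a double space remains where "sign" used to be.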
