Getting a specific word from a doc file irrespective of uppercase/lowercase using Python?

Question

I am getting the following output: [[], [], ['Audi'], ['audi'], ['AuDi']]
But I want: ['Audi', 'audi', 'AuDi']
My code is:

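import re

from docx import Document

document = Document(r'C:\Users\aliassample02.docx')
list1 = []
for para in document.paragraphs:
    # findall returns a list of matches for each paragraph
    results = re.findall(r'audi', para.text, re.IGNORECASE)
    list1.append(results)  # appends the whole list, so list1 ends up nested
print(list1)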

jezrael · Accepted Answer · 2020-06-08T08:36:54.433

4

Use list.extend instead of append. append adds each list of matches as a single element, while extend adds the matches themselves:

list1 = []
for para in document.paragraphs:
    results = re.findall(r'audi', para.text, re.IGNORECASE)
    list1.extend(results)
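
To see the difference without a .docx file, here is a minimal sketch that uses a plain list of strings in place of document.paragraphs. The sample paragraphs are made up for illustration:

```python
import re

# Hypothetical stand-in for [para.text for para in document.paragraphs]
paragraphs = ["", "No match here", "I drive an Audi", "audi is a brand", "AuDi"]

nested = []  # what append produces
flat = []    # what extend produces
for text in paragraphs:
    results = re.findall(r'audi', text, re.IGNORECASE)
    nested.append(results)  # adds the whole list as one element
    flat.extend(results)    # adds each match individually

print(nested)  # [[], [], ['Audi'], ['audi'], ['AuDi']]
print(flat)    # ['Audi', 'audi', 'AuDi']
```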

Or you can flatten values in list comprehension:

list1 = [x for para in document.paragraphs 
           for x in re.findall(r'audi', para.text, re.IGNORECASE)]

EDIT: To search for several words, loop over a list of them (list2 = ["audi", "bmw"] in the comments):

list1 = []
for para in document.paragraphs:
    for x in list2:
        results = re.findall(x, para.text, re.IGNORECASE)
        list1.extend(results)
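
If the words might contain regex metacharacters, or one pass per paragraph is preferred, a possible variant escapes each word and joins them into a single alternation. This is only a sketch and not part of the original answer; list2 follows the example from the comments, and the sample paragraphs are invented:

```python
import re

list2 = ["audi", "bmw"]  # example words from the comments
# Hypothetical stand-in for the text of document.paragraphs
paragraphs = ["Audi and BMW", "nothing", "a bmw, then an AuDi"]

# re.escape makes each word literal; '|' matches any one of them
pattern = re.compile("|".join(re.escape(word) for word in list2), re.IGNORECASE)

list1 = []
for text in paragraphs:
    list1.extend(pattern.findall(text))

print(list1)  # ['Audi', 'BMW', 'bmw', 'AuDi']
```

Note that with a combined pattern the matches come back in the order they appear in each paragraph, whereas the nested loop groups them by search word within each paragraph.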

edited Jun 08 '20 at 08:36

answered Jun 08 '20 at 08:08

jezrael


@kumaranuj - Can you be more specific? – jezrael Jun 08 '20 at 08:15
One more thing: what if "audi" is an item from an iterable? I want to iterate over many items, like list2 = ["audi","bmw"]. Then I need to apply for i in list2: results = re.findall(r'i', para.text, re.IGNORECASE) – Jun 08 '20 at 08:35
@kumaranuj - What should the output be? Both lists together? – jezrael Jun 08 '20 at 08:36
I have one docx file and I want to delete or replace particular words in it, which I will pass in as a list, e.g. list1 = ["aaa","bbb"]. Wherever these two elements occur in the docx, they should be deleted or replaced, irrespective of uppercase/lowercase. For deletion I thought of replacing the word with an empty string "", and for replacement I am using "replacingWord". – Jun 08 '20 at 08:55
@kumaranuj - I think it is best to create a new question. – jezrael Jun 08 '20 at 08:57
OK. I will post it as a new question after 90 minutes. Thanks :) – Jun 08 '20 at 08:59
https://stackoverflow.com/questions/62259624/how-to-replace-any-word-in-ppt-using-python-respective-of-uppercase-lowercase @jezrael can you please check? – Jun 08 '20 at 10:06
@kumaranuj - I checked it, but it is not easy to test because there is no test data; it is missing a [minimal, complete, and verifiable example](http://stackoverflow.com/help/mcve) – jezrael Jun 08 '20 at 10:19

score 2 · Answer 2 · answered Jun 08 '20 at 08:13

You can flatten the nested list after collecting all the matches:

list1 = [item for sublist in list1 for item in sublist]
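
As a small illustration, the sketch below applies that comprehension to the nested output from the question and compares it with itertools.chain.from_iterable, a standard-library alternative that is not mentioned in the original answer:

```python
from itertools import chain

# The nested result shown in the question
list1 = [[], [], ['Audi'], ['audi'], ['AuDi']]

# Flatten with a nested list comprehension
flat_comprehension = [item for sublist in list1 for item in sublist]

# Flatten with itertools (alternative, same result)
flat_chain = list(chain.from_iterable(list1))

print(flat_comprehension)  # ['Audi', 'audi', 'AuDi']
print(flat_chain)          # ['Audi', 'audi', 'AuDi']
```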

random_and_unknown


score 0 · Answer 3 · answered Jul 02 '20 at 07:21

It worked for me:

list1 = []
for para in document.paragraphs:
    results = re.findall(r'audi', para.text, re.IGNORECASE)
    list1.extend(results)

score 0 · Answer 4 · answered Jul 02 '20 at 07:44

list1 = [item for sublist in list1 for item in sublist]

This list comprehension also works for me.

score 0 · Answer 5 · answered Jul 02 '20 at 07:50

list1 = [x for para in document.paragraphs 
           for x in re.findall(r'audi', para.text, re.IGNORECASE)]

This is the best solution I have found for my question.
