
I didn't find anything on Stack Overflow for this question, and I have no idea how to work it out, so please bear with me.

Below is my code:

import os
from docx import Document

v_doc = []

for root, dirs, files in os.walk(paths):
    for t in files:
        if t.endswith('.docx'):
            # os.walk yields bare file names, so join them with their folder
            v_doc.append(Document(os.path.join(root, t)))

            # Say there are 3 .docx files that contain simple sentences. How do I put
            # each file's sentences into a separate DataFrame column? I have n .docx files.

Example .docx files:

docx1 contains:

Hello guys how are you all, hope you guys doing good.

docx2 contains:

I don't know what to write here

docx3 contains:

We are strong together ! do we ?

Expected output (a DataFrame with one column per .docx file):

column1                                                 column2                           column3
Hello guys how are you all, hope you guys doing good.   I don't know what to write here   We are strong together ! do we ?
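To make the target shape concrete: each .docx becomes one column, and if a file had several paragraphs they could go in successive rows. Below is a minimal pandas sketch that builds that shape from in-memory strings (the example sentences above); reading the actual files is left out, and the helper name `frame_from_paragraphs` is hypothetical.

```python
import pandas as pd


def frame_from_paragraphs(paragraphs_per_file):
    """One column per file, one row per paragraph.

    `paragraphs_per_file` is a list of lists of strings, one inner list
    per .docx file (hypothetical input format). Columns are named
    column1, column2, ... in input order.
    """
    # Wrapping each list in a Series lets files with fewer paragraphs be
    # padded with NaN instead of raising a length-mismatch error.
    return pd.DataFrame({
        f'column{i}': pd.Series(paras, dtype=object)
        for i, paras in enumerate(paragraphs_per_file, start=1)
    })


example = [
    ['Hello guys how are you all, hope you guys doing good.'],
    ["I don't know what to write here"],
    ['We are strong together ! do we ?'],
]
df = frame_from_paragraphs(example)
print(df)
```

With one sentence per file this yields a single row and three columns; n files simply give n columns.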

I hope to get some responses. Thank you in advance.

Titan

1 Answer


Gotcha:

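import os
import docx

dataframe = {}

def get_files(extension, location):
    # Collect the full path of every file under `location` ending in `extension`
    v_doc = []
    for root, dirs, files in os.walk(location):
        for t in files:
            if t.endswith(extension):
                v_doc.append(os.path.join(root, t))
    return v_doc

file_list = get_files('.docx', '.')
for index, file in enumerate(file_list, start=1):
    doc = docx.Document(file)
    column_label = f'column{index}'
    # First paragraph only (the document's "title")
    data_content = doc.paragraphs[0].text
    # Add one key per file instead of overwriting the whole dict each time
    dataframe[column_label] = data_content

print(dataframe)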
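The comments on this answer ask for everything in each file rather than just its first paragraph. A minimal sketch of that, assuming the python-docx package is installed; the helper names `docx_text`, `columns_from_texts` and `collect_columns` are hypothetical, and skipping `~$` lock files is an addition not in the answer. Note that `join` is a method of the separator string, not of the paragraph list.

```python
import os


def docx_text(path, sep='\n\n'):
    """Return the text of every paragraph in a .docx file, joined by `sep`.

    Needs the third-party python-docx package; it is imported inside the
    function so the other helpers work without it installed.
    """
    import docx  # python-docx

    doc = docx.Document(path)
    # str.join is called on the separator, not on doc.paragraphs
    return sep.join(p.text for p in doc.paragraphs)


def columns_from_texts(texts):
    """Label texts column1, column2, ... in the order given."""
    return {f'column{i}': text for i, text in enumerate(texts, start=1)}


def collect_columns(location='.', extension='.docx'):
    """Walk `location` and return {'column1': full text of file 1, ...}."""
    paths = []
    for root, dirs, files in os.walk(location):
        for name in files:
            # Skip Word's temporary owner files such as '~$report.docx',
            # which are not valid .docx documents
            if name.endswith(extension) and not name.startswith('~$'):
                paths.append(os.path.join(root, name))
    paths.sort()  # os.walk order is filesystem-dependent; sort for stable numbering
    return columns_from_texts(docx_text(p) for p in paths)
```

Calling `collect_columns('.')` returns one `columnN` entry per .docx file found, however many there are.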
Bilal Qandeel
  • {'column1': 'contents of example1.docx', 'column2': 'contents of example2.docx', 'column3': 'contents of example3.docx'} – Bilal Qandeel Jul 11 '21 at 20:02
  • doc.paragraphs[0].text shows nothing, but for x in data_content: print(x.text) does print the content. – Titan Jul 11 '21 at 20:04
  • It is supposed to grab the `title` only, i.e. the very first paragraph. If that paragraph is blank, the result will be blank too. – Bilal Qandeel Jul 11 '21 at 20:06
  • Oh, I see. But is it possible to grab everything in docx1, docx2 and put it in column1 and column2 of the dataframe? – Titan Jul 11 '21 at 20:08
  • doc.paragraphs[0].text shows nothing, dude, but iterating over doc.paragraphs and printing each .text shows the content. – Titan Jul 11 '21 at 20:10
  • If you need all the contents inside the `docx`, then join the `paragraphs` together with two newlines (one to end the line and another to start a new paragraph). That can be achieved using `join` like this: `doc.paragraphs.join('\n\n')` – Bilal Qandeel Jul 11 '21 at 20:12
  • AttributeError: 'Paragraph' object has no attribute 'join'. ... – Titan Jul 11 '21 at 20:17
  • Can you please update your answer with the working code? Thank you for trying to help, I really appreciate it. – Titan Jul 11 '21 at 20:19
  • Working awesome!!!! I changed the code a little bit. MANY MANY THANKS! – Titan Jul 11 '21 at 20:28
  • Hey, I tried dataframe = pd.DataFrame(data_content, columns=[column_label]) and the DataFrame only shows column2, not column1. Can you help? – Titan Jul 11 '21 at 21:29
  • Of course, it did not. You have just squeezed all the `data_content` into a single column named by the value of `column_label`. `dataframe` is already of the data type `dataframe`. I see no value recasting it using `pd.DataFrame`. i.e. use `dataframe['some_nice_column']` as it is wherever needed. – Bilal Qandeel Jul 11 '21 at 21:42
  • Yeah, I understand, but dataframe.to_excel is the easy way. Can you please tell me how to export {column_label: data_content} as an Excel .xlsx file? – Titan Jul 11 '21 at 21:51
  • Do I get "Best Answer" XD XD? 1. `df = dataframe['some_nice_column']` and 2. `df.to_excel("output.xlsx")` – Bilal Qandeel Jul 11 '21 at 22:08
  • Hey buddy, I understand, but there should be n columns, because in my case there are not just two .docx files but n of them. – Titan Jul 11 '21 at 22:16
  • Hey, I marked your response as the best answer, no doubt about that!!! – Titan Jul 12 '21 at 06:35
  • Can you please help with this question? I will give a 50+ reputation bounty if it is answered. https://stackoverflow.com/questions/68413792/how-to-sort-dataframe2-according-to-dataframe1-with-fuzzywuzzy – Titan Jul 17 '21 at 15:02
  • You got it... https://stackoverflow.com/questions/68413792/how-to-sort-dataframe2-according-to-dataframe1-with-fuzzywuzzy/68425166 – Bilal Qandeel Jul 18 '21 at 00:27
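On the n-columns Excel question in the comments: the `dataframe` built in the answer is a plain Python `dict`, not a pandas DataFrame, so it has no `to_excel` method, and indexing it with a key returns only that one string. A minimal sketch, assuming pandas is available, that wraps the dict in a list so every key becomes its own column in a single row (the helper name `dict_to_frame` is hypothetical; the sample values are the ones from the first comment):

```python
import pandas as pd


def dict_to_frame(columns):
    # A list holding one dict -> one row, one column per key,
    # so n .docx files give a 1 x n DataFrame
    return pd.DataFrame([columns])


dataframe = {
    'column1': 'contents of example1.docx',
    'column2': 'contents of example2.docx',
    'column3': 'contents of example3.docx',
}
df = dict_to_frame(dataframe)
print(df.shape)  # (1, 3)
# Writing .xlsx needs an Excel writer engine such as openpyxl installed:
# df.to_excel('output.xlsx', index=False)
```

`pd.DataFrame(data_content, columns=[column_label])` instead treats a single file's text as the data for one named column, which is why only one column appeared.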