I have a Word file (.docx) containing comma-separated data in the format shown below:

Id,Firstname,Lastname,Salary,Department  
1,ABC,XYZ,10000,ENG  
2,DEF,XYZ,20000,FIN  

I want to read this comma-separated data directly into a pandas DataFrame.
I have already found a workaround: convert the data to a .csv file and then load it with pd.read_csv. But I would like to know whether the data can be imported directly from the .docx file.

TIA!

mozway

1 Answer

You can combine docx (from the python-docx package), io.StringIO, and pandas.read_csv:

import docx
import io
import pandas as pd

content = docx.Document('data.docx').paragraphs[0].text
# or, if the data spans all paragraphs:
# content = '\n'.join(p.text for p in docx.Document('data.docx').paragraphs)

df = pd.read_csv(io.StringIO(content))

output:

   Id Firstname Lastname  Salary Department
0   1       ABC      XYZ   10000        ENG
1   2       DEF      XYZ   20000        FIN
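
If the rows are spread over several paragraphs, or the document contains empty paragraphs and trailing spaces, a slightly more defensive variant may help. The following is only a sketch: the helper names paragraphs_to_frame and read_docx_csv are hypothetical, and it assumes every non-empty line in the document is a CSV row. Soft line breaks inside a paragraph should appear as '\n' in its text, so each paragraph is split into lines first.

```python
import io

import pandas as pd


def paragraphs_to_frame(texts):
    """Build a DataFrame from an iterable of paragraph strings holding CSV rows."""
    # Split each paragraph on line breaks, strip stray whitespace,
    # and drop lines that are empty after stripping.
    lines = [
        line.strip()
        for text in texts
        for line in text.splitlines()
        if line.strip()
    ]
    return pd.read_csv(io.StringIO("\n".join(lines)))


def read_docx_csv(path):
    """Hypothetical wrapper: treat every paragraph of a .docx file as CSV text."""
    import docx  # from the python-docx package; imported here so the helper above works without it

    return paragraphs_to_frame(p.text for p in docx.Document(path).paragraphs)
```

With this in place, df = read_docx_csv('data.docx') would replace the two-step version above. Keeping the parsing in paragraphs_to_frame makes it possible to check that part with plain strings, without needing a Word file.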
mozway