
How can I parse text from a .docx file?

I have already tried Data(contentsOf:) and String(contentsOf:), but neither worked.
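Both calls only return the file's raw bytes or try to decode those bytes as text. Neither understands Word's format. A .docx file is a ZIP archive, so it begins with the ZIP local-file-header signature 0x50 0x4B 0x03 0x04 ("PK\u{3}\u{4}"), and the compressed bytes that follow are generally not valid UTF-8. The small diagnostic sketch below shows both points. The helper names `looksLikeZipArchive` and `utf8Text` are hypothetical, chosen for illustration only.

```swift
import Foundation

/// Hypothetical helper: true when `data` starts with the ZIP
/// local-file-header signature 0x50 0x4B 0x03 0x04 ("PK\u{3}\u{4}").
/// .docx (Office Open XML) files are ZIP archives, so they start this way.
func looksLikeZipArchive(_ data: Data) -> Bool {
    data.starts(with: [0x50, 0x4B, 0x03, 0x04] as [UInt8])
}

/// Hypothetical helper: decodes `data` as UTF-8, returning nil when the
/// bytes are not valid UTF-8 (the usual case for compressed ZIP content).
func utf8Text(_ data: Data) -> String? {
    String(data: data, encoding: .utf8)
}
```

For example, `looksLikeZipArchive(try Data(contentsOf: docxURL))` should be true for a real .docx, which is a hint that the text has to be extracted from the archive rather than decoded directly.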

renglrio
  • What exactly is your file? RTF? A .doc might not be readable as UTF-8. Not every file can be converted to a UTF-8 string; an image, for instance, can't. – Larme Feb 11 '20 at 10:37
  • The file I'm trying to read is in .pages format. I also want to be able to read .doc/.docx files. I have tried different encodings, but no luck. – renglrio Feb 11 '20 at 16:10
  • You can't do it like that. It's not a valid UTF-8 file, and it's a proprietary format too. – Larme Feb 11 '20 at 16:26
  • So how can I parse text from RTF? – renglrio Feb 12 '20 at 00:52
  • For RTF, there is a specific NSAttributedString.DocumentType (.rtf). – Larme Feb 12 '20 at 07:57
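Following the last comment, RTF can be read by letting the system RTF reader build an NSAttributedString with the .rtf document type and then keeping only its plain string. This is a minimal sketch, assuming an Apple platform (AppKit or UIKit), since the document-reading initializers are not available in Foundation on Linux. The function name `plainText(fromRTF:)` is hypothetical.

```swift
import Foundation
#if canImport(AppKit)
import AppKit
#elseif canImport(UIKit)
import UIKit
#endif

#if canImport(AppKit) || canImport(UIKit)
/// Hypothetical helper: parses RTF bytes with the system RTF reader
/// (documentType .rtf) and returns only the text, dropping all attributes.
func plainText(fromRTF data: Data) throws -> String {
    let attributed = try NSAttributedString(
        data: data,
        options: [.documentType: NSAttributedString.DocumentType.rtf],
        documentAttributes: nil
    )
    return attributed.string
}
#endif
```

Usage would look like `let text = try plainText(fromRTF: Data(contentsOf: rtfURL))`, where `rtfURL` points at the .rtf file. This only covers RTF; it does not help with .pages, which is a different, proprietary format.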

1 Answer


This can't be done with Data(contentsOf:) or String(contentsOf:), because a .docx file is a ZIP archive containing XML and other files. To parse the text from a .docx file, you first need to unzip it. In my case, I used ZIPFoundation to unzip the document. Then parse the file named word/document.xml under the extraction path with any XML parser, and you will be able to get the text from the document.
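The XML-parsing step can be sketched with Foundation's XMLParser. In word/document.xml, paragraphs are w:p elements and the visible text sits inside w:t elements, so the delegate below collects w:t content and starts a new line at the end of each w:p. This is a simplified sketch, not the answer's exact code: the class name `DocumentXMLTextExtractor` is hypothetical, and it ignores headers, footers, footnotes, tables-as-layout and other parts of the package.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML  // XMLParser lives in this module on Linux
#endif

/// Hypothetical helper: extracts plain text from WordprocessingML
/// (the contents of word/document.xml inside a .docx archive).
final class DocumentXMLTextExtractor: NSObject, XMLParserDelegate {
    private var paragraphs: [String] = []
    private var current = ""
    private var insideText = false

    func parser(_ parser: XMLParser, didStartElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?,
                attributes attributeDict: [String: String] = [:]) {
        // Namespace processing is off, so names keep their "w:" prefix.
        switch elementName {
        case "w:t": insideText = true      // text run content follows
        case "w:tab": current += "\t"      // tab inside a run
        case "w:br": current += "\n"       // manual line break
        default: break
        }
    }

    func parser(_ parser: XMLParser, foundCharacters string: String) {
        // Only keep characters that belong to a <w:t> element.
        if insideText { current += string }
    }

    func parser(_ parser: XMLParser, didEndElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?) {
        switch elementName {
        case "w:t":
            insideText = false
        case "w:p":
            // End of a paragraph: store it and start a new one.
            paragraphs.append(current)
            current = ""
        default:
            break
        }
    }

    /// Parses `xml` and returns the paragraphs joined by newlines,
    /// or nil if the XML could not be parsed.
    func extractText(from xml: Data) -> String? {
        paragraphs = []
        current = ""
        insideText = false
        let parser = XMLParser(data: xml)
        parser.delegate = self
        guard parser.parse() else { return nil }
        return paragraphs.joined(separator: "\n")
    }
}
```

To connect it to the unzip step, ZIPFoundation's `FileManager.default.unzipItem(at:to:)` extracts the archive to a folder; after that, something like `try Data(contentsOf: extractURL.appendingPathComponent("word/document.xml"))` (with `extractURL` being your chosen extraction folder) gives the bytes to pass to `extractText(from:)`.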

Sources:

Converting Docx Files To Text In Swift

Reading or Converting word .doc files iOS

renglrio