2

I have a .docx file in my temporary storage:

    let location: NSURL = NSURL.fileURLWithPath(NSTemporaryDirectory())
    let file_Name = location.URLByAppendingPathComponent("5 November 2016.docx")

What I now want to do is extract the text inside this document. But I cannot seem to find any converters or methods of doing this.

I have tried this:

    let file_Content = try? NSString(contentsOfFile: String(file_Name), encoding: NSUTF8StringEncoding)
    print(file_Content)

However, it prints nil.

So how do I read the text in a docx file?

  • What is `URL` in your call to `contentsOfFile:`? Shouldn't that be `File_Name`? BTW - it is standard practice to name methods and variables to start with lowercase letters. Class names start with uppercase letters. – rmaddy Nov 06 '16 at 02:56
  • @rmaddy Yes, it should be File_Name, just a copying and pasting mistake. –  Nov 06 '16 at 03:22
  • You should fix your question to avoid confusion. – rmaddy Nov 06 '16 at 03:27
  • @rmaddy But it still does return nil. –  Nov 06 '16 at 03:27
  • @rmaddy There do seem to be many answers that allow me to read the contents of the file, but they are in other languages such as Python. http://stackoverflow.com/questions/116139/how-can-i-search-a-word-in-a-word-2007-docx-file?rq=1 –  Nov 06 '16 at 04:20

2 Answers

5

Swift 4, Xcode 9.1, OSX targets from 10.10 to 10.13

I have found that the following code extracts text handily from a Word .doc file, which then easily goes into a string. (The attributed string contains formatting information that might be parsed to good effect.) The main info that I wanted to convey was the bit about using .docFormat to specify the document type.

    import AppKit

    let openPanel  = NSOpenPanel()
    var fileString = ""

    // openPanel.url is only set after the user has picked a file,
    // so run the panel first and unwrap the URL safely.
    if openPanel.runModal() == .OK, let fileURL = openPanel.url {
        do {
            let fileData = try Data(contentsOf: fileURL)
            // .docFormat tells the importer to treat the data as a Word .doc file.
            if let attributed = try? NSAttributedString(data: fileData, options: [
                .documentType: NSAttributedString.DocumentType.docFormat,
                .characterEncoding: String.Encoding.utf8.rawValue
                ], documentAttributes: nil) {
                fileString = attributed.string
            } else {
                fileString = "Data conversion error."
            }
            fileString = fileString.trimmingCharacters(in: .whitespacesAndNewlines)
        } catch {
            // Reached when the file cannot be read from disk.
            print("Word Document File Not Found")
        }
    }
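
The snippet above targets the older binary .doc format. For a .docx file like the one in the question, AppKit also defines `NSAttributedString.DocumentType.officeOpenXML`. The following is only a minimal sketch, assuming a macOS target (this constant is not available in UIKit on iOS); the helper name `docxText(at:)` is made up for illustration.

```swift
import Foundation
#if canImport(AppKit)
import AppKit
#endif

/// Hypothetical helper: returns the plain text of a .docx file, or nil on failure.
/// Assumes macOS; on platforms without AppKit it simply returns nil.
func docxText(at url: URL) -> String? {
    #if canImport(AppKit)
    // Read the raw bytes of the document.
    guard let data = try? Data(contentsOf: url) else { return nil }
    // Ask the importer to treat the data as Office Open XML (.docx).
    let attributed = try? NSAttributedString(
        data: data,
        options: [.documentType: NSAttributedString.DocumentType.officeOpenXML],
        documentAttributes: nil)
    return attributed?.string.trimmingCharacters(in: .whitespacesAndNewlines)
    #else
    return nil
    #endif
}
```

Called as `docxText(at: fileURL)`, it returns nil if the file is missing or cannot be converted, so the caller can tell that case apart from an empty document.
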
clyrinkpress
3

Your initial issue is how you get the path string from the URL. `String(file_Name)` is not the correct way to convert a file URL into a file path. The proper way is to use the URL's `path` property.

    let location = NSURL.fileURLWithPath(NSTemporaryDirectory())
    let fileURL = location.URLByAppendingPathComponent("My File.docx")
    let fileContent = try? NSString(contentsOfFile: fileURL.path, encoding: NSUTF8StringEncoding)

Note the changes: the variables now follow standard naming conventions (starting with a lowercase letter) and have clearer names.
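
To illustrate the difference between a URL's string form and its file path, here is a small sketch in current Swift syntax (the question uses the older Swift 2 spelling). The file name is just the placeholder used above.

```swift
import Foundation

// Build a file URL inside the temporary directory.
let fileURL = URL(fileURLWithPath: NSTemporaryDirectory())
    .appendingPathComponent("My File.docx")

// The URL's string form carries the file:// scheme and percent-encodes the space,
// so it is not usable as a file system path.
print(fileURL.absoluteString)

// The path property gives the plain file system path that
// path-based APIs such as contentsOfFile: expect.
print(fileURL.path)
```

Passing the string form to a path-based API fails because no file exists at a path that begins with `file://`.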

Now here's the thing: this still won't work, because a docx file is a zipped-up collection of XML and other files. You can't load a docx file into an NSString. You would need to use NSData to load the zip contents, unzip it, and then go through the extracted files to find the desired text. It's far from trivial, and it is far beyond the scope of a single Stack Overflow post.
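
As a rough idea of the last step only, here is a sketch that pulls text out of the main document part once it has already been unzipped. It is written in current Swift syntax and rests on several assumptions: the XML comes from `word/document.xml` inside the archive, text runs use the conventional `w:t` element name and paragraphs use `w:p`, and the unzipping itself is done elsewhere (Foundation has no zip API). The names `DocumentXMLTextExtractor` and `plainText(fromDocumentXML:)` are made up for illustration.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML
#endif

/// Collects the characters inside every <w:t> element and
/// adds a newline at the end of every <w:p> paragraph.
final class DocumentXMLTextExtractor: NSObject, XMLParserDelegate {
    private(set) var text = ""
    private var insideTextRun = false

    func parser(_ parser: XMLParser, didStartElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?,
                attributes attributeDict: [String: String] = [:]) {
        if elementName == "w:t" { insideTextRun = true }
    }

    func parser(_ parser: XMLParser, foundCharacters string: String) {
        if insideTextRun { text += string }
    }

    func parser(_ parser: XMLParser, didEndElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?) {
        if elementName == "w:t" { insideTextRun = false }
        if elementName == "w:p" { text += "\n" }
    }
}

/// Hypothetical helper: takes the already-unzipped bytes of word/document.xml
/// and returns the visible text, or nil if the XML cannot be parsed.
func plainText(fromDocumentXML data: Data) -> String? {
    let parser = XMLParser(data: data)
    let extractor = DocumentXMLTextExtractor()
    parser.delegate = extractor
    guard parser.parse() else { return nil }
    return extractor.text.trimmingCharacters(in: .whitespacesAndNewlines)
}
```

This ignores tables, headers, footnotes, and any document that uses a different namespace prefix, so treat it as a starting point rather than a complete reader.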

rmaddy
  • Possibly. But asking for such recommendations is off-topic for stack overflow. You'll need to use Google. – rmaddy Nov 06 '16 at 04:20
  • No. You'll need to find an existing library to use or find code in another language and translate it. – rmaddy Nov 06 '16 at 04:23