I have a script which assembles an XML document via string manipulation (which I wrote before I discovered the XML Suite).

When certain characters are included, such as £, – (en dash) and — (em dash) (I suspect all non-ASCII characters), they're replaced with the Unicode replacement character (U+FFFD).

This only happens when there is an XML header at the start of the document, i.e. <?xml. Making any change at all to this header fixes the problem and writes what I would expect to the file. My assumption is that AppleScript is trying to parse the string as XML, but I want it to pass through as a plain string.

I'm writing in JXA, but have included the AppleScript equivalent, as I think the issue is with OSA and there are likely more AppleScript users!

Edit: OK, this is more of an encoding issue, I guess. Reading the file as UTF-8 (which the XML I'm generating should be) results in the replacement character, but Western or Mac Roman displays the characters correctly. UTF-8 definitely supports these characters, though, so I'm not sure of the best way to move forward.

Edit 2: Just to be clear: I think what's happening is that the non-ASCII characters are being encoded in something other than UTF-8, which makes my XML output invalid. How can I get AppleScript or JXA to encode non-ASCII characters as UTF-8?
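To illustrate the suspicion in the edits above, here is a minimal Node.js sketch (not JXA) of what happens when a legacy single-byte encoding is read back as UTF-8. It assumes the file contains the byte 0xA3 for £, which is how both Mac Roman and Latin-1 ("Western") encode that character; the variable names are illustrative only.

```javascript
// "£" is the single byte 0xA3 in both Mac Roman and Latin-1 ("Western").
const legacyBytes = Buffer.from([0xa3]);

// Decoded as UTF-8, a lone 0xA3 is an invalid sequence,
// so the decoder substitutes U+FFFD (the replacement character).
const misread = legacyBytes.toString("utf8");

// Decoded with a matching single-byte encoding, the character survives.
const readAsLatin1 = legacyBytes.toString("latin1");

// The same character properly encoded as UTF-8 is two bytes: 0xC2 0xA3.
const utf8Bytes = Buffer.from("£", "utf8");

console.log(misread === "\uFFFD"); // true
console.log(readAsLatin1 === "£"); // true
console.log(utf8Bytes);            // <Buffer c2 a3>
```

If the written file shows 0xA3 rather than 0xC2 0xA3 for £ in a hex dump, that would be consistent with the text having been written in a legacy encoding rather than UTF-8.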

AppleScript

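set text1 to "<?xml £" -- assumed value, mirroring the JXA example below; the original post omitted this line
set dt to path to desktop as text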
set filePath to dt & "test1.txt"

writeTextToFile(text1, filePath, true)

-- using the example handler from the Mac Automation Scripting Guide
on writeTextToFile(theText, theFile, overwriteExistingContent)
    try

        -- Convert the file to a string
        set theFile to theFile as string

        -- Open the file for writing
        set theOpenedFile to open for access file theFile with write permission

        -- Clear the file if content should be overwritten
        if overwriteExistingContent is true then set eof of theOpenedFile to 0

        -- Write the new content to the file
        write theText to theOpenedFile starting at eof

        -- Close the file
        close access theOpenedFile

        -- Return a boolean indicating that writing was successful
        return true

        -- Handle a write error
    on error

        -- Close the file
        try
            close access file theFile
        end try

        -- Return a boolean indicating that writing failed
        return false
    end try
end writeTextToFile

JavaScript for Automation

var app = Application.currentApplication()
app.includeStandardAdditions = true

function writeTextToFile(text, file, overwriteExistingContent) {
    try {

        // Convert the file to a string
        var fileString = file.toString()

        // Open the file for writing
        var openedFile = app.openForAccess(Path(fileString), { writePermission: true })

        // Clear the file if content should be overwritten
        if (overwriteExistingContent) {
            app.setEof(openedFile, { to: 0 })
        }

        // Write the new content to the file
        app.write(text, { to: openedFile, startingAt: app.getEof(openedFile) })

        // Close the file
        app.closeAccess(openedFile)

        // Return a boolean indicating that writing was successful
        return true
    }
    catch(error) {

        try {
            // Close the file
            app.closeAccess(file)
        }
        catch(error) {
            // Report the error if closing failed
            console.log(`Couldn't close file: ${error}`)
        }

        // Return a boolean indicating that writing failed
        return false
    }
}

var text = "<?xml £"
// Absolute path: note the leading slash
var file = Path("/Users/benfrearson/Desktop/text.txt")

writeTextToFile(text, file, true)
  • Given your _AppleScript_ code, what value is supposed to be assigned to the `text1` variable to reproduce your issue? Is it supposed to be `set text1 to " – RobC May 19 '20 at 13:47
  • Oops, yes! Looks like I missed the top line! When I explicitly open the file (in Atom) and set the encoding to UTF-8, it doesn't show the £ character. – Ben Frearson May 20 '20 at 08:11
  • Does this answer your question? [How can I write UTF-8 files using JavaScript for Mac Automation?](https://stackoverflow.com/questions/44268436/how-can-i-write-utf-8-files-using-javascript-for-mac-automation) – RobC May 20 '20 at 13:44

1 Answer

In AppleScript, you’d use `write theText to theFile as «class utf8»` to write UTF-8-encoded text. You can’t do that in JXA, as there’s no way to write raw AE codes.

I generally recommend against JXA as it’s 1. buggy and crippled, and 2. abandoned. If you like JavaScript in general you’re far better off with Node. For application automation you’re best sticking to AppleScript: while it’s a crappy language and also moribund, at least it speaks Apple events right and has half-decent documentation and community support.

If you must use JXA, the only workaround is to write your UTF-8 file via the Cocoa APIs instead. Though generating XML via string-mashing is evil and bug-prone anyway, so you’d probably be as well taking the opportunity to rewrite your code to use a proper XML API. (Again, with Node you’re spoiled for choice, and the hardest part will be figuring out which NPM libraries are robust and easy to use and which are junk. With AS/JXA, it’s either System Events’ XML Suite, which is slow, or Cocoa’s XML APIs, which are complex.)
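For comparison, here is a rough sketch of the Node route mentioned above, using only Node's built-in `fs`, `os`, and `path` modules. It is Node.js, not JXA, and is not a substitute for a real XML library: `escapeXml`, the output filename, and the sample XML are hypothetical and only show an explicit UTF-8 write plus the kind of escaping that string-built XML needs.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Hypothetical helper: escape the five characters that are special in XML.
// "&" must be replaced first so later entities are not double-escaped.
function escapeXml(value) {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Hypothetical output location; the temp directory keeps the sketch harmless.
const outPath = path.join(os.tmpdir(), "utf8-test.xml");

const xml =
  '<?xml version="1.0" encoding="UTF-8"?>\n' +
  `<price>${escapeXml("£5 – <cheap>")}</price>\n`;

// Write with an explicit encoding so non-ASCII characters become UTF-8 bytes.
fs.writeFileSync(outPath, xml, { encoding: "utf8" });

// Read the file back, both as decoded text and as raw bytes.
const roundTrip = fs.readFileSync(outPath, "utf8");
const rawBytes = fs.readFileSync(outPath);

console.log(roundTrip === xml);
console.log(rawBytes.includes(Buffer.from([0xc2, 0xa3]))); // UTF-8 bytes for "£"
```

Passing the encoding explicitly is redundant in Node, where UTF-8 is the default for string writes, but it makes the intent obvious to anyone coming from the AppleScript `write` command.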

foo
  • Yep, this answers the question! Works in AppleScript, no chance in pure JXA. I found more info [here](https://stackoverflow.com/questions/29076947/jxa-set-utf-8-encoding-when-writing-files). I chose JXA because I needed to do extra string manipulation that AppleScript is so obtuse about. To be clear: what I'm doing is reading an XML template file, and then replacing a placeholder string with values from a text file. Still probably better to use an XML API in the future (but it's solid for now). – Ben Frearson May 20 '20 at 08:08