I have a script that assembles an XML document via string manipulation (I wrote it before I discovered the XML Suite).
When the text includes certain characters, such as £, – (en dash) and — (em dash) (I suspect all non-ASCII characters), they're replaced with the Unicode replacement character � (U+FFFD).
This only happens when the document starts with an XML header, i.e. <?xml. Making any change at all to that prefix fixes the problem, and what I would expect is written to the file. My assumption is that AppleScript is trying to parse the string as XML, but I want it passed through as a plain string.
I'm writing in JXA, but I have included the AppleScript equivalent, as I think the issue is with OSA and there are likely more AppleScript users!
Edit: OK, I guess this is more of an encoding issue. Reading the file as UTF-8 (which the XML I'm generating should be) results in the replacement character, but reading it as Western or Mac Roman displays the characters correctly. UTF-8 definitely supports these characters though, so I'm not sure of the best way to move forward.
Edit 2: Just to be clear, I think what's happening is that the non-ASCII characters are being encoded in something other than UTF-8, which makes my XML output invalid. How can I get AppleScript or JXA to encode non-ASCII characters as UTF-8?
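To illustrate what I think is going on at the byte level, here is a minimal sketch in plain JavaScript. It runs under Node.js rather than JXA, because it relies on Node's Buffer, and it uses Latin-1 as a stand-in for Mac Roman (both map £ to the single byte 0xA3). Those are my assumptions for the demonstration; I don't know for certain which encoding the OSA write command actually picks.

```javascript
// Plain JavaScript sketch for Node.js (not JXA): Buffer is a Node API.
const text = "<?xml £";

// What a UTF-8 writer should produce: £ becomes the two bytes C2 A3.
const utf8Bytes = Buffer.from(text, "utf8");

// What a single-byte "Western" writer produces: £ becomes the lone byte A3.
// Latin-1 stands in for Mac Roman here (assumption for illustration).
const westernBytes = Buffer.from(text, "latin1");

// Reading the single-byte data back as UTF-8: a lone A3 is not a valid
// UTF-8 sequence, so the decoder substitutes U+FFFD.
const misread = westernBytes.toString("utf8");

// Reading the same bytes back as Latin-1 shows the original character.
const readAsWestern = westernBytes.toString("latin1");

console.log(utf8Bytes.toString("hex"));    // 3c3f786d6c20c2a3
console.log(westernBytes.toString("hex")); // 3c3f786d6c20a3
console.log(JSON.stringify(misread));
console.log(readAsWestern);
```

This matches what I see: the same file looks fine when opened as Western or Mac Roman, but shows the replacement character when opened as UTF-8.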
AppleScript
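-- Sample text that triggers the problem (same string as the JXA example below)
set text1 to "<?xml £"
set dt to path to desktop as text
set filePath to dt & "test1.txt"
writeTextToFile(text1, filePath, true)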
-- using the example handler from the Mac Automation Scripting Guide
on writeTextToFile(theText, theFile, overwriteExistingContent)
    try
        -- Convert the file to a string
        set theFile to theFile as string
        -- Open the file for writing
        set theOpenedFile to open for access file theFile with write permission
        -- Clear the file if content should be overwritten
        if overwriteExistingContent is true then set eof of theOpenedFile to 0
        -- Write the new content to the file
        write theText to theOpenedFile starting at eof
        -- Close the file
        close access theOpenedFile
        -- Return a boolean indicating that writing was successful
        return true
    -- Handle a write error
    on error
        -- Close the file
        try
            close access file theFile
        end try
        -- Return a boolean indicating that writing failed
        return false
    end try
end writeTextToFile
JavaScript for Automation
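// Get the current application so the Standard Additions commands are available
var app = Application.currentApplication()
app.includeStandardAdditions = true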
function writeTextToFile(text, file, overwriteExistingContent) {
    try {
        // Convert the file to a string
        var fileString = file.toString()
        // Open the file for writing
        var openedFile = app.openForAccess(Path(fileString), { writePermission: true })
        // Clear the file if content should be overwritten
        if (overwriteExistingContent) {
            app.setEof(openedFile, { to: 0 })
        }
        // Write the new content to the file
        app.write(text, { to: openedFile, startingAt: app.getEof(openedFile) })
        // Close the file
        app.closeAccess(openedFile)
        // Return a boolean indicating that writing was successful
        return true
    }
    catch(error) {
        try {
            // Close the file
            app.closeAccess(file)
        }
        catch(error) {
            // Report the error if closing failed
            console.log(`Couldn't close file: ${error}`)
        }
        // Return a boolean indicating that writing failed
        return false
    }
}
var text = "<?xml £"
// Absolute path: note the leading slash
var file = Path("/Users/benfrearson/Desktop/text.txt")
writeTextToFile(text, file, true)