
I asked this on https://answers.microsoft.com/en-us/msoffice/forum/all/how-to-read-utf8-data-output-from-curl-in/30f111f3-7f81-469e-824c-926fdbbed7d9?messageId=546fe3b8-09f7-4846-862e-0c7bf51d1e68 and it was suggested that I ask here.

I use cURL to access the Microsoft Translate API from VBA on macOS. I pass in JSON and get back the result with accented characters intact IF I write the output to a file by adding -o "file.txt", so cURL itself is working correctly.

But if I read the output through a pipe using the usual popen / fread / pclose, I get mojibake in the returned data: a ü becomes two characters (a square-root sign followed by a degree sign).

Is there a way of returning the utf-8 output through a pipe unmangled?

sysmod
  • VBA probably enforces some Microsoft-chosen "good default" (i.e. hopelessly wrong); see if you can convince it to use UTF-8 with an explicit keyword argument or configuration call. – tripleee May 27 '22 at 11:58
  • Yes: do not use UTF-8 but instead UTF-16 for pipes, just like using the WinAPI's *W() functions to use UTF-16 instead of ASCII. Or stop outsourcing HTTP downloads to cURL when [VBA could do it directly](https://stackoverflow.com/q/17877389/4299358). – AmigoJack May 31 '22 at 07:06
  • @AmigoJack Mac VBA does not support Windows libraries like Microsoft.XMLHTTP. Sorry, I should have said that explicitly at first. I've edited my question ... and posted the answer. – sysmod May 31 '22 at 08:41


The solution: run cURL through MacScript using AppleScript's do shell script "...command...". That correctly returns the command's output as a UTF-8 string.

The original method of opening a pipe to the command used the macOS C library /usr/lib/libc.dylib: popen(command, "r"), then fread in chunks until feof. That works, but it returns the raw bytes. The two bytes of a UTF-8 character such as ü therefore come back as two separate characters (mojibake), and I needed the output decoded as UTF-8.

By the way, another method is to send the command output to a file and then read the file contents with one line of AppleScript:

read "outfile.txt" as «class utf8»

which in VBA code becomes

MacScript("read """ & outFile & """ as " & Chr$(171) & "class utf8" & Chr(187))

171 is the left guillemet (double chevron) and 187 is the right one. I don't know why Apple chose such unusual quotes for AppleScript internal literals rather than ordinary quotes like everyone else. It's not safe to use accented or other extended-character literals in VBA source, because they get mangled if the module is edited on the other OS (Windows to macOS or vice versa). For example, «class utf8» becomes Çclass utf8È, which is what you get when the MacRoman bytes for « and » (0xC7 and 0xC8) are read as Windows-1252.
