
The MATLAB Engine is a C interface to MATLAB. It provides a function engEvalString() which takes some MATLAB code as a C string (char *) and evaluates it. MATLAB's output then comes back as a C string again, through a buffer registered with engOutputBuffer().

I need to be able to pass unicode data to MATLAB through engEvalString() and to retrieve the output as unicode. How can I do this? I don't care about the particular encoding (UTF-8, UTF-16, etc.), any will do. I can adapt my program.


More details:

To give a concrete example, if I send the following string, encoded as, say, UTF-8,

s='Paul Erdős'

I would like to get back the following output, encoded again as UTF-8:

s =

Paul Erdős

I hoped to achieve this by sending feature('DefaultCharacterSet', 'UTF-8') (reference) before doing anything else, and this worked fine when working with MATLAB R2012b on OS X. It also works fine with R2013a on Ubuntu Linux. It does not work on R2013a on OS X though. Instead of the character ő in the output of engEvalString(), I get character code 26, which is supposed to mean "I don't know how to represent this". However, if I retrieve the contents of the variable s by other means, I see that MATLAB does correctly store the character ő in the string. This means that it's only the output that didn't work, but MATLAB did interpret the UTF-8 input correctly. If I test this on Windows with R2013a, neither input, nor output works correctly. (Note that the Windows and the Mac/Linux implementations of the MATLAB Engine are different.)
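
One way to detect this failure programmatically is to scan the captured output for byte 26 (ASCII SUB, 0x1A). This is only a minimal sketch, assuming the output has already been copied into a NUL-terminated buffer; `count_sub_chars` is a hypothetical helper, not part of the MATLAB Engine API:

```c
#include <stddef.h>

/* Count occurrences of the ASCII SUB control character (code 26, 0x1A)
   in a NUL-terminated output buffer. A nonzero result suggests that some
   character could not be represented in the output encoding.
   Hypothetical helper, not part of the MATLAB Engine API. */
size_t count_sub_chars(const char *buf)
{
    size_t n = 0;
    for (; *buf != '\0'; ++buf) {
        if ((unsigned char)*buf == 0x1A)
            ++n;
    }
    return n;
}
```

A count of zero on the output for s='Paul Erdős' would indicate that ő was not replaced by the substitute character.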

The question is: how can I get unicode input/output working on all platforms (Win/Mac/Linux) with engEvalString()? I need this to work in R2013a, and preferably also in R2012b.


If people are willing to experiment, I can provide some test C code. I'm not posting that yet because it's a lot of work to distill a usable small example from my code that makes it possible to experiment with different encodings.


UPDATE:

I learned about feature('locale') which returns some locale-related data. On Linux, where everything works correctly, all encodings it returns are UTF-8. But not on OS X / Windows. Is there any way I could set the various encodings returned by feature('locale')?


UPDATE 2:

Here's a small test case: download. The zip file contains a MATLAB Engine C program, which reads a file, passes it to engEvalString(), then writes the output to another file. There's a sample file included with the following contents:

feature('DefaultCharacterSet', 'UTF-8')
feature('DefaultCharacterSet')
s='中'

The (last part of the) output I expect is

>> 
s =

中

This is what I get with R2012b on OS X. However, R2013a on OS X gives me character code 26 instead of the character 中. Outputs produced by R2012b and R2013a are included in the zip file.

How can I get the expected output with R2013a on all three platforms (Windows, OS X, Linux)?

Szabolcs
  • [Cross posted to MATLAB Answers](http://www.mathworks.com/matlabcentral/answers/69316-get-matlab-engine-engevalstring-to-take-return-unicode) – Szabolcs Mar 31 '13 at 22:07

1 Answer


I strongly urge you to use engPutVariable, engGetVariable, and MATLAB's eval instead. What you're trying to do with engEvalString will not work with many Unicode strings due to embedded NUL (\0) characters, among other problems. Due to how the Windows COM interface works, the MATLAB engine can't really support Unicode in interpreted strings. I can't speculate about how the engine works on other platforms.
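
To illustrate the embedded-NUL point, here is a minimal sketch that decodes UTF-8 into UTF-16 code units and counts the zero bytes they contain. The names `utf8_to_utf16` and `count_zero_bytes` are hypothetical helpers, not part of the MATLAB Engine API, and the decoder is deliberately simple (it does not reject overlong encodings). Note that UTF-8 text itself contains no zero bytes; the problem appears once the data is UTF-16, where every ASCII character carries one.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode NUL-terminated UTF-8 into UTF-16 code units.
   Returns the number of code units written, or (size_t)-1 if the input
   is malformed or `out` is too small. Hypothetical helper. */
size_t utf8_to_utf16(const char *in, uint16_t *out, size_t out_cap)
{
    const unsigned char *p = (const unsigned char *)in;
    size_t n = 0;
    while (*p) {
        uint32_t cp;
        int extra; /* number of continuation bytes expected */
        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else return (size_t)-1;
        ++p;
        while (extra-- > 0) {
            /* A terminating NUL also fails this check, so we never overrun. */
            if ((*p & 0xC0) != 0x80) return (size_t)-1;
            cp = (cp << 6) | (*p & 0x3F);
            ++p;
        }
        if (cp >= 0x10000) {
            /* Outside the BMP: emit a surrogate pair. */
            if (cp > 0x10FFFF || n + 2 > out_cap) return (size_t)-1;
            cp -= 0x10000;
            out[n++] = (uint16_t)(0xD800 | (cp >> 10));
            out[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        } else {
            if (n + 1 > out_cap) return (size_t)-1;
            out[n++] = (uint16_t)cp;
        }
    }
    return n;
}

/* Count the zero bytes inside a UTF-16 buffer (both halves of each unit). */
size_t count_zero_bytes(const uint16_t *units, size_t n)
{
    size_t zeros = 0;
    for (size_t i = 0; i < n; ++i) {
        if ((units[i] & 0x00FF) == 0) ++zeros;
        if ((units[i] & 0xFF00) == 0) ++zeros;
    }
    return zeros;
}
```

For the command s='中' this yields five code units, four of which contain a zero byte, so the UTF-16 form cannot travel through a `char *` that is read up to the first `\0`. The code units are the kind of data one would place in a char array and transfer with engPutVariable instead.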

Your other question had an answer about using mxCreateString_UTF16. Wasn't that sufficient?

user244795
  • So on Windows there's no chance this'll work with `engEvalString()`? Yes, the combination of `engPutVariable()`, `engGetVariable()` and `eval` should work (I can handle unicode with those). I simply didn't use it before because on OS X I didn't have any problems with sending/receiving UTF-8 to/from engEvalString---until R2013a came out. – Szabolcs Apr 02 '13 at 22:06
  • Something related: what about unicode .m source files? Do you know if MATLAB can handle those, or I shouldn't even try? – Szabolcs Apr 02 '13 at 22:08
  • Unfortunately it seems that using `evalc()` disables JIT compilation and causes a significant reduction in performance. It's not really a good solution because of this. – Szabolcs Apr 03 '13 at 16:02
  • Please provide a concrete example of what you're trying to do, and I'll try to figure out how you can do it. Otherwise would you mind marking this question as answered? – user244795 Apr 03 '13 at 16:46
  • Please see the update to the post. There's a concrete example demonstrating the problem. The output I expect is included as well. I can't accept this answer yet because while it solves the problem with transferring Unicode, it introduces another problem (performance), therefore I can't use it in practice. The reason I want to get Unicode working is that I am writing an interface between MATLAB and another unicode aware language. It allows executing MATLAB code from the other language. I can't prevent the users of this interface from sending Unicode, so I either have to get MATLAB... – Szabolcs Apr 03 '13 at 20:29
  • ... to handle it, or accept that MATLAB has a "unicode allergy" and try to convert everything to ASCII with a warning before sending the command to MATLAB. – Szabolcs Apr 03 '13 at 20:30
  • You're asking for several things that weren't in your original question. If you want to run a unicode .m file, add it to the matlabpath and tell matlab to run it, which will enable JIT. If you want to run dynamically generated strings then JIT isn't possible -- how would matlab cache the results? – user244795 Apr 08 '13 at 18:24
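
The ASCII fallback raised in the comments (convert everything to ASCII and warn before sending the command to MATLAB) could be sketched roughly as follows. This assumes the command is UTF-8 in a writable buffer; `utf8_to_ascii_lossy` is a hypothetical name, and replacing each non-ASCII character with `?` is just one possible policy:

```c
#include <stddef.h>

/* Replace every non-ASCII UTF-8 character in `s` with a single '?',
   compacting the string in place. Returns the number of characters
   replaced, so the caller can emit a warning when it is nonzero.
   Hypothetical helper, not part of the MATLAB Engine API. */
size_t utf8_to_ascii_lossy(char *s)
{
    const unsigned char *src = (const unsigned char *)s;
    char *dst = s;
    size_t replaced = 0;
    while (*src) {
        if (*src < 0x80) {
            /* Plain ASCII byte: copy it through unchanged. */
            *dst++ = (char)*src++;
        } else {
            /* Skip the lead byte and any continuation bytes (10xxxxxx). */
            ++src;
            while ((*src & 0xC0) == 0x80)
                ++src;
            *dst++ = '?';
            ++replaced;
        }
    }
    *dst = '\0';
    return replaced;
}
```

With this, s='Paul Erdős' becomes s='Paul Erd?s' and the function returns 1, which the calling code can use to decide whether to show the warning.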