MATLAB and Unicode still don't mix well, AFAIK. Your first example sadly returns a false positive. Your second example works, but not without its caveats.
Take, for example, test.txt (using UTF-8 encoding):
ΑΒΓΔ;ΕΖΗΘ
Then:
%// First try:
fid = fopen('test.txt', 'r', 'n', 'UTF-8');
A = textscan(fid, '%s', 'delimiter', ';');
fclose(fid);
A{1}{1}+0
%// apparently, textscan does not read the correct number of bits per
%// character...
%// Let's try manually:
fid = fopen('test.txt', 'r', 'n', 'UTF-8');
txt = fgetl(fid);
fclose(fid);
%// Feeding that line to textscan makes things even worse! All characters
%// get converted to the placeholder character 26, causing false positives
%// in string comparisons:
D = textscan(txt, '%s', 'delimiter', ';');
D{1}{1}+0
strcmp(D{1}{1}, D{1}{2})
%// Regexp works better; it preserves the right character codes and yields
%// a (correct) negative on comparison:
C = regexp(txt, ';', 'split');
C{1}+0
strcmp(C{1}, C{2})
%// So, the "best" way:
fid = fopen('test.txt', 'r', 'n', 'UTF-8');
D = {};
while ~feof(fid)
    tline = fgetl(fid); %// "tline" avoids shadowing the built-in line()
    D{end+1} = regexp(tline, ';', 'split'); %#ok<SAGROW>
end
fclose(fid);
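If you want to catch the placeholder problem programmatically rather than by eyeballing the character codes, something like the following might do. It is only a rough sketch: the variable names are made up for illustration, and the two sample fields are built by hand instead of being read from a file.

```matlab
%// Hypothetical sample fields: one mangled (all code 26), one intact (Greek)
fields = {char([26 26 26 26]), char(913:916)};

%// A field is suspicious if it is non-empty and consists only of code 26
bad = cellfun(@(s) ~isempty(s) && all(double(s) == 26), fields);
```

Any field flagged in bad was most likely mangled during reading, so comparisons involving it should not be trusted.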
disp still does not display the characters correctly, unless you have specifically selected a font for the command window/editor window that supports Unicode.
If you're on Linux, Unicode display works OK if you call MATLAB from bash or similar. But that has more to do with your shell than with MATLAB...
Also, looking at help unicode2native:
%// Reading
fid = fopen('japanese.txt', 'r', 'n', 'Shift_JIS');
str = fread(fid, '*char')';
fclose(fid);
disp(str);
%// Writing
fid = fopen('japanese_out.txt', 'w', 'n', 'Shift_JIS');
fwrite(fid, str, 'char');
fclose(fid);
The disp call fails over here (R2010a), but the write is OK...
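Another option, in the same spirit as that help text, is to skip the encoding argument of fopen altogether: read the raw bytes and decode them yourself with native2unicode. The snippet below is a minimal sketch under that assumption; the file name test_utf8.txt is hypothetical, and the file is first written from known bytes so that the example is self-contained.

```matlab
%// Hypothetical file name; write the sample line as raw UTF-8 bytes
fname = 'test_utf8.txt';
bytes = unicode2native([char(913:916) ';' char(917:920)], 'UTF-8');
fid = fopen(fname, 'w');
fwrite(fid, bytes, 'uint8');
fclose(fid);

%// Read the raw bytes back and decode them explicitly
fid = fopen(fname, 'r');
raw = fread(fid, Inf, '*uint8')';
fclose(fid);
txt = native2unicode(raw, 'UTF-8');

%// Split on the delimiter and inspect the result
parts = regexp(txt, ';', 'split');
codes = double(parts{1});
same  = strcmp(parts{1}, parts{2});
```

If the decoding goes through, codes should hold the Greek code points 913 to 916 and same should be false. I have not checked this on every release, so treat it as a starting point.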