Is it possible to compare strings written entirely in Greek characters? For example:

str1='ΑΒΓΔ'
str2='ΕΖΗΘ'
strcmp(str1,str2)

Is it possible to do the same when I read strings in Greek characters from a file? For example:

line='ΑΒΓΔ;ΕΖΗΘ'
[str1 str2] = strread(line,'%s %s','delimiter',';')
strcmp(str1,str2)
Rody Oldenhuis
  • Have a look [at this file from the FEX](http://www.mathworks.com/matlabcentral/fileexchange/18956-read-unicode-files). – Rody Oldenhuis Apr 01 '14 at 12:06
  • In addition to what @RodyOldenhuis said you may also want to read [this answer](http://stackoverflow.com/a/21097643/983722) by Amro. As well as the answer to [this question](http://stackoverflow.com/q/6863147/983722). – Dennis Jaheruddin Apr 01 '14 at 13:50

2 Answers

MATLAB and Unicode still don't mix well, AFAIK. Your first example sadly returns a false positive: strcmp reports the two different strings as equal. Your second example works, but not without caveats.

Take for example test.txt (using UTF-8 encoding):

ΑΒΓΔ;ΕΖΗΘ

Then:

%// First try: 
fid = fopen('test.txt', 'r', 'n', 'UTF-8');
A = textscan(fid, '%s', 'delimiter', ';');
fclose(fid);
A{1}{1}+0
%// apparently, textscan does not read the correct number of bits per
%// character...


%// Let's try manually:
fid = fopen('test.txt', 'r', 'n', 'UTF-8');
txt = fgetl(fid);
fclose(fid);

%// That makes things even worse! All characters get converted to the
%// placeholder character 26, causing false positives in string
%// comparisons:
D = textscan(txt, '%s', 'delimiter', ';');
D{1}{1}+0
strcmp(D{1}{1}, D{1}{2})


%// Regexp works better; it preserves the right character codes and yields
%// a (correct) negative on comparison:
C = regexp(txt, ';', 'split');
C{1}+0
strcmp(C{1}, C{2})


%// So, the "best" way: 
fid = fopen('test.txt', 'r', 'n', 'UTF-8');
D   = {};
while ~feof(fid)
    line     = fgetl(fid);
    D{end+1} = regexp(line, ';', 'split'); %#ok<SAGROW>
end
fclose(fid);

Even then, disp does not display the strings correctly unless you have specifically selected a font that supports Unicode for the command window/editor window.

If you're on Linux, Unicode display works OK if you call MATLAB from bash or similar. But that has more to do with your shell than with MATLAB...
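Since the display cannot be trusted, one way to check what a string really contains is to look at its numeric code points, as the `+0` trick above does. The following is only a minimal sketch: the variable names are made up for illustration, and the string is built from code points (913 to 916 are the Greek capitals Alpha to Delta) so that no non-ASCII literal has to be typed.

```matlab
% Build a Greek string from code points instead of typing a literal.
% 913:916 correspond to U+0391..U+0394 (Alpha, Beta, Gamma, Delta).
str   = char(913:916);

% Numeric code points; this does not depend on the display font.
codes = double(str);

% True if every character lies in the Greek and Coptic block (U+0370..U+03FF).
isGreek = all(codes >= 880 & codes <= 1023);

% True if every character was replaced by the placeholder character 26,
% which is the symptom that causes false positives in strcmp.
isPlaceholder = all(codes == 26);
```

If `isPlaceholder` is true for a string you read from a file, the characters were lost on the way in and any comparison result on that string is meaningless.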

Also, looking at help unicode2native:

%// Reading
fid = fopen('japanese.txt', 'r', 'n', 'Shift_JIS');
str = fread(fid, '*char')';
fclose(fid);

disp(str);

%// Writing
fid = fopen('japanese_out.txt', 'w', 'n', 'Shift_JIS');
fwrite(fid, str, 'char');
fclose(fid);

The disp fails over here (R2010a), but the write is OK...

Rody Oldenhuis

String encoding is not one of MATLAB's strong points. For working with Unicode, there are unicode2native and native2unicode. However, string literals that you enter in your code are ASCII-only, as far as I can tell. This means that you need to read your non-ASCII strings from a file that uses a suitable encoding using fread and then use native2unicode to convert the raw bytes to Unicode.
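The fread plus native2unicode route described above could look roughly like the sketch below. It assumes the file is UTF-8 encoded and uses ';' as the delimiter, as in the question. The temporary file name and the variable names are hypothetical, and the test file is generated from code points only so that the example is self-contained.

```matlab
% Build a UTF-8 test file without typing non-ASCII literals.
% Code points 913..920 are the Greek capitals Alpha..Theta; 59 is ';'.
fname = [tempname '.txt'];                 % hypothetical temporary file
raw   = unicode2native(char([913:916 59 917:920]), 'UTF-8');
fid   = fopen(fname, 'w');
fwrite(fid, raw, 'uint8');
fclose(fid);

% Read the raw bytes back without any character conversion.
fid   = fopen(fname, 'r');
bytes = fread(fid, inf, '*uint8')';        % row vector of uint8
fclose(fid);
delete(fname);

% Decode the bytes explicitly, then split on the delimiter.
txt   = native2unicode(bytes, 'UTF-8');    % char row vector of code points
parts = regexp(txt, ';', 'split');

% Compare the two fields; they hold different code points here.
same  = strcmp(parts{1}, parts{2});
```

With the two fields decoded this way, `same` should come out false, because the comparison is done on the real code points rather than on placeholder characters.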

Florian Brucker
  • It is impossible to use fread; the data file does not have a standard format. I use strread and then separate the data with the delimiter ";". – user3270686 Apr 01 '14 at 11:52
  • @user3270686: You can first read your data using `fread`, then do the conversion and then do further processing. However all the further processing would need to support Unicode. – Florian Brucker Apr 01 '14 at 12:00