
I am analyzing some text files in Matlab, and Matlab is garbling some of the text. I am using R2017a (9.2.0.538062) 64-bit (maci64). Please note the accented characters.

Other programs read the file ("War and Peace.txt") correctly: text editors (TextMate, Emacs, TextEdit), GNU Octave, and other tools such as Python, Ruby, and Mathematica. They show:

It was in July, 1805, and the speaker was the well-known Anna Pávlovna Schérer, maid of honor and favorite of the Empress Márya Fëdorovna.

Whereas Matlab shows:

It was in July, 1805, and the speaker was the well-known Anna Pávlovna Schérer, maid of honor and favorite of the Empress Márya Fëdorovna.

My Question

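Is there a Matlab setting (in Preferences, perhaps) that will read this text accurately? Matlab appears to be garbling the accented characters, whose codes are mostly in the 200-255 range. (Strictly speaking, these are not ASCII characters; ASCII only covers codes 0-127.)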

user3161399
  • Maybe [MATLAB: how to display UTF-8-encoded text read from file?](http://stackoverflow.com/q/6863147/5358968) – Steve Apr 11 '17 at 23:57
  • I don't think those are fully ASCII characters. It looks like you are just getting an uppercase character (A), followed by a garbage character. Maybe there is only an uppercase version of characters with accents and you are just getting the first one. – Evan Carslake Apr 12 '17 at 00:02
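The symptom described in the comments, an uppercase "A" followed by a stray symbol, is what UTF-8 text typically looks like when it is decoded one byte per character. A minimal sketch, assuming Matlab fell back to ISO-8859-1 (Latin-1) for this file; the actual default depends on the platform and locale settings:

```matlab
% Sketch: reproduce the garbling by decoding UTF-8 bytes as ISO-8859-1.
% Assumption: Matlab read the file with a Latin-1 style default encoding.
original = ['Sch' char(233) 'rer'];      % 'Schérer'; char(233) is U+00E9 (é)

% In a UTF-8 file, é is stored as the two bytes 0xC3 0xA9 (195 169)
bytes = unicode2native(original, 'UTF-8');

% A single-byte decoder maps each byte to its own character:
% 0xC3 becomes an uppercase A with a tilde, 0xA9 becomes a copyright sign
garbled = native2unicode(bytes, 'ISO-8859-1');

% Decoding the same bytes as UTF-8 restores the original text
restored = native2unicode(bytes, 'UTF-8');

assert(isequal(double(bytes), [83 99 104 195 169 114 101 114]))
assert(isequal(double(garbled), [83 99 104 195 169 114 101 114]))
assert(strcmp(restored, original))
```

Because ISO-8859-1 maps every byte to the character with the same code, `garbled` has exactly the codes of `bytes`, so each accented letter turns into two characters.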

1 Answer


I faced the same problem when trying to read strings from a text file. In my case, I had saved the .txt file with ANSI encoding. After many attempts, I found a solution. First, save the file with UTF-8 encoding, like this:

[Screenshot: saving the .txt file with UTF-8 encoding]

Then, in your MATLAB code, specify the encoding through the `encodingIn` argument of `fopen`.
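To see what the `encodingIn` argument changes, the sketch below writes a short UTF-8 file and reads it back twice, once declaring UTF-8 and once declaring ISO-8859-1. The scratch file name `encoding_demo.txt` is hypothetical, and ISO-8859-1 stands in for whatever single-byte default a given installation might use:

```matlab
% Sketch: write a UTF-8 test file, then read it back with two encodings.
% 'encoding_demo.txt' is a hypothetical scratch file name.
expected = ['Anna P' char(225) 'vlovna Sch' char(233) 'rer'];   % á = U+00E1, é = U+00E9

fid = fopen('encoding_demo.txt', 'w', 'n', 'UTF-8');
assert(fid ~= -1, 'Could not create encoding_demo.txt');
fprintf(fid, '%s\n', expected);        % characters are encoded as UTF-8 on write
fclose(fid);

% Read back, declaring the correct encoding
fid = fopen('encoding_demo.txt', 'r', 'n', 'UTF-8');
[~, ~, ~, encodingOut] = fopen(fid);   % reports the encoding in effect for this file
goodLine = fgetl(fid);
fclose(fid);

% Read back, declaring a single-byte encoding instead
fid = fopen('encoding_demo.txt', 'r', 'n', 'ISO-8859-1');
badLine = fgetl(fid);
fclose(fid);

disp(encodingOut)
assert(strcmp(goodLine, expected))
assert(~strcmp(badLine, expected))     % each accented letter became two characters
```

The four-output form `[filename, permission, machinefmt, encodingOut] = fopen(fileID)` is also a quick way to check which encoding an already-open file is being read with.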

A minimal test script could look like this:

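```matlab
close all; clearvars; clc;

% Open text.txt read-only ('r') with native byte order ('n'), decoding it as UTF-8
fileID = fopen('text.txt', 'r', 'n', 'UTF-8');
assert(fileID ~= -1, 'Could not open text.txt');

% Read whitespace-delimited words into a cell array
C = textscan(fileID, '%s');
fclose(fileID);

celldisp(C)
```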

The output of this code would be:

```
C{1}{1} =

It


C{1}{2} =

was


C{1}{3} =

in


C{1}{4} =

July,


C{1}{5} =

1805,


C{1}{6} =

and


C{1}{7} =

the


C{1}{8} =

speaker


C{1}{9} =

was


C{1}{10} =

the


C{1}{11} =

well-known


C{1}{12} =

Anna


C{1}{13} =

Pávlovna


C{1}{14} =

Schérer,


C{1}{15} =

maid


C{1}{16} =

of


C{1}{17} =

honor


C{1}{18} =

and


C{1}{19} =

favorite


C{1}{20} =

of


C{1}{21} =

the


C{1}{22} =

Empress


C{1}{23} =

Márya


C{1}{24} =

Fëdorovna.
```
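`textscan` with `'%s'` splits the text into words. If you would rather have the whole file as one character vector, one alternative is to read the raw bytes and decode them yourself with `native2unicode`. A sketch, assuming the file is UTF-8; the file name `whole_file_demo.txt` is hypothetical, and the byte-order-mark handling is only a precaution for editors (such as older versions of Windows Notepad) that prepend one when saving as UTF-8:

```matlab
% Sketch: read a whole file as raw bytes and decode them explicitly as UTF-8.
% 'whole_file_demo.txt' is a hypothetical scratch file name.
sentence = ['Empress M' char(225) 'rya F' char(235) 'dorovna.'];   % á = U+00E1, ë = U+00EB

% Write the UTF-8 bytes of the sentence, prefixed with a byte-order mark (EF BB BF)
fid = fopen('whole_file_demo.txt', 'w');
assert(fid ~= -1, 'Could not create whole_file_demo.txt');
fwrite(fid, [uint8([239 187 191]) unicode2native(sentence, 'UTF-8')], 'uint8');
fclose(fid);

% Read every byte without any character decoding
fid = fopen('whole_file_demo.txt', 'r');
assert(fid ~= -1, 'Could not open whole_file_demo.txt');
bytes = fread(fid, Inf, '*uint8')';    % 1-by-N uint8 row vector
fclose(fid);

% Decode as UTF-8, then drop a leading byte-order mark (U+FEFF) if one survived
txt = native2unicode(bytes, 'UTF-8');
if ~isempty(txt) && txt(1) == char(65279)
    txt(1) = [];
end
disp(txt)
```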
Tes3awy