I have an RTF file that I need to read so that I can parse it. The file contains some special characters because it has embedded images. When I read all the text from the file, the content after the special characters cannot be read.

I tried reading the file with ReadAllText, using both Encoding.UTF8 and Encoding.ASCII.

using System.IO;
using System.Text;

public class ReadFile
{
    public static string GetFileContent(string path)
    {
        if (!File.Exists(path))
        {
            throw new FileNotFoundException();
        }
        else
        {
            // I also tried 
            // return File.ReadAllText(path, Encoding.ASCII);
            string text = string.Empty;
            var fileStream = new FileStream(path, FileMode.Open, FileAccess.Read);
            using (var streamReader = new StreamReader(fileStream, Encoding.UTF8))
            {
                string line;
                while ((line = streamReader.ReadLine()) != null)
                {
                    text += line;
                }
            }
            return text;
        }
    }
}

My actual result is all of the text up to the point where the special characters start:

{\rtf1\ansi\ansicpg1252\deff0\deftab720{\fonttbl{\f0\fnil Times New Roman;}{\f1\fnil Arial;}}{\colortbl;\red000\green000\blue000;\red255\green000\blue000;\red128\green128\blue128;}\paperw11905\paperh16837\margl360\margr360\margt360\margb360 \sectd \sectdefaultcl \marglsxn360\margrsxn360\margtsxn360\margbsxn360{ {*\do\dobxpage\dobypage\dodhgt8192\dptxbx{\dptxbxtext\pard\plain {\pict\wmetafile8\picw19499\pich1746\picwgoal1305695\pichgoal116957 \bin342908
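For context, the output above stops right at `\bin342908`. In RTF, `\binN` announces that the next N bytes are raw binary picture data rather than text, which would explain where the "special characters" come from. The following is only a minimal sketch for locating that spot in the raw bytes; the class and method names (`RtfBinProbe`, `FindFirstBin`) are hypothetical and not part of any library, and the scan is deliberately naive (it does not handle an escaped backslash before `bin`, nor very large lengths).

```csharp
using System;
using System.IO;
using System.Text;

public static class RtfBinProbe
{
    // Returns the byte offset of the first "\bin" control word, or -1 if none.
    // declaredLength receives N from "\binN" (0 when no digits follow).
    public static int FindFirstBin(byte[] data, out int declaredLength)
    {
        declaredLength = 0;
        byte[] marker = Encoding.ASCII.GetBytes("\\bin");

        for (int i = 0; i + marker.Length <= data.Length; i++)
        {
            bool match = true;
            for (int j = 0; j < marker.Length; j++)
            {
                if (data[i + j] != marker[j]) { match = false; break; }
            }
            if (!match) continue;

            int p = i + marker.Length;

            // Skip longer control words that merely start with "\bin".
            if (p < data.Length && IsAsciiLetter(data[p])) continue;

            // Parse the decimal byte count that follows the control word.
            while (p < data.Length && data[p] >= (byte)'0' && data[p] <= (byte)'9')
            {
                declaredLength = declaredLength * 10 + (data[p] - (byte)'0');
                p++;
            }
            return i;
        }
        return -1;
    }

    private static bool IsAsciiLetter(byte b)
    {
        return (b >= (byte)'a' && b <= (byte)'z') || (b >= (byte)'A' && b <= (byte)'Z');
    }
}
```

Called as `RtfBinProbe.FindFirstBin(File.ReadAllBytes(path), out int length)`, it reports where the binary block is announced and how many bytes it claims to contain, which can be compared against the point where the decoded text appears to end.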

Rtf File is here

René Vogt
Willimar
  • What special characters do you mean? Do you mean the RTF markup? If I read your question correctly, your problem is not reading the text from the file, but interpreting or getting rid of the RTF markup? – René Vogt Aug 26 '19 at 13:23
  • I think it is the binary data from the image. I have no problem parsing the markup; I can't read the whole file content. All of the code is posted. – Willimar Aug 26 '19 at 13:27
  • You are using a StreamReader with ReadLine. Of course, when it meets a NUL character or some other binary data, ReadLine gets confused and doesn't work well. You should probably use a method capable of understanding RTF. What kind of application are you using? – Steve Aug 26 '19 at 13:32
  • Can you suggest a method to read it? Can you help me with this? I am looking at SautinSoft. – Willimar Aug 26 '19 at 13:46
  • I changed `while` to `while (!streamReader.EndOfStream)` and now all lines are read, but the `string` variable holds only part of the text. – Willimar Aug 26 '19 at 14:33

1 Answer

I solved it. To read the file I used File.ReadAllBytes(path), and in the resulting byte array I replace byte 0 with (nul) and byte 27 with (esc).

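// Requires: using System.IO; using System.Text;
byte[] fileBytes = File.ReadAllBytes(path);

StringBuilder sb = new StringBuilder(fileBytes.Length);
foreach (byte b in fileBytes)
{
    // Keep bytes >= 32 plus tab (9), LF (10) and CR (13) as-is.
    // Bytes >= 128 are cast directly, so they become U+0080..U+00FF.
    if (b >= 32 || b == 9 || b == 10 || b == 13)
    {
        sb.Append((char)b);
    }
    else
    {
        // Replace the control characters of interest with visible placeholders.
        switch (b)
        {
            case 0: sb.Append("(nul)"); break;
            case 27: sb.Append("(esc)"); break;
            // etc. Any control byte without a case here is dropped.
        }
    }
}

return sb.ToString();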
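If the parser needs the original bytes back (for example to extract the picture data), replacing or dropping control bytes loses information. One possible alternative, shown here only as a sketch and not as what the code above does, is to decode the bytes as ISO-8859-1, which maps every byte value 0..255 to the character with the same code and therefore round-trips. The class and method names (`RawRtfReader`, `BytesToText`, `TextToBytes`, `ReadRaw`) are made up for illustration.

```csharp
using System.IO;
using System.Text;

public static class RawRtfReader
{
    // ISO-8859-1 maps each byte 0..255 to the char with the same code point.
    private static readonly Encoding Latin1 = Encoding.GetEncoding("iso-8859-1");

    // Decode without losing any byte, including NUL and other control bytes.
    public static string BytesToText(byte[] bytes)
    {
        return Latin1.GetString(bytes);
    }

    // Inverse of BytesToText for strings that came from it.
    public static byte[] TextToBytes(string text)
    {
        return Latin1.GetBytes(text);
    }

    // Convenience wrapper: read the whole file and decode it byte-for-byte.
    public static string ReadRaw(string path)
    {
        return BytesToText(File.ReadAllBytes(path));
    }
}
```

Note that the returned string then contains real `'\0'` characters. Some viewers may stop displaying the text at the first NUL even though the string itself is complete, so checking `Length` is a more reliable test than looking at the displayed value.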

I found the help in

Willimar