
I've been struggling with this for a while. I created a utility that allows you to open .TXT files. These text files contain PCL (Printer Command Language). When I import a new file, it is truncated at the first \0 (null terminator) character. Because PCL files contain graphic images scattered throughout, everything I import is truncated at the first bitmap image, since the bitmap images start with NULL.

This is the same issue as described here: "Displaying Raw Data From Image File Using TextBox or RichTextBox?"

Unfortunately I can't comment on this thread though because of my low (newbie) reputation (need 15 rep). Also can't paste a screenshot (need 10 rep).


Here is how Notepad++ displays the information:

[Screenshot: the file as displayed in Notepad++]


Here is how my RichTextBox displays this same information:

[Screenshot: the same file as displayed in the RichTextBox]


Here is why this is a problem (Zoomed out):

[Screenshot: zoomed-out view of the file]

The raster data is right between two sections of data I need (The PCL). All of the information below the raster data won't pull in.


Here is what I have tried (Note: I am using a custom RichTextBox, but this shouldn't affect anything since it's just a RichTextBox with Drag/Drop functionality):

byte[] bytes = new byte[2048];
string data = System.Text.Encoding.ASCII.GetString(bytes);
dragDropRichTextBox1.Text = data.Replace("\0", @"1");

This just produces a string of 2048 "1" characters, and none of the text file's data is pulled in. Any help is much appreciated.

Whatever I do, I would like to preserve my current drag/drop functionality:

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Windows.Forms;

namespace PCL_Utility
{
    public class DragDropRichTextBox : RichTextBox
    {
        public DragDropRichTextBox()
        {
            this.AllowDrop = true;
            this.DragDrop += DragDropRichTextBox_DragDrop;
        }

        void DragDropRichTextBox_DragDrop(object sender, DragEventArgs e)
        {
            //string[] fileText = e.Data.GetData(DataFormats.FileDrop) as string[];
            string[] fileText = e.Data.GetData(DataFormats.FileDrop) as string[];

            if (fileText != null)
            {
                foreach (string name in fileText)
                {
                    try
                    {
                        this.AppendText(File.ReadAllText(name) + "\n -------- End of File -------- \n\n");
                    }
                    catch (Exception ex)
                    {
                        MessageBox.Show(ex.Message);  
                    }
                }
            }
        }
    }
}
Gernatch
  • I do not know an answer for certain, but is there a way you can read the file's size then read that many bytes? – Mardoxx Jun 17 '14 at 16:17
  • I understand what you are saying. For whatever reason though, even if I have enough bytes for each character it would make "every" character a 1, rather than just the "\0" characters. All I know is that if I get bumped up 4 more reputation points I'll be dropping in some screenshots of this beast! – Gernatch Jun 17 '14 at 16:20
  • Please tag your question with the language you're programming in. – Barmar Jun 17 '14 at 16:24
  • Tag has been updated. – Gernatch Jun 17 '14 at 16:43
  • RichTextBox was designed to display text, not binary data. Rich Text in particular. Apart from the gibberish it displays, it will treat binary zeros in the PCL data as a string terminator. You must convert the data into a format suitable for RTB and humans. Hexadecimal is very common. Simple to do with BitConverter.ToString(byte[]). – Hans Passant Jun 17 '14 at 17:20
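
For reference, here is a minimal sketch of the hexadecimal approach suggested in the last comment above. BitConverter.ToString(byte[]) is the real framework call; the HexDump class and ToHex method names are made up for illustration, and replacing the dashes with spaces is just one possible formatting choice.

```csharp
using System;

public static class HexDump
{
    // Hypothetical helper: renders each byte as two uppercase hex digits.
    // BitConverter.ToString separates bytes with '-', which is swapped
    // for a space here purely for readability.
    public static string ToHex(byte[] data)
    {
        return BitConverter.ToString(data).Replace("-", " ");
    }
}
```

Assuming the file is read with File.ReadAllBytes(name), the result of HexDump.ToHex can be appended to the RichTextBox without any embedded '\0' characters, at the cost of the PCL text no longer being readable as text.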

2 Answers


First of all, you don't want ASCII encoding. ASCII is a 7-bit encoding. Any characters read that have the high bit set (i.e. character codes 128 through 255) are converted to question marks by the decoder. So reading binary data as ASCII is going to destroy your data.
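
To make the data loss concrete, here is a small sketch that decodes bytes as ASCII and encodes them back. The AsciiLossDemo class and RoundTrip method are illustrative names, not part of any library; only Encoding.ASCII is the real API.

```csharp
using System;
using System.Text;

public static class AsciiLossDemo
{
    // Decodes the bytes as ASCII, then encodes the resulting string again,
    // so the caller can compare the round-tripped bytes with the originals.
    // Bytes of 128 and above come back as 0x3F ('?').
    public static byte[] RoundTrip(byte[] input)
    {
        string decoded = Encoding.ASCII.GetString(input);
        return Encoding.ASCII.GetBytes(decoded);
    }
}
```

For an input such as { 0x1B, 0x45, 0x80, 0xFF }, the first two bytes survive, while 0x80 and 0xFF both come back as 0x3F, so the original raster data cannot be recovered from the string.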

Second, the rich text box uses a Windows control under the hood, and that control is designed to work with null-terminated strings. So it's going to truncate the text the first time it sees a '\0' character. If you want to display binary data in an edit control, you need to modify the text to be displayed.

Your "text" files really aren't text, as they contain binary (i.e. non-human-readable) data. Your best bet is to open the file and read the entire thing into a memory buffer as binary. That is:

byte[] fileBytes = File.ReadAllBytes("filename");

Then, if you want to display the data in a text control, you have to create a string that represents the data. I would suggest something like:

StringBuilder sb = new StringBuilder();
foreach (var b in fileBytes)
{
    // handle printable characters
    if (b >= 32 || b == 10 || b == 13 || b == 9) // printable, or lf, cr, tab
        sb.Append((char)b);
    else
    {
        // handle control characters
        switch (b)
        {
            case 0 : sb.Append("(nul)"); break;
            case 27 : sb.Append("(esc)"); break;
            // etc.
        }
    }
}

Rather than building a big switch statement, you might want to build a lookup table that has the strings for each of the values you want to convert. A dictionary would probably be best. Something like:

private Dictionary<byte, string> Conversions = new Dictionary<byte, string>()
{
    {0, "(nul)"},
    {27, "(esc)"},
    // etc.
};

Then your loop could do this:

foreach (var b in fileBytes)
{
    string s;
    if (Conversions.TryGetValue(b, out s))
    {
        sb.Append(s);
    }
    else
    {
        sb.Append((char)b);
    }
}
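
Putting the pieces above together, a self-contained version might look like the sketch below. The PclDisplay class and ToDisplayString method are hypothetical names, and the "(0xNN)" fallback for control characters that have no dictionary entry is an assumption of mine, added so that no raw control bytes reach the text box.

```csharp
using System.Collections.Generic;
using System.Text;

public static class PclDisplay
{
    // Labels for the bytes we want to spell out; extend as needed.
    private static readonly Dictionary<byte, string> Conversions =
        new Dictionary<byte, string>
        {
            { 0, "(nul)" },
            { 27, "(esc)" },
        };

    // Hypothetical helper: converts raw file bytes into a string that
    // contains no '\0' characters and is safe to show in a text control.
    public static string ToDisplayString(byte[] fileBytes)
    {
        var sb = new StringBuilder(fileBytes.Length);
        foreach (byte b in fileBytes)
        {
            string label;
            if (Conversions.TryGetValue(b, out label))
                sb.Append(label);                    // named control character
            else if (b >= 32 || b == 9 || b == 10 || b == 13)
                sb.Append((char)b);                  // printable, tab, lf, cr
            else
                sb.Append("(0x").Append(b.ToString("X2")).Append(')'); // assumed fallback
        }
        return sb.ToString();
    }
}
```

In the drag/drop handler from the question, this would be used as this.AppendText(PclDisplay.ToDisplayString(File.ReadAllBytes(name))) in place of File.ReadAllText(name).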
Jim Mischel
  • Thanks for this Jim. This is very helpful information. I will work on this as soon as I can and let you know what I find. – Gernatch Jun 17 '14 at 18:51

As Jim Mischel answered, rather than trying to read the file data into a string, it should be read into a byte array and processed.

Here's a static class that will read a file into a byte array, and process it based on a lookup table. I've prepopulated the table with "\" followed by the decimal byte value for non-printable ASCII characters and all values over 127.

    // Requires: using System.Linq;
    public static class BinaryFile
    {

        private static string[] __byteLookup = new string[256];

        static BinaryFile()
        {
            // Display printable ASCII characters as-is
            for (int i = 0x20; i < 0x7F; i++) { __byteLookup[i] = ((char)i).ToString(); }

            // Display non-printable ASCII characters as \{byte value}
            for (int i = 0; i < 0x20; i++) { __byteLookup[i] = "\\" + i.ToString();}
            for (int i = 0x7F; i <= 0xFF; i++) { __byteLookup[i] = "\\" + i.ToString(); }

            // Replace pre-populated values with custom values here if desired.
        }

        public static string ReadString(string filename)
        {
            byte[] fileBytes = System.IO.File.ReadAllBytes(filename);

            return String.Join("", (from i in fileBytes select __byteLookup[i]).ToArray());
        }
    }

Edit: since you want to use this with your custom drag-and-drop code, the usage should be:

    void DragDropRichTextBox_DragDrop(object sender, DragEventArgs e)
    {
        string[] fileText = e.Data.GetData(DataFormats.FileDrop) as string[];

        if (fileText != null)
        {
            foreach (string name in fileText)
            {
                try
                {
                    // Read each file using the helper class rather than File.ReadAllText
                    // then append the end-of-file line
                    this.AppendText(BinaryFile.ReadString(name)
                        + "\n -------- End of File -------- \n\n");
                }
                catch (Exception ex)
                {
                    MessageBox.Show(ex.Message);  
                }
            }
        }
    }
pmcoltrane
  • It's not going to work if the file contains characters with codes above 127. ASCII is a 7-bit encoding. It will convert any characters above 127 to question marks. – Jim Mischel Jun 17 '14 at 20:29
  • You're right. I've modified my answer so it doesn't mislead anyone. – pmcoltrane Jun 17 '14 at 21:39
  • It looks like to do this I would have to get rid of my current drag/drop functionality. I've been trying to reconcile what you've given me here with what I have but I can't make the two work together. I've updated my original post to show the drag/drop rich text box I've created if you'd like to see it. I'm not the most advanced programmer. This is the farthest I've been from home if you will. Any help is appreciated, but not required of course. Thanks for your help so far. – Gernatch Jun 18 '14 at 16:49
  • @user3290333 it looks like you're trying to display all of the files dragged to the RichTextBox, separated by an end-of-file marker. This should be doable: replace your File.ReadAllText with the helper function (see edit). I haven't tested, but it should work. – pmcoltrane Jun 18 '14 at 16:59
  • An array is more efficient than a dictionary: `private static readonly string[] __byteLookup = new string[256];`. Plus you can index it with an int and avoid byte casting. – 0xF Jun 18 '14 at 17:01
  • This does it! Now I just need to convert the "\27" to read "(esc)" and I'm done with this issue. I think I should be able to do it with dragDropTextBox1.Text.Replace ("/27", "(esc)"); Thanks for everyone's help on this. Made my day. – Gernatch Jun 18 '14 at 18:19
  • @user3290333 Look in the BinaryFile class at the line that reads `// Replace pre-populated values with custom values here if desired.`. Add a line below that comment that reads `__byteLookup[27] = "(esc)";`. It'll save you having to run Replace in your own code. – pmcoltrane Jun 18 '14 at 18:42
  • Mind = Blown. What's your phone number? Adding you to speed dial. Just kidding. Thanks again. I'm a very new programmer, this is my second program. It's a lot for me. Everything else has been smooth though. – Gernatch Jun 18 '14 at 19:00
  • @pmcoltrane One last question, I think I promise. Is there a way to view 0x7F through 0xFF in binary (ex: 11010000 - 11111111 instead of 128 - 255)? I'm learning so much about conversion through all this. This would be an awesome thing to play with. – Gernatch Jun 18 '14 at 20:36
  • @user3290333 see http://stackoverflow.com/questions/3702216/how-to-convert-integer-to-binary-string-in-c for converting an integer to a binary string. You can modify the 0x7F to 0xFF loop in BinaryFile to store whatever value you wish to be displayed. – pmcoltrane Jun 18 '14 at 20:41
  • Sorry for my lack of ability to pick things up. I'm really trying. Does this look right? for (int i = 11010000; i <= 11111111; i++) { __byteLookup[i] = "\\" + i.ToString(); } Plus, I would need to have the convert method preceding this too? – Gernatch Jun 18 '14 at 21:12
  • No, don't change the for-loop itself. `for (int i = 0x7F; i <= 0xFF; i++) { __byteLookup[i] = ConvertToBinaryString(i); }` Then write a (static) function that converts an integer to a binary string, using the page I previously linked. – pmcoltrane Jun 18 '14 at 21:14
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/55877/discussion-between-user3290333-and-pmcoltrane). – Gernatch Jun 18 '14 at 23:09
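
The binary display discussed in the last few comments could be sketched roughly as follows. ConvertToBinaryString is the name used in the comment above, but its body here is my own guess at an implementation based on Convert.ToString(value, 2). BinaryLabels and BuildLookup are hypothetical names standing in for the table setup in BinaryFile.

```csharp
using System;

public static class BinaryLabels
{
    // Formats a byte value (0-255) as eight binary digits,
    // left-padded with zeros, e.g. 208 becomes "11010000".
    public static string ConvertToBinaryString(int value)
    {
        return Convert.ToString(value, 2).PadLeft(8, '0');
    }

    // Builds a 256-entry lookup table. The loop bounds are unchanged from
    // the answer; only the stored string for 0x7F-0xFF differs.
    public static string[] BuildLookup()
    {
        var lookup = new string[256];
        for (int i = 0; i < 0x20; i++) { lookup[i] = "\\" + i; }
        for (int i = 0x20; i < 0x7F; i++) { lookup[i] = ((char)i).ToString(); }
        for (int i = 0x7F; i <= 0xFF; i++) { lookup[i] = "\\" + ConvertToBinaryString(i); }
        return lookup;
    }
}
```

The loop counter stays an ordinary integer in every case. Writing 11010000 as a loop bound in C# means the decimal number eleven million and ten thousand, not the bit pattern, which is why only the stored string should change.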