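string filename = "";
private void openToolStripMenuItem1_Click(object sender, EventArgs e)
{
    // RecentFiles (path of the history file), lines, items and
    // recentFilesToolStripMenuItem are members declared elsewhere in the form.
    OpenFileDialog theDialog = new OpenFileDialog();
    theDialog.Title = "Open Text File";
    theDialog.Filter = "TXT files|*.txt";
    theDialog.InitialDirectory = @"C:\";
    if (theDialog.ShowDialog() == DialogResult.OK)
    {
        filename = theDialog.FileName;

        // Read the history file if it exists (ReadAllLines throws otherwise).
        lines = File.Exists(RecentFiles) ? File.ReadAllLines(RecentFiles) : new string[0];

        // Windows paths are case-insensitive, so compare accordingly.
        if (!lines.Any(line => string.Equals(line, filename, StringComparison.OrdinalIgnoreCase)))
        {
            // AppendAllText opens, writes and closes the file in one call,
            // so no StreamWriter is left open if an exception occurs.
            File.AppendAllText(RecentFiles, filename + Environment.NewLine);
        }

        // Rebuild the "Recent Files" submenu from the history file.
        items = File
                .ReadLines(RecentFiles)
                .Select(line => new ToolStripMenuItem()
                {
                    Text = line
                })
                .ToArray();
        recentFilesToolStripMenuItem.DropDownItems.Clear();
        recentFilesToolStripMenuItem.DropDownItems.AddRange(items);

        TextFileContentToRichtextbox(filename);
    }
}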

When I open a text file and load it into the richTextBox, how do I know whether the content is HTML code or just regular text? This comes up, for example, when I first copy a web page's view-source content into a text file and then open that text file.

Likewise, when I paste directly into the richTextBox window, I want to know whether the text is HTML code or regular text, so I can decide how to continue.

Simon Gamlieli
  • possible duplicate of [How can I determine if a file is binary or text in c#?](http://stackoverflow.com/questions/910873/how-can-i-determine-if-a-file-is-binary-or-text-in-c) – Da Maex Jun 05 '15 at 13:36
  • It sounds like you are only allowing users to add *.txt files (text files), but you want to know if it contains text that represents HTML, is that correct? Because HTML files have the *.html extension, so you immediately know that they're not HTML files. – ragerory Jun 05 '15 at 13:38
  • Question is, how do you differentiate between html and text theoretically? What would you consider HTML? Is it enough to have `` at the start of the file? Most browsers would display ordinary text just like a plain web page. On the other hand many (most?) valid HTML docs would not start with those characters. The problem is that HTML rules are followed pretty loosely. So you may want to take into account some fuzzy logic here. – dotNET Jun 05 '15 at 13:39
  • You wouldn't; the only thing you could do would be to search for a tag that is mandatory for HTML. – Greg Jun 05 '15 at 13:42
  • @Greg Which still wouldn't be enough. It's perfectly valid for a text file to contain HTML tags (obviously). You shouldn't interpret my text file talking about HTML as HTML, that's just plain wrong :) – Luaan Jun 05 '15 at 13:45
  • You should make a separate question for the second part (pasting text into the richtextbox) - it's a completely different thing (unlike files, clipboard content is in fact tagged with a type; it's not all that better than an extension, but at least it's part of the clipboard contract). – Luaan Jun 05 '15 at 13:47
  • Maybe I didn't explain my problem well enough. In my program I open a text file, and then I have a button that tries to get all the http links from the text content using HtmlAgilityPack. But if I open a text file that contains special characters and is not HTML content (I know it's not HTML), I get a null exception from HtmlAgilityPack. – Simon Gamlieli Jun 05 '15 at 13:48
  • @Luaan You don't understand, I specifically stated *mandatory* for an HTML file. For example, `` wouldn't be used except in an HTML file, unless you're writing documentation with a code example. – Greg Jun 05 '15 at 15:00
  • @Greg No, you didn't understand me. I was talking about the opposite scenario - trying to figure out whether a random text file is a HTML file. I've got plenty of text files on my system that include text like `` and `DOCTYPE`. Of course every valid HTML file has to contain ` – Luaan Jun 05 '15 at 16:18
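Given the follow-up comment that the real goal is collecting the http links, one alternative (a swapped-in technique, not HtmlAgilityPack) is to skip the HTML-or-text decision entirely and scan the raw text with a regular expression, which behaves the same for markup and for plain text. This is a minimal sketch assuming a deliberately simplified URL pattern; `LinkExtractor` and `ExtractLinks` are hypothetical names. Separately, a common source of a null reference with HtmlAgilityPack is that `SelectNodes` returns `null` rather than an empty collection when nothing matches, so its result is worth checking before iterating.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class LinkExtractor
{
    // Hypothetical helper. Matches "http://" or "https://" followed by any run of
    // characters that are not whitespace, quotes or angle brackets. This is a
    // simplification, not a full RFC 3986 URL grammar.
    private static readonly Regex UrlPattern = new Regex(
        @"https?://[^\s""'<>]+",
        RegexOptions.IgnoreCase);

    // Returns the distinct links in the order they first appear.
    public static List<string> ExtractLinks(string text)
    {
        if (string.IsNullOrEmpty(text))
            return new List<string>();

        return UrlPattern.Matches(text)
            .Cast<Match>()
            // Drop punctuation that usually ends a sentence rather than the URL
            // (assumption: this also trims a legitimate trailing ')').
            .Select(m => m.Value.TrimEnd('.', ',', ';', ')'))
            .Distinct()
            .ToList();
    }
}
```

Because the pattern stops at quotes and angle brackets, it also pulls `href` values out of HTML, so the same call can be used whichever kind of file was opened.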

1 Answer


If you want to see whether the file "looks like" HTML, you could check whether some HTML-specific text ("<body>", ...) is present in the text file.
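The check above can be sketched as a small heuristic. This is a minimal sketch under assumptions: the tag list, the `minMatches` threshold and the names `HtmlSniffer` and `LooksLikeHtml` are all hypothetical. As pointed out in the comments, a text file that merely talks about HTML can still trip it, so treat the result as a hint rather than proof.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HtmlSniffer
{
    // Hypothetical marker list: a doctype, or an opening/closing tag for a few
    // elements that rarely appear in ordinary prose. Tune it for your data.
    private static readonly Regex HtmlMarker = new Regex(
        @"<!DOCTYPE\s+html|</?(html|head|body|div|p|a|script|table)\b[^>]*>",
        RegexOptions.IgnoreCase);

    // Returns true when the text contains at least minMatches HTML-looking markers.
    public static bool LooksLikeHtml(string text, int minMatches = 2)
    {
        if (string.IsNullOrEmpty(text))
            return false;

        return HtmlMarker.Matches(text).Count >= minMatches;
    }
}
```

Requiring more than one marker reduces false positives from prose that mentions a single tag, at the cost of missing very small HTML fragments.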

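If you want to check the markup using HtmlAgilityPack, you could do something like the following (note that text with no tags at all also parses without errors, so an empty ParseErrors list does not by itself prove the content is HTML):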

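using System;
using System.IO;
using System.Linq;
using HtmlAgilityPack;

// path: the text file to check
string html = File.ReadAllText(path);
HtmlDocument htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(html);

// ParseErrors lists markup problems such as unclosed or unexpected tags.
if (htmlDoc.ParseErrors.Any())
{
    throw new InvalidOperationException("Not a valid HTML file");
}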
Sébastien Sevrin