Removing black diamond with question mark using c#

Question

My problem:

I converted an HTML file to plain text using the method below. It reads a .html file (an Outlook .msg converted to .html), and I then removed all the tags using regular expressions.

    public string ReadEmailTemplate(string EmailTemplateFilePath)
    {
        return File.ReadAllText(EmailTemplateFilePath);
    }

But after removing all the HTML tags, I see a black diamond with a white question mark inside it. I know this appears when a character is unknown. I need to remove those characters from the string. Is that possible in C#? I tried the method below, but it did not remove the black question mark diamonds.

    public string replaceBlackQuestionMark(string output)
    {
        while(output.Contains('�'))
        {
            output = output.Replace("�", "");
        }
        return output;
    }

This is the output of the string in a messageBox. It contains black diamond with white question marks.
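The diamond glyph is normally how U+FFFD (the Unicode replacement character) is displayed. One possible reason the attempt above had no effect, which the post does not confirm, is that the literal typed into the source file is not stored as U+FFFD once the .cs file is saved in some other encoding. A minimal sketch that sidesteps this by using the `\uFFFD` escape follows; the class and method names are hypothetical, chosen only for illustration.

```csharp
using System;

// Hypothetical helper, not taken from the original post.
public static class ReplacementCharCleaner
{
    // U+FFFD REPLACEMENT CHARACTER, written as an escape sequence so the
    // encoding of the source file cannot change which character is matched.
    private const string ReplacementChar = "\uFFFD";

    public static string RemoveReplacementChars(string input)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));

        // string.Replace already replaces every occurrence, so no loop is needed.
        return input.Replace(ReplacementChar, string.Empty);
    }
}
```

This only strips the symptom. If the diamonds come from decoding the file with the wrong encoding, the characters they stand for are already lost by the time the string exists.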

_I converted a HTML to plain text_ Can you show us how you did that? Your problem is likely an encoding issue. — StepTNT, May 06 '20 at 07:56
It is probably an unknown character caused by a different encoding. Check which encoding the source file uses and which one your code uses. — Leszek Mazur, May 06 '20 at 07:56
@StepTNT I edited the question and added how I got to the final output. — keinz, May 06 '20 at 08:01
The unknown characters are not within the character set you're dealing with, and there are far more of them than of the characters you would like to keep. In this case, whitelisting (keeping ASCII only, for example) is probably a better approach than blacklisting. — Zephyr, May 06 '20 at 08:05
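The comments above point at two ideas: reading the file with the encoding it was actually saved in, and whitelisting the characters to keep. A rough sketch of both follows. The file's real encoding is not stated in the question, so it is left as a parameter, and the class and method names are hypothetical. `File.ReadAllText(path)` decodes as UTF-8 unless it finds a byte order mark, and bytes that are invalid in UTF-8 decode to U+FFFD, which would explain the diamonds if the exported HTML uses a legacy code page.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not taken from the original post.
public static class EmailTextCleaner
{
    // Reads the file with a caller-supplied encoding instead of the
    // UTF-8 default used by File.ReadAllText(path).
    public static string ReadWithEncoding(string path, Encoding sourceEncoding)
    {
        return File.ReadAllText(path, sourceEncoding);
    }

    // Whitelist approach: keep printable ASCII plus tab, CR and LF,
    // and drop everything else (including U+FFFD).
    public static string KeepPrintableAscii(string input)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));

        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            bool printable = c >= ' ' && c <= '~';
            bool whitespace = c == '\r' || c == '\n' || c == '\t';
            if (printable || whitespace)
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```

A call might look like `EmailTextCleaner.ReadWithEncoding(path, Encoding.GetEncoding("iso-8859-1"))`, where ISO-8859-1 is only a guess at the file's encoding. Note that the ASCII whitelist also discards legitimate accented letters, so fixing the encoding first is usually the less lossy option.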

0 Answers