I need to search for a specific word in the HTML of a web page.

I am trying to do this in C# (ASP.NET Core).

The idea is to send the URL and the search word from the view via JavaScript, and then show the word in the response if it exists, or show something else if it does not.

I wrote a method that gets the HTML of the page. Here is the code:

    [HttpPost]
    public JsonResult SearchWord([FromBody] RequestModel model)
    {
        // Download the page at the address sent from the view.
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(model.adress);

        string data;
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (Stream receiveStream = response.GetResponseStream())
        {
            // Use the charset declared by the server, if there is one.
            Encoding encoding = string.IsNullOrEmpty(response.CharacterSet)
                ? Encoding.UTF8
                : Encoding.GetEncoding(response.CharacterSet);

            using (StreamReader readStream = new StreamReader(receiveStream, encoding))
            {
                data = readStream.ReadToEnd();
            }
        }

        string strRegex = model.word; // the word to search for (not used yet)

        return Json(data);
    }

But how do I search for the word correctly?

Cœur
Eugene Sukh
  • Is it as simple as [getting the text from the Streamreader](https://stackoverflow.com/a/8606837/43846) and then using [String.Contains](https://msdn.microsoft.com/en-us/library/dy85x1sa(v=vs.110).aspx)? – stuartd Aug 14 '18 at 14:45

2 Answers

You will not get far with simple pattern matching on HTML; see the SO classic "RegEx match open tags except XHTML self-contained tags". Consider a web scraping library such as html-agility-pack if you want to do serious scraping. If you only want to find a single word in a web page, regardless of whether it sits in markup, CDATA, etc., read the whole page into one string and use string.Contains or Regex.
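
A minimal sketch of the plain-string approach, assuming the whole page has already been read into a string. The `WordSearch` class and `ContainsWord` method are hypothetical names used only for illustration, and the case-insensitive comparison is an assumption about what the asker wants:

```csharp
using System;

public static class WordSearch
{
    // Hypothetical helper: true if `word` occurs anywhere in `html`, ignoring case.
    // Markup is searched too, so "div" matches tag names as well as visible text.
    public static bool ContainsWord(string html, string word)
    {
        if (string.IsNullOrEmpty(html) || string.IsNullOrEmpty(word))
        {
            return false;
        }

        return html.IndexOf(word, StringComparison.OrdinalIgnoreCase) >= 0;
    }
}
```

In the action from the question, this could be used as `return Json(WordSearch.ContainsWord(data, model.word));` instead of returning the raw HTML.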

Piotr Falkowski
To add to the previous answer, you can use Regex.Match. Something like:

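// Build the pattern from the search word; Regex.Escape makes it match literally.
string pattern = @"(\w+)\s+(" + Regex.Escape(strRegex) + ")";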

// Instantiate the regular expression object.
Regex r = new Regex(pattern, RegexOptions.IgnoreCase);

// Match the regular expression pattern against your html data.
Match m = r.Match(data);

if (m.Success) {
    //Add your logic here
}

NOTE: There are quite a few things you can do to optimize your code, particularly in how you handle the stream reader. I would read the response in chunks and try to match each chunk.
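
One way the chunked idea might look, as a sketch only. `ChunkedSearch`, `ContainsWord`, and the `chunkSize` default are hypothetical names and values, and it uses a plain case-insensitive substring search rather than the regex above. Carrying the last `word.Length - 1` characters over to the next chunk is my own addition, so that a word split across a chunk boundary is still found:

```csharp
using System;
using System.IO;

public static class ChunkedSearch
{
    // Hypothetical helper: scans `reader` chunk by chunk for `word`, ignoring case.
    public static bool ContainsWord(TextReader reader, string word, int chunkSize = 4096)
    {
        if (string.IsNullOrEmpty(word)) return false;
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        int overlap = word.Length - 1;   // characters carried into the next window
        char[] buffer = new char[chunkSize];
        string tail = string.Empty;
        int read;

        while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
        {
            // Prepend the carried-over tail so boundary-spanning matches are visible.
            string window = tail + new string(buffer, 0, read);
            if (window.IndexOf(word, StringComparison.OrdinalIgnoreCase) >= 0)
            {
                return true;
            }

            tail = window.Length > overlap
                ? window.Substring(window.Length - overlap)
                : window;
        }

        return false;
    }
}
```

With the code from the question, the `StreamReader` could be passed straight in, e.g. `ChunkedSearch.ContainsWord(readStream, model.word)`, which stops reading as soon as the word is found instead of calling `ReadToEnd()`.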

IsakBosman