I want to extract some data from a website, for example https://www.chefkoch.de/rezepte/drucken/512261146932016/Annas-Rouladen-mit-Seidenkloessen.html. I need the text on the left side and the ingredients table on the right.

I tried several approaches, such as downloading the page with a WebClient and pulling out the parts with a regex. The problem is that if the table contains more than one list, as in my example, I can't split them apart.

I also tried loading the page into an HtmlDocument and getting the elements, but the elements don't have an id, only a class.

So is there any way to get these two things out of the website? I'm pretty new to HTML and that kind of stuff.

Michael
  • That's a very simple layout. The class name is an HtmlElement's `Attribute`. The Attribute name is `className` (note the camel-case). You can use the standard WebBrowser control to navigate to that page. Use the `DocumentCompleted` event as shown here: [How to get an HtmlElement value inside Frames/IFrames?](https://stackoverflow.com/a/53218064/7444103). See the notes about IFrames, too. – Jimi May 30 '20 at 12:37
  • Hey, I'm an absolute newbie to this. Could you elaborate on your solution a little? – Michael May 30 '20 at 17:34
  • It's all explained in the answer I linked (`DocumentCompleted` and `Attributes` usage included). – Jimi May 30 '20 at 18:25
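
A minimal sketch of the approach described in the comments above, assuming a WinForms project on Windows. The class name `instructions` and the choice of `div` as the tag are hypothetical placeholders, not taken from the actual page; inspect the page source to find the real ones. The `FindByClass` helper is also made up for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Windows.Forms;

public class ScrapeForm : Form
{
    // Hypothetical class name; check the page source for the real one.
    private const string TextClass = "instructions";

    private readonly WebBrowser browser = new WebBrowser();

    public ScrapeForm()
    {
        browser.ScriptErrorsSuppressed = true;
        browser.Dock = DockStyle.Fill;
        Controls.Add(browser);
        browser.DocumentCompleted += OnDocumentCompleted;
        browser.Navigate("https://www.chefkoch.de/rezepte/drucken/512261146932016/Annas-Rouladen-mit-Seidenkloessen.html");
    }

    private void OnDocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        // The event can fire more than once (for example once per frame),
        // so wait until the whole page reports that it is loaded.
        if (browser.ReadyState != WebBrowserReadyState.Complete) return;
        browser.DocumentCompleted -= OnDocumentCompleted;

        // Debug.WriteLine writes to the debugger's Output window in Debug builds.
        foreach (HtmlElement element in FindByClass(browser.Document, "div", TextClass))
            Debug.WriteLine(element.InnerText);
    }

    // Hypothetical helper: returns all elements with the given tag whose
    // class list contains cssClass.
    private static IEnumerable<HtmlElement> FindByClass(HtmlDocument doc, string tagName, string cssClass)
    {
        foreach (HtmlElement element in doc.GetElementsByTagName(tagName))
        {
            // "className" (camel-case) is the attribute name that exposes the HTML class.
            string classes = element.GetAttribute("className") ?? string.Empty;
            if (Array.IndexOf(classes.Split(' '), cssClass) >= 0)
                yield return element;
        }
    }

    [STAThread]
    private static void Main()
    {
        Application.Run(new ScrapeForm());
    }
}
```

Splitting the attribute value on spaces matters because an element can carry several classes at once, so a plain string comparison would miss elements like `class="instructions wide"`.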

1 Answer

You should consider using a web scraping library such as https://ironsoftware.com/csharp/webscraper/ or Selenium. With either one, you can target HTML elements by tag or CSS class to extract the data.
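
As a rough sketch of the Selenium route, assuming the `Selenium.WebDriver` NuGet package and a local Chrome installation: the selector `.instructions` is a hypothetical placeholder for whatever class the text block really uses, and the code assumes the ingredient lists are plain `<table>` elements. Reading each table on its own keeps multiple ingredient lists separate.

```csharp
using System;
using System.Linq;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

public static class RecipeScraper
{
    public static void Main()
    {
        // Hypothetical CSS selector; inspect the page to find the real class.
        const string textSelector = ".instructions";

        using (IWebDriver driver = new ChromeDriver())
        {
            driver.Navigate().GoToUrl("https://www.chefkoch.de/rezepte/drucken/512261146932016/Annas-Rouladen-mit-Seidenkloessen.html");

            // Print the text of every element matching the class selector.
            foreach (IWebElement block in driver.FindElements(By.CssSelector(textSelector)))
                Console.WriteLine(block.Text);

            // Walk each table separately so the lists do not get mixed together.
            var tables = driver.FindElements(By.TagName("table"));
            for (int i = 0; i < tables.Count; i++)
            {
                Console.WriteLine($"--- table {i + 1} ---");
                foreach (IWebElement row in tables[i].FindElements(By.TagName("tr")))
                {
                    // Join the cell texts of one row, e.g. amount and ingredient.
                    var cells = row.FindElements(By.TagName("td")).Select(c => c.Text.Trim());
                    Console.WriteLine(string.Join(" | ", cells));
                }
            }
        }
    }
}
```

Calling `FindElements` on a table element searches only inside that table, which is what makes the per-list grouping possible without any regex.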

Greg