Web crawling using regex or XML libraries

Asked Nov 28 '17 at 07:17

Active Nov 28 '17 at 07:17

Viewed 45 times

I am trying to write a web crawler in Python. To extract the strings I need from each page, should I use regex, or can I use Python's XML packages?

I can see that most people use regex for this. I would like to know the reason behind it.
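To make the comparison concrete, here is a minimal sketch of the two approaches applied to link extraction. The sample HTML and the names `links_with_regex`, `LinkCollector`, and `links_with_parser` are made up for illustration. It uses the standard library's `html.parser` as one possible parsing option, and is not meant as the definitive way to do either.

```python
import re
from html.parser import HTMLParser

# Hypothetical sample page, made up for illustration only.
SAMPLE_HTML = """
<html><body>
<a href="/first">First</a>
<a class="nav" href='/second'>Second</a>
<!-- <a href="/commented-out">Old</a> -->
</body></html>
"""


def links_with_regex(html):
    # Naive pattern: assumes href is the first attribute and is double-quoted.
    return re.findall(r'<a href="([^"]*)"', html)


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> start tag the parser sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def links_with_parser(html):
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


if __name__ == "__main__":
    print(links_with_regex(SAMPLE_HTML))
    print(links_with_parser(SAMPLE_HTML))
```

On this sample, the naive regex misses the second link (different attribute order and quoting) and picks up the link inside the HTML comment, while the parser-based version returns only the two real links. A more careful regex could cover these cases, but each new variation in the markup needs another adjustment to the pattern.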

Simbu

I use regex for crawling (in general, not Python; I use PHP) because it's easier, at least for me, than using a third-party library. – Manuel Mannhardt Nov 28 '17 at 07:25
Possible duplicate of [Using regular expressions to parse HTML: why not?](https://stackoverflow.com/questions/590747/using-regular-expressions-to-parse-html-why-not) Even better: https://stackoverflow.com/a/1732454/1640661 – Anthony Geoghegan Nov 28 '17 at 10:36

0 Answers