Find pattern in HTML using regex

Question

I'm scraping an HTML page for data.

Is regex the proper technique for this kind of task? I'm searching for patterns where my data is supposed to be.

If regex is the right thing to do, I would appreciate help finding this pattern:

<span>3060</span>

The pattern is exactly four digits (0-9) inside a span element.

Thanks

What have you tried? Have you done any research into Regular expressions? Have you searched the web for this question? — Tim B James, May 09 '14 at 15:26
Why does this need to be done with a regular expression? I imagine a DOM parser is much more suited to the task. — David, May 09 '14 at 15:29
@EkoostikMartin, doesn't really apply here, he isn't trying to parse nor worry about nested elements. Perhaps he is searching html in an editor or using some search tool on a bunch of html files. Regex is pretty simple for this... something like `<span[^>]*>[0-9]{4}</span>` should work fine. — Smern, May 09 '14 at 15:31
@smerny - if he is searching an unknown and unlimited blob of HTML, "this" does apply here. Identifying and selecting something based on a pattern, is by definition, parsing, is it not? — EkoostikMartin, May 09 '14 at 15:35
@EkoostikMartin, it's more of a simple search than a dom parsing in this case as it doesn't concern nesting/hierarchy. If he is searching within an editor or using ransack or something to search a bunch of files, regex would make sense. — Smern, May 09 '14 at 15:36

score 1 · Accepted Answer · answered May 09 '14 at 15:40

Try this:

preg_match_all("/(<span>\d{4}<\/span>)/", $myinput, $myoutput);

http://3v4l.org/72ClO

Please note that this does not parse HTML. It looks for text that starts with <span>, then has exactly 4 digits, then </span>. If there is even a single space anywhere in between, or an attribute on the opening tag, the match will fail.
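If stray whitespace or attributes on the span are a concern, the pattern can be loosened a little. The following is only a sketch under assumptions: the sample input and the names `$html`, `$pattern` and `$matches` are made up for illustration and are not part of the original answer.

```php
<?php
// Hypothetical sample input (not from the question).
$html = '<span>3060</span> <span class="x"> 1234 </span> <span>12345</span>';

// Assumed variation of the pattern above:
//   <span\b[^>]*>  opening tag, optionally with attributes
//   \s*(\d{4})\s*  exactly four digits, optionally padded with whitespace
//   <\/span>       closing tag
$pattern = '/<span\b[^>]*>\s*(\d{4})\s*<\/span>/i';

preg_match_all($pattern, $html, $matches);

// $matches[1] holds the captured digits: ['3060', '1234'].
// The five-digit span is skipped because </span> must follow the fourth digit.
print_r($matches[1]);
```

This is still a text search rather than HTML parsing, so nested markup inside the span will not match.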

Use this one to capture only the 4 digits:

preg_match_all("/<span>(\d{4})<\/span>/", $myinput, $myoutput);

http://3v4l.org/FF4Y9
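As a rough sketch of how the result of the second call might be read: `preg_match_all` fills its third argument with one array per group, so the digits end up in `$myoutput[1]`. The sample value of `$myinput` and the `$count` variable are assumptions added for illustration.

```php
<?php
// Hypothetical sample input; in practice this would be the scraped page.
$myinput = '<p><span>3060</span> and <span>2014</span>, but not <span>42</span></p>';

// preg_match_all returns the number of full matches it found.
$count = preg_match_all('/<span>(\d{4})<\/span>/', $myinput, $myoutput);

// $myoutput[0] holds the full matches (tags included),
// $myoutput[1] holds the first capture group (digits only).
foreach ($myoutput[1] as $digits) {
    echo $digits, PHP_EOL;
}
```

With this input the loop prints 3060 and 2014; the two-digit span is ignored because `\d{4}` requires exactly four digits between the tags.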
