Regex to match previous names

Question

I have been having some trouble writing regex to match previous names on this page: http://steamcommunity.com/id/TripleThreat/namehistory

To be clear, I want the following in an array:

TripleThreat
[FD] TripleThreat.blyat
9

and so on.

I have already tried writing the regex, but it was a disaster (regex is something I struggle with).

Here's what I wrote:

$page = file_get_contents("http://steamcommunity.com/id/TripleThreat/namehistory");

preg_match_all("/<span class=\"historyDash\">-<\/span>((.|\n)*)<\/div>/", $page, $matches);

foreach($matches[0] as $match) {
    echo($match . "<br/>");
}

Any help is much appreciated :)

I did look for an API to retrieve the data but found nothing; scraping seems to be the only option. – SM9, Jul 27 '16 at 23:48

score 1 · Accepted Answer · edited May 23 '17 at 12:22

You can try the following regex (the match is in the first capturing group):

"/<span class=\"historyDash\">-<\/span>\s*((?:[^\<]|\n)*?)\s*<\/div>/"

See it on Regex101.

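The changes I made: the \s* on either side of the group trims leading and trailing whitespace from the match, and the . became [^\<] so that the capture only contains characters that are not the start of a tag (i.e., just the name text). The quantifier is also lazy (*?), so each match stops at the first closing </div> instead of running on to the last one on the page.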
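To show how this plugs into the code from the question, here is a minimal sketch. The HTML in $page is a hypothetical sample inferred from the regex (the real page markup may differ), and $pattern and $names are names chosen for illustration. The key difference from the original loop is reading $matches[1] (the first capturing group) rather than $matches[0] (the full match, tags included).

```php
<?php
// Hypothetical sample markup; the real namehistory page may be structured differently.
$page = <<<HTML
<div class="historyItem">
    <span class="historyDash">-</span>
    TripleThreat
</div>
<div class="historyItem">
    <span class="historyDash">-</span>
    [FD] TripleThreat.blyat
</div>
HTML;

// The regex from the answer, unchanged.
$pattern = "/<span class=\"historyDash\">-<\/span>\s*((?:[^\<]|\n)*?)\s*<\/div>/";
preg_match_all($pattern, $page, $matches);

// $matches[0] holds the full matches; $matches[1] holds the captured names.
$names = $matches[1];

foreach ($names as $name) {
    echo $name . "<br/>";
}
```

With the live page, $page would instead come from file_get_contents() as in the question.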

Note: As @PedroLobito pointed out, don't parse HTML with regex unless necessary. Use a library to parse the DOM instead when you can. I just provided an easy example to extend your work, but it might not be the best solution.
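For comparison, a DOM-based version might look like the sketch below, using PHP's bundled DOMDocument and DOMXPath classes. The sample markup and the XPath expression are assumptions based on the historyDash span seen in the regex; they would need adjusting to the real page.

```php
<?php
// Hypothetical sample markup; the real namehistory page may be structured differently.
$page = <<<HTML
<html><body>
<div class="historyItem">
    <span class="historyDash">-</span>
    TripleThreat
</div>
<div class="historyItem">
    <span class="historyDash">-</span>
    [FD] TripleThreat.blyat
</div>
<div class="historyItem">
    <span class="historyDash">-</span>
    9
</div>
</body></html>
HTML;

// Collect parser warnings internally instead of printing them,
// since real-world HTML is rarely perfectly valid.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($page);
libxml_clear_errors();

// Select the first text node that follows each historyDash span.
$xpath = new DOMXPath($doc);
$nodes = $xpath->query('//span[@class="historyDash"]/following-sibling::text()[1]');

$names = [];
foreach ($nodes as $node) {
    $name = trim($node->nodeValue);
    if ($name !== '') {
        $names[] = $name;
    }
}

print_r($names);
```

Because the parser handles tags, attributes, and entities itself, this approach is less likely to break when names contain characters such as < or when the surrounding markup changes slightly.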

answered Jul 28 '16 at 00:01 by Jonathan Lam; edited May 23 '17 at 12:22 by Community

never use regex to parse html, bad example. – Pedro Lobito Jul 28 '16 at 00:21

@PedroLobito Good point, I'll mention that in my answer. Edited now. – Jonathan Lam Jul 28 '16 at 00:23
