I have been having trouble writing a regex to match the previous names on this page: http://steamcommunity.com/id/TripleThreat/namehistory

To be clear, I want an array containing the following:

  • TripleThreat
  • [FD] TripleThreat.blyat
  • 9

and so on.

I have already tried writing the regex, but it was a disaster (regex is something I struggle with).

Here's what I wrote:

$page = file_get_contents(sprintf("http://steamcommunity.com/id/TripleThreat/namehistory"));

preg_match_all("/<span class=\"historyDash\">-<\/span>((.|\n)*)<\/div>/", $page, $matches);

foreach($matches[0] as $match) {
    echo($match . "<br/>");
}

Any help is much appreciated :)

SM9
  • I did look for an API to retrieve the data but found nothing; scraping seems to be the only option. – SM9 Jul 27 '16 at 23:48

1 Answer

You can try the following regex (the match is in the first capturing group):

"/<span class=\"historyDash\">-<\/span>\s*((?:[^\<]|\n)*?)\s*<\/div>/"

See it on Regex101.

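The changes I made: I trimmed the whitespace before and after the name with \s*, and changed the . to [^\<] so that the group matches only characters that do not start a tag (i.e., the name text itself). The quantifier is also lazy (*?), so the trailing \s* consumes the whitespace before </div> instead of the capturing group.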
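
As a rough illustration of how the pattern above could be used, here is a minimal sketch. The HTML in $page is a made-up stand-in (an assumption, not the real Steam markup), so the real page may need a different pattern. Note that it reads $matches[1], the first capturing group, rather than $matches[0], which would include the surrounding tags.

```php
<?php
// Hypothetical sample markup standing in for the downloaded page;
// the structure of the real page is an assumption here.
$page = <<<HTML
<div>
    <span class="historyDash">-</span>
    TripleThreat
</div>
<div>
    <span class="historyDash">-</span>
    [FD] TripleThreat.blyat
</div>
HTML;

// The pattern from the answer, unchanged.
$pattern = "/<span class=\"historyDash\">-<\/span>\s*((?:[^\<]|\n)*?)\s*<\/div>/";

preg_match_all($pattern, $page, $matches);

// $matches[0] holds the full matches (tags included);
// $matches[1] holds only the text captured by the first group.
$names = $matches[1];

foreach ($names as $name) {
    echo $name . "\n";
}
```

With the real page, $page would come from file_get_contents() as in the question, and the loop could echo "<br/>" instead of a newline.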


Note: As @PedroLobito pointed out, don't parse HTML with regex unless necessary. Use a library to parse the DOM instead when you can. I just provided an easy example to extend your work, but it might not be the best solution.
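
For comparison, a DOM-based version might look like the sketch below. It uses PHP's built-in DOMDocument and DOMXPath classes; the sample markup and the assumption that each name is the text node directly following the historyDash span are mine, not taken from the real page.

```php
<?php
// Hypothetical sample markup standing in for the downloaded page;
// the structure of the real page is an assumption here.
$page = <<<HTML
<html><body>
<div>
    <span class="historyDash">-</span>
    TripleThreat
</div>
<div>
    <span class="historyDash">-</span>
    [FD] TripleThreat.blyat
</div>
</body></html>
HTML;

$doc = new DOMDocument();
// Real-world HTML is often not strictly valid, so collect parser
// warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($page);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// For each span with class "historyDash", take the first text node
// that follows it inside the same parent element.
$nodes = $xpath->query('//span[@class="historyDash"]/following-sibling::text()[1]');

$names = [];
foreach ($nodes as $node) {
    // Strip the indentation and newlines around the name.
    $names[] = trim($node->nodeValue);
}

print_r($names);
```

This avoids depending on the exact spelling of the tags, but it still depends on the assumed position of the name relative to the span.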

Jonathan Lam