
I fetched some HTML with file_get_contents and put it in a string. Now I am trying to search inside it. I think I should search it using a wildcard, but I couldn't make it work.

The HTML I want to search looks like ">2</td>, and I want to use a wildcard for the number part.

I think I should use regex but I couldn't figure out how to use it.


My attempt looks like:

if (preg_match('/>(\w+)<\/td>/', $content, $matches)) {
    echo $matches[1];
}

How can I add a length limit to the regex, so that the wildcard matches only integers that are 1 to 3 characters long?


Or is there a better way than regex to look for those HTML strings and put them in an array?

Guillaume Jacquenot
senty
  • I want to search for 2 digits as well actually – senty Dec 17 '16 at 08:24
  • You know there are DOM and XPath for searching HTML documents, don't you? – Gordon Dec 17 '16 at 08:24
  • I'm trying to get the information from another site. I used file_get_contents. Can I use your approach? What do you think is the best way to achieve what I want? The number is between 1 and 200, and I'm trying to search the HTML to find each one. But the classes are structured weirdly, so I think I need to search this way – senty Dec 17 '16 at 08:25
  • Have a look at http://stackoverflow.com/questions/3577641/how-do-you-parse-and-process-html-xml-in-php – Gordon Dec 17 '16 at 08:27
  • No. To be clear, I want to get the 1, not the class' 2 – senty Dec 17 '16 at 08:33
  • @hwnd it changes between `p0, p1, p2, n0, n1, n2`. I think the site's owner made it to give them styling. They are all over the html. But searching this: `n1">` type, I get what I want – senty Dec 17 '16 at 08:38
  • I'm a bit confused. How should I retrieve the html in this case? Is file_get_contents approach still okay? – senty Dec 17 '16 at 08:48
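
To make the DOM/XPath suggestion in the comments concrete, here is a minimal sketch. It assumes the numbers sit in <td> cells; the sample markup is made up for illustration (only the n1/p0-style class names come from the comments above). A string returned by file_get_contents can be passed to loadHTML in exactly the same way.

```php
<?php
// Hypothetical sample markup standing in for the fetched page.
$content = '<table><tr><td class="n1">2</td><td class="p0">45</td>'
         . '<td class="n2">abc</td><td class="p1">200</td></tr></table>';

$doc = new DOMDocument();
// Real-world HTML is rarely valid, so collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($content);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$numbers = [];
foreach ($xpath->query('//td') as $cell) {
    $text = trim($cell->textContent);
    // Keep only cells whose whole text is a 1-3 digit integer.
    if (preg_match('/^\d{1,3}$/', $text)) {
        $numbers[] = (int) $text;
    }
}

print_r($numbers); // holds 2, 45 and 200; the "abc" cell is skipped
```

The XPath expression '//td' is deliberately broad. If the real page wraps the numbers in a more specific structure, narrowing the query is probably a better filter than the regex check.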

1 Answer


You need to change two things. First, replace \w with \d to allow only digits instead of any word character. Second, replace + with {1,3}, which requires between one and three digits. Your complete statement will then be:

if (preg_match('/>(\d{1,3})<\/td>/', $content, $matches)) {
    echo $matches[1];
}
  • I made it work, and it works for $matches[1], but it doesn't return $matches[2] or [3] :/ what may be the problem? – senty Dec 17 '16 at 09:20
  • From the manual: If matches is provided, then it is filled with the results of search. $matches[0] will contain the text that matched the full pattern, $matches[1] will have the text that matched the first captured parenthesized subpattern, and so on. – Richard Brinkman Dec 17 '16 at 09:33
    You only have one subpattern so you only have $matches[0] and $matches[1]. Have a look at `preg_match_all` to find them all instead of the first one. – Richard Brinkman Dec 17 '16 at 09:35
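
As a rough sketch of the preg_match_all suggestion, the snippet below collects every 1-3 digit number into an array. The sample $content string is invented for illustration; the pattern is the one from the answer, with the slash in </td> escaped because / is also the pattern delimiter.

```php
<?php
// Hypothetical sample input; in practice $content comes from file_get_contents.
$content = '<td class="n1">2</td><td class="p0">45</td><td class="n2">200</td>';

// preg_match_all() finds every match, not just the first one.
// With the default flags, $matches[1] lists all values of the first capture group.
$count = preg_match_all('/>(\d{1,3})<\/td>/', $content, $matches);

// Convert the captured digit strings to integers.
$numbers = array_map('intval', $matches[1]);

print_r($numbers); // holds 2, 45 and 200
```

Here $matches[0] holds the full matched strings and $matches[1] the captured numbers, so there is still no $matches[2]; each additional hit is appended inside those two arrays.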