Get a string inside the pattern using PHP

Question

I have an HTML file and I want to collect all of its class names into an array using PHP. For example, this is my HTML file:

<div class="main menu">element</div>
<div class="content"></div>

I want to get an array with three elements (in this particular example): "main", "menu", "content".

In bash it is possible to use grep to accomplish this:

classes=($(grep -oP '(?<=class=").*?(?=")' "./index.html"))

How can I do the same in PHP?

This is the basic code I have at the moment:

//read the entire string
$str = implode("", file('./index.html'));
$fp = fopen('./index.html', 'w');
//Here I guess should be the function to get all of the strings
//now, save the file
fwrite($fp, $str, strlen($str));

Edit: How can my question be a duplicate of the one linked, when I am asking how to find the string using PHP? That question is not about PHP, and I have already provided the grep alternative myself.

possible duplicate of [Regular expression for finding class names in HTML](http://stackoverflow.com/questions/1989579/regular-expression-for-finding-class-names-in-html) — RisingSun, Aug 17 '15 at 04:25
Do you want `main menu`, `content` or three matches: `main`, `menu`, `content` ? — Jonny 5, Aug 17 '15 at 04:35

But those new buttons though.. · Answer 1 · 2015-08-17T05:04:13.173

4

I would use PHP's DOMDocument class like this:

$classes = array();
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTMLFile('./index.html');
$elements = $dom->getElementsByTagName('*');
foreach($elements as $element) {
    $classes = array_merge($classes,array_filter(explode(' ',$element->getAttribute('class'))));
}
print_r($classes);

Explanation:

declare an empty array $classes
tell libxml to collect parse errors internally, so incomplete or invalid HTML does not emit warnings
instantiate a new DOMDocument object
load the file index.html into the DOMDocument
get all elements using the wildcard tag name '*'
iterate over the elements
get each element's class attribute (an empty string if it has none)
explode the attribute value on spaces
filter the exploded array to remove empty values
merge the result into the $classes array
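
The steps above can also be packed into a reusable function. This is only a sketch under a few assumptions that are not part of the answer: the helper name extractClassNames is hypothetical, the HTML comes from a string rather than from index.html, only elements that carry a class attribute are visited (via DOMXPath), the value is split on any whitespace rather than on single spaces, and duplicates are dropped.

```php
<?php
// Hypothetical helper (not from the answer): returns the unique class names in an HTML string.
function extractClassNames(string $html): array
{
    // Collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the caller's setting

    $classes = [];
    $xpath = new DOMXPath($dom);
    // Visit only the elements that actually have a class attribute.
    foreach ($xpath->query('//*[@class]') as $element) {
        // Split on any run of whitespace (spaces, tabs, newlines), skipping empty pieces.
        $names = preg_split('/\s+/', $element->getAttribute('class'), -1, PREG_SPLIT_NO_EMPTY);
        $classes = array_merge($classes, $names);
    }

    // Drop duplicates and reindex from 0.
    return array_values(array_unique($classes));
}

// Sample markup from the question, inlined so the snippet runs on its own.
$html = '<div class="main menu">element</div><div class="content"></div>';
print_r(extractClassNames($html));
```

For the two-div sample this should yield main, menu and content. Whether duplicates should be removed depends on what the array is used for; drop the array_unique call to keep every occurrence.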

edited Aug 17 '15 at 05:04

answered Aug 17 '15 at 04:39

But those new buttons though..


2

I agree, regex is generally not the appropriate means for parsing HTML :] It depends on whether you are parsing your own or arbitrary HTML and on what you are trying to achieve, imho. – Jonny 5 Aug 17 '15 at 04:53
1

Well, thank you for the answer. I did some research and found it pretty useful. I will explore this topic more. Since I have already accepted an answer, the only thing I can do is upvote you. Thank you. – Alex Aug 17 '15 at 05:05

Jonny 5 · Accepted Answer · 2015-08-17T05:18:28.633

4

To get the three elements, try a regex like this with the preg_match_all function:

(?:class="|\G(?!^))\s*\K[^\s"]+

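\G matches at the end of the previous match or at the start of the subject; the (?!^) lookahead rules out the start
\K resets the beginning of the reported match, so only the class name itself is returned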

See test at eval.in

if(preg_match_all('/(?:class="|\G(?!^))\s*\K[^\s"]+/', $str, $out) > 0)
  print_r($out[0]);

Array ( [0] => main [1] => menu [2] => content )
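
A self-contained way to try this, as a sketch: the snippet in this answer relies on $str already holding the file contents, so the version below inlines the question's sample markup. The wrapper name classNamesByRegex is hypothetical; the pattern is the one shown above.

```php
<?php
// Sample markup from the question, inlined so the snippet runs without index.html.
$str = '<div class="main menu">element</div>' . "\n" . '<div class="content"></div>';

// Hypothetical wrapper name; the pattern itself is the one from this answer.
function classNamesByRegex(string $html): array
{
    $pattern = '/(?:class="|\G(?!^))\s*\K[^\s"]+/';
    // preg_match_all returns the number of full matches, or false on error.
    return preg_match_all($pattern, $html, $out) > 0 ? $out[0] : [];
}

print_r(classNamesByRegex($str));
```

Because the pattern only looks for the literal text class=", it also picks up attributes whose names merely end in "class" (for example data-class="x") and it ignores single-quoted attribute values.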

Note that regex is generally not the appropriate means for parsing HTML. Whether it is acceptable depends on whether you are parsing your own or arbitrary HTML, and on what you are trying to achieve, imho.

edited Aug 17 '15 at 05:18

answered Aug 17 '15 at 04:41

Jonny 5


parsing html with regex is foolish imo. http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454. Use DOMDocument or you may anger the gods – But those new buttons though.. Aug 17 '15 at 04:51
thanks but there are many regexers much better, you only would've needed to add the tag 'regex' :] Also regex is generally not recommended for parsing arbitrary html. – Jonny 5 Aug 17 '15 at 04:51
1

What is the difference if it is HTML or not? I am editing it as a text file. I will perform similar operations on files that have other extensions. – Alex Aug 17 '15 at 04:54
@Alex - you asked about getting classnames from an html document, which is parsing html. there are tools for this - namely `DOMDocument`. Learn it, use it. Read the link i posted above - regex is great, but not the right tool for the job. – But those new buttons though.. Aug 17 '15 at 04:57

score 1 · Answer 3 · answered Aug 17 '15 at 04:26


Depending on what you're trying to do, you can either use regular expressions using the preg_grep function, or you could traverse the DOM using the DOMDocument class.
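
To make the regular-expression option concrete, here is a rough sketch of how preg_grep differs from preg_match_all. The sample lines stand in for the result of file('./index.html'), and the third line is an added assumption for illustration.

```php
<?php
// Sample lines standing in for file('./index.html'); the last line is an added assumption.
$lines = [
    '<div class="main menu">element</div>',
    '<div class="content"></div>',
    '<p>no class here</p>',
];

// preg_grep keeps the array entries that match the pattern, with their original keys.
$matchingLines = preg_grep('/class="/', $lines);

// preg_match_all extracts the matched substrings; group 1 is each whole attribute value.
preg_match_all('/class="([^"]*)"/', implode("\n", $lines), $m);
$attributeValues = $m[1];

print_r($matchingLines);   // the two <div> lines, under keys 0 and 1
print_r($attributeValues); // "main menu" and "content"
```

In other words, preg_grep filters whole lines, so getting the individual class names still needs an extraction step such as preg_match_all or the DOM traversal.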


Jesse Weigert
