I'm trying to match the <html> tag with optional attributes and to extract those attributes. I want to match any of the following variations of the <html> tag. It would be at the very start of an HTML document, or there may be a DOCTYPE declaration before it.
<html>
<html lang="en">
<html class="my-class">
<html class="my-class" lang="en">
The regular expression I'm trying is below, but for the fourth case it captures only the last attribute, lang="en".
/<html(\s+([a-z\-]+)=('|")([^"'>]*)('|"))*>/i
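To make the problem concrete, here is a minimal sketch in Python (an assumption on my part, since the pattern above isn't tied to a particular language). It reproduces the behaviour: a capture group that sits inside a repeated group keeps only its last iteration. It also shows one possible two-step workaround, which is to capture the whole attribute string first and then scan it for name/value pairs. The names TAG_PATTERN, ATTR_PATTERN and html_attributes are hypothetical, made up for this illustration.

```python
import re

# The pattern from the question, transcribed for Python's re module.
QUESTION_PATTERN = re.compile(
    r"""<html(\s+([a-z\-]+)=('|")([^"'>]*)('|"))*>""", re.IGNORECASE
)

# Hypothetical two-step alternative (not the original pattern):
# step 1 captures the entire attribute string in one group ...
TAG_PATTERN = re.compile(
    r"""<html((?:\s+[a-z\-]+=(?:'[^'>]*'|"[^">]*"))*)\s*>""", re.IGNORECASE
)
# ... and step 2 pulls the individual name/value pairs out of that string.
ATTR_PATTERN = re.compile(
    r"""([a-z\-]+)=(?:'([^'>]*)'|"([^">]*)")""", re.IGNORECASE
)


def html_attributes(document):
    """Return the <html> tag's attributes as a dict, or None if no tag matches."""
    tag = TAG_PATTERN.search(document)
    if tag is None:
        return None
    attrs = {}
    for name, single, double in ATTR_PATTERN.findall(tag.group(1)):
        # Only one of the two value groups takes part in each match;
        # the other comes back as an empty string.
        attrs[name] = single or double
    return attrs


sample = '<!DOCTYPE html>\n<html class="my-class" lang="en">'

# The repeated group only remembers its final iteration.
m = QUESTION_PATTERN.search(sample)
print(m.group(2), m.group(4))

# The two-step version sees every attribute.
print(html_attributes(sample))
```

With the fourth test case, the first print shows only the lang attribute, while html_attributes returns both class and lang. Like the original pattern, this sketch only handles quoted values and attribute names made of letters and hyphens.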
I know that some suggest using a DOM parser instead of a regular expression, but I think a regular expression is enough for my case, as I only want to match the <html> tag.