Matching a multiple lines pattern via PHP's preg_match()

Question

How can I match `subject` with a PHP preg_match() regular expression in this HTML code:

      <table border=0>
  <tr>
  <td>


  <h2>subject</h2>



    </td>

All the whitespace and newlines are left in on purpose. So the problem is extracting the subject name using a pattern that spans multiple lines.

This article may be useful: [multiline-searches-with-preg_match-in](https://blog-en.openalfa.com/multiline-searches-with-preg_match-in-php) – Ali Yousefi, May 04 '20 at 04:33

score 67 · Answer 1 · answered Jan 22 '12 at 04:38

If you're looking for (e.g.) an h2 tag nested within a td tag where there's only whitespace in between the two, just use \s, which matches spaces, tabs, newlines, etc. For example:

preg_match('#<td>\s*<h2>(.*?)</h2>\s*</td>#i',$str,$matches);
// result is in $matches[1]

See it in action here.
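
As a quick check, the pattern above can be run against the exact HTML from the question. This is only a minimal sketch: the variable `$found` and the nowdoc literal are illustrative choices, not part of the original answer.

```php
<?php
// The HTML from the question, with its whitespace and blank lines kept as-is.
$str = <<<'HTML'
      <table border=0>
  <tr>
  <td>


  <h2>subject</h2>



    </td>
HTML;

// \s* absorbs the spaces and newlines between <td>, <h2>, </h2> and </td>.
$found = preg_match('#<td>\s*<h2>(.*?)</h2>\s*</td>#i', $str, $matches);

if ($found === 1) {
    echo $matches[1], "\n"; // prints "subject"
}
```

No modifier beyond `i` is needed here, because the only things that cross line boundaries are the `\s*` runs, and `\s` matches newlines by itself.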

For reference, the PHP manual has a list of pattern modifiers you can pass to the preg_* functions. The ones that may interest you are:

s ("dotall") : this one makes . match every character, including newlines. So, say your <h2>.....</h2> was spread over multiple lines. Then you'd have to do
```
preg_match('#<td>\s*<h2>(.*?)</h2>\s*</td>#is',$str,$matches);
```
in order to have the .* go over multiple lines (see the extra s at the end of the regex?).
m ("multiline") : this one just lets ^ and $ match start/end of line instead of just the start/end of string. You only really need it if you're using ^ and $ in your pattern and want them to match the start/end of each individual line in your input.
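
The difference between the two modifiers can be seen in a small sketch like the one below. The input string and the variable names are made up for illustration; only `preg_match()` and `preg_match_all()` themselves come from PHP.

```php
<?php
// Hypothetical input in which the <h2> content itself spans two lines.
$str = "<td>\n<h2>first line\nsecond line</h2>\n</td>";

// Without s, . cannot cross the newline inside the <h2>, so nothing matches.
$withoutS = preg_match('#<h2>(.*?)</h2>#i', $str, $m1);

// With s, . also matches newlines and the capture spans both lines.
$withS = preg_match('#<h2>(.*?)</h2>#is', $str, $m2);

// m changes only ^ and $: with it, ^ can match at the start of each line.
$withM    = preg_match_all('/^second/m', $str, $m3);
$withoutM = preg_match_all('/^second/', $str, $m4);

var_dump($withoutS, $withS, $withM, $withoutM); // int(0) int(1) int(1) int(0)
```

In short, reach for `s` when the dot has to cross line breaks, and for `m` only when the pattern uses `^` or `$` per line.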

Man... I was stuck for 1.5 hour until I found your post. Thanks! The "s" attribute is what I was looking for. — codemonkey613, Aug 06 '13 at 19:19
I was wanting to match the start of a particular line in multi-line input, so `'/^start/im'` — Derek Illchuk, Apr 25 '15 at 14:58

score 15 · Answer 2 · answered May 25 '13 at 22:07

You can add the m modifier to your regular expression:

// Given your HTML content.
$html = 'Your HTML content';
preg_match('/<td[^>]*>(.*?)<\/td>/im', $html, $matches);

Hope this (still) helps, hahaha.

answered May 25 '13 at 22:07 by Saul Martínez

I think the `s` modifier (for "DOTALL" or "single-line" mode) is what you're thinking of, and that's already been [suggested](http://stackoverflow.com/a/8959000/20938). – Alan Moore May 26 '13 at 06:12
This hahaha is very disturbing. – Ch3shire Jan 18 '18 at 13:34
try also adding "sU" in addition to "m" if needed – E Ciotti Mar 15 '19 at 17:50
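
The point raised in these comments can be checked with a short sketch. The `$html` value here is a trimmed-down, hypothetical version of the question's markup, and the variable names are illustrative.

```php
<?php
// A table cell whose content sits on its own lines, as in the question.
$html = "<tr>\n  <td>\n\n  <h2>subject</h2>\n\n  </td>\n</tr>";

// Pattern from the answer: m only affects ^ and $, which this pattern never uses,
// so . still stops at the first newline and the match fails.
$withM = preg_match('/<td[^>]*>(.*?)<\/td>/im', $html, $a);

// Same pattern with s instead, so . can cross the line breaks inside the cell.
$withS = preg_match('/<td[^>]*>(.*?)<\/td>/is', $html, $b);

var_dump($withM);       // int(0)
var_dump($withS);       // int(1)
echo trim($b[1]), "\n"; // <h2>subject</h2>
```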

score 4 · Answer 3 · edited Jul 04 '19 at 16:38

You shouldn't use regex to parse HTML content. It can cause a lot of issues if you cannot control what the user can input. Every language has better solutions, and in most cases an HTML or XML parser does a better job. Check out DOMDocument, simplehtmldom or php-html-parser.

See here for more answers why you shouldn't use regex on HTML content: RegEx match open tags except XHTML self-contained tags
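
For the HTML in the question, a parser-based version might look roughly like the sketch below. It assumes PHP's DOM extension is installed (it is bundled with PHP but packaged separately on some systems), and the XPath expression `//td/h2` is just one plausible way to say "an h2 directly inside a td".

```php
<?php
// Hypothetical, slightly completed version of the question's markup.
$html = "<table border=0>\n  <tr>\n  <td>\n\n  <h2>subject</h2>\n\n  </td>\n  </tr>\n</table>";

// Collect libxml warnings internally instead of printing them for sloppy markup.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Select every <h2> that is a direct child of a <td>; whitespace between tags is irrelevant.
$xpath = new DOMXPath($doc);
$nodes = $xpath->query('//td/h2');

$subject = $nodes->length > 0 ? trim($nodes->item(0)->textContent) : null;
echo $subject, "\n"; // prints "subject"
```

Because the parser builds a tree first, line breaks and indentation between the tags never enter the query at all.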

edited Jul 04 '19 at 16:38 by Peter Mortensen

answered Apr 25 '16 at 12:57 by Maciej Paprocki

I was looking for this answer. I was surprised that 5 years later nobody suggested that maybe it's a bad idea to parse html with regex. Don't understand why it's downvoted. – s3v3n Dec 06 '16 at 14:34
Yep, welcome to the club. I still stand by my answer, though :) – Maciej Paprocki Dec 06 '16 at 16:03
This is definitely the way to approach this. Gave it another upvote at least :-) – Marty Jan 07 '17 at 01:56
I haven’t voted on this, but I might add that it misses the point of the question, which is how to use `preg_match` with multiple lines. It is _not_ answering the question if you don’t like the use case. – Manngo Sep 13 '19 at 02:10
Hmm. I think I am offering better solutions than one provided. If someone uses the wrong tool shouldn't I tell them they do and offer better alternatives? – Maciej Paprocki Sep 16 '19 at 14:58

score 3 · Answer 4 · answered Jan 22 '12 at 02:18

Very simply with

preg_match('/<h2>(.*?)<\\/h2>/', $str, $matches);
print($matches[1]);

The multi-line format has no effect on the regex unless you need to match a string that spans multiple lines.

answered Jan 22 '12 at 02:18 by Borodin

Sorry I should have been more specific. The problem is in the lack of "identifiers" in the HTML code i am dealing with. There can be some other h2 tags and others. So i am trying to use the surrounding tags to exactly target this particular place in the code. So how can i make regex patterns understand multilines?... – Dmitriy Ryabinin Jan 22 '12 at 02:31

score 0 · Answer 5 · edited Jun 10 '20 at 18:51

Catch a block of code delimited by four backticks (as in Markdown syntax).

The example below is easy to adapt.

<?php

$str = '
# Some Text

```` 
    h5 {
      font-size: 1rem;
      font-weight: 600;
    }
````

And some text.
';

// The s modifier lets . match newlines, so the capture can span the whole block.
$reg = '/````(.*?)````/s';

preg_match($reg, $str, $matches);
echo $matches[0];

/* OUTPUT
```` 
    h5 {
      font-size: 1rem;
      font-weight: 600;
    }
````
*/

echo preg_replace($reg, "DELETED", $str);

/* OUTPUT
# Some Text

DELETED

And some text.
*/

edited Jun 10 '20 at 18:51

answered Jun 10 '20 at 18:47 by NVRM

What question are you answering? – Toto Jun 10 '20 at 18:50
Matching a multiple lines pattern via PHP's preg_match() – NVRM Jun 10 '20 at 18:51

score -5 · Answer 6 · edited Jul 04 '19 at 16:41

You have to allow for line breaks by including \s in the character class of the regular expression:

$str ="<ol>
         <li>Capable for unlimited product</li>
         <li>Two currency support</li>
         <li>Works with touch screens and click screen based systems</li>
         <li>Responsive design <b>shopping cart</b>, Specially design for Mac, iPhone, iPad, PC and Android</li>
         <li>VAT for countries that support a Value Added Tax</li>
         <li>Barcode scanner checkout option for POS</li>
         <li>mRSS</li>
       </ol>";

preg_match("/^([A-Za-z0-9\s\<\>\.\,\/\-\ ]+)$/", $str);

// Sanitize your data before saving it to the database.

function test_input($data) {
    $data = trim($data);
    $data = htmlspecialchars($data);
    $data = json_encode($data);
    $data = addslashes($data);
    return $data;
}

echo test_input($str);

I think he wants to preserve new lines – Maciej Paprocki Apr 20 '16 at 09:45
