3

I'm trying to replace everything between 2 tags, but I'm not able to build the right expression.

This is what I did: /<tag>(.*|\n*)</tag>/

I want it to capture any characters including line breaks.

This is an example of what I need to capture:

 <tag>
       <div class="feed_title">Some Title</div>
       <div class="feed_content">Some text</div>
    </tag>

Can someone tell me what I'm doing wrong?

Here is a link to RegExr and a full example of what the content looks like: http://regexr.com?2t4n1

Ben
Geoff Klimber
  • Why don't you give us an example of what your tags/output looks like and what your exact preg_match call looks like. – ehudokai Feb 18 '11 at 00:16
  • I have no problems with the PHP; the problem is returning the content inside the tags. The expression might be wrong, because I'm not able to get the content. I'm testing the expressions on RegExr and they fail when there is a line break, which I need to handle since the script is processing HTML code. – Geoff Klimber Feb 18 '11 at 00:32
  • Thank you very much, eyquem and Michael! The last edit by eyquem, based on Michael's answer, did the job. Thanks again, you did it! – Geoff Klimber Feb 18 '11 at 01:50

2 Answers

3

Perhaps you meant to do this?

# The ... in this example is your text to match
preg_match("#<tag>(.*?)</tag>#s","...",$matches); 

Here is a link to an article on XML data extraction with regular expressions in PHP, which has some good examples.
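To make the effect of the s modifier concrete, here is a minimal sketch run against input shaped like the snippet in the question. The variable names and the REPLACED placeholder are made up for illustration; only preg_match and preg_replace come from PHP itself.

```php
<?php
// Sample input modeled on the question's snippet, with real line breaks inside <tag>.
$html = "<tag>\n  <div class=\"feed_title\">Some Title</div>\n"
      . "  <div class=\"feed_content\">Some text</div>\n</tag>";

// Without the s modifier, "." does not match "\n", so the pattern finds nothing.
$withoutS = preg_match('#<tag>(.*?)</tag>#', $html, $m1);

// With the s modifier, "." also matches "\n", so the whole block is captured in $m2[1].
$withS = preg_match('#<tag>(.*?)</tag>#s', $html, $m2);

// Replacing everything between the tags; REPLACED is only a placeholder.
$replaced = preg_replace('#(<tag>).*?(</tag>)#s', '${1}REPLACED${2}', $html);

var_dump($withoutS);   // int(0)
var_dump($withS);      // int(1)
echo $replaced, "\n";  // <tag>REPLACED</tag>
```

The lazy quantifier (.*?) matters if the input may contain more than one <tag> block, since a greedy .* would run to the last closing tag.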

Michael Goldshteyn
  • I'm testing the expression you gave me on RegExr and it's not marking the code in the example; that's because of the line break character. – Geoff Klimber Feb 18 '11 at 00:27
  • Sorry, I modified my answer per your update, but didn't change the surrounding reg-ex slashes to some other char (e.g., pound signs). Please, try with my edited answer. – Michael Goldshteyn Feb 18 '11 at 00:30
  • Be aware that in RegExr you don't use the delimiters (`#` in this case) or the trailing modifiers (the `s`). The regex should be simply `<tag>(.*?)</tag>`, and you have to select the `dotall` checkbox. – Alan Moore Feb 18 '11 at 02:43
3
'#.#m'

The m means MULTILINE; it makes the dot able to match newlines (line breaks, \n).

EDIT:

As sharp-eyed commenters pointed out, that is wrong: it is the s flag, not m, that lets the dot match newlines, so it should evidently be '#.+#s'
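A small sketch of the difference between the two flags, for anyone who mixes them up as I did. The sample text and variable names are invented for the illustration.

```php
<?php
$text = "first line\nsecond line";

// m (multiline) only changes the anchors: ^ and $ also match at line boundaries.
preg_match_all('#^\w+#m', $text, $lineStarts);
// $lineStarts[0] holds 'first' and 'second'

// m does not let "." cross a line break, so the match stops at the end of line one.
preg_match('#.+#m', $text, $dotWithM);
// $dotWithM[0] is 'first line'

// s (dotall) lets "." match "\n", so the match covers both lines.
preg_match('#.+#s', $text, $dotWithS);
// $dotWithS[0] is the whole string, line break included
```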

EDIT2:

As Michael Goldshteyn said, this should work

$ch = '<tag>\s+<div class="feed_title">Some Title</div>\s+<div class="feed_content">Some text</div>\s+</tag>';

preg_match('#<tag>(.+?)</tag>#s', $ch, $match);

I think there is another solution, without the s flag:

preg_match('#<tag>((.|\s)+?)</tag>#', $ch, $match);

But it's more complicated

EDIT 3:

I think the \s in $ch makes no sense: \s is a regex token, not something that belongs in a string.

I wrote it because I was thinking there could be blanks or \t before <tag> and at the beginning of the other lines.

\t is written with an escape in a string, but that's no reason to write \s in a string as well.

eyquem
  • No, it's the `s` (*single-line*, or *dot-matches-all*) modifier that allows the `.` metacharacter to match newlines. Multiline mode enhances the start and end anchors (`^` and `$`) so they match at line boundaries, too. – Alan Moore Feb 18 '11 at 00:36
  • Can you give me an example applied to my case? I would appreciate it very much, since I'm new to regular expressions. – Geoff Klimber Feb 18 '11 at 00:37
  • Ah! Shame on me. I have known that for a long time, though. I'm tired, excuse me. – eyquem Feb 18 '11 at 01:03
  • `(.|\s)+?` is not a viable substitute for `(.+?)`, as explained [here](http://stackoverflow.com/questions/2407870/javascript-regex-hangs-using-v8/2408599#2408599). If you can't use the `dotall` flag (eg, if you're using JavaScript, which doesn't support it), you're better off with something like `([\s\S]+?)`. If you just don't want the whole regex to be in `dotall` mode, you can localize its effect to just that group, like this: `(?s:.+?)` – Alan Moore Feb 18 '11 at 02:41
  • @Alan Moore Thank you. I didn't know about that problem. Regexes are full of traps. I didn't fully understand the explanation on the linked page; I will study it more thoroughly. Thank you – eyquem Feb 18 '11 at 02:49
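
For reference, the two alternatives suggested in the comments above can be sketched like this. It is only an illustration on a shortened, made-up input, assuming the same <tag> wrapper as in the question.

```php
<?php
// Shortened sample input with real line breaks inside <tag>.
$html = "<tag>\n  <div>Some text</div>\n</tag>";

// [\s\S] matches any character, newline included, with no modifier needed.
preg_match('#<tag>([\s\S]+?)</tag>#', $html, $a);

// (?s:...) turns dotall on for that group only; the rest of the pattern is unaffected.
preg_match('#<tag>((?s:.+?))</tag>#', $html, $b);

// Both patterns capture the same text between the tags.
var_dump($a[1] === $b[1]); // bool(true)
```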