regex The post appeared first on

Question

I imported some posts into my site from an RSS feed, but each post ends with a "This post appeared first on this site" line:

<p>The post <a rel="nofollow" href="link">title</a> appeared first on <a rel="nofollow" href="Website.com"">Website</a>.</p>

However, my removal code doesn't work:

preg_replace('/<p>The post <a\s+.*?href=".*?"\s+.*?>.*?<\/a> appeared first on <a\s+.*?href=".*?"\s+.*?>.*?<\/a>.</p>/i', '', $text);

I hope someone can help me.

Don't use regexes for this. Use a parser. The unescaped `</p>` should be throwing an error, though, so you are likely not using error reporting correctly. — user3783243, Dec 24 '19 at 13:46
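To illustrate the error-reporting point in the comment above, here is a minimal sketch that runs the pattern from the question with all warnings enabled. The `$text` value is a made-up placeholder, and the settings are meant for a development environment only. With them in place, PHP should emit a warning along the lines of "Unknown modifier 'p'", and `preg_replace()` returns `null` instead of a string.

```php
<?php
// Development-only settings: report and display every notice and warning.
error_reporting(E_ALL);
ini_set('display_errors', '1');

// Placeholder input (assumption, not from the question).
$text = '<p>Hello.</p>';

// The pattern from the question, unchanged. The "/" in the final "</p>" is not
// escaped, so it terminates the pattern and "p>/i" is read as modifiers.
$pattern = '/<p>The post <a\s+.*?href=".*?"\s+.*?>.*?<\/a> appeared first on <a\s+.*?href=".*?"\s+.*?>.*?<\/a>.</p>/i';

$result = preg_replace($pattern, '', $text);

// On a pattern error preg_replace() returns null rather than the input string.
if ($result === null) {
    echo "preg_replace() failed; see the warning above.\n";
}
```

Checking the return value for `null` is worth doing even when warnings are hidden, since assigning a failed result back to `$text` would silently wipe the post content.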

score 0 · Answer 1 · answered Dec 24 '19 at 15:34

I agree with the comments above: don't use regexes to parse HTML or XML strings; they're not the right tool for the job. However, if you must, your original regex has two problems:

1. You didn't escape the / in </p> (as user3783243 mentioned). Because / is the pattern delimiter, it needs to be <\/p> in the regex.
2. The regex requires whitespace after the href="" attribute, which is not present in the example. You should probably remove the \s+ after the second " in the href.

With both fixes applied, the regex matches the provided string; see https://regex101.com/r/MDwSua/1
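For reference, the two fixes could look roughly like this in PHP. This is a sketch rather than the exact pattern behind the regex101 link: it assumes the `\s+` is dropped after both `href` attributes, and the "Body text" paragraph is invented for the demonstration. The footer string is the one from the question, copied verbatim.

```php
<?php
// Footer exactly as posted in the question.
$footer = '<p>The post <a rel="nofollow" href="link">title</a> appeared first on <a rel="nofollow" href="Website.com"">Website</a>.</p>';

// Hypothetical post body, added only so there is something left after removal.
$text = "<p>Body text.</p>\n" . $footer;

// Fix 1: "</p>" is written as "<\/p>" because "/" is the delimiter.
// Fix 2: the "\s+" after each href's closing quote is removed.
$pattern = '/<p>The post <a\s+.*?href=".*?".*?>.*?<\/a> appeared first on <a\s+.*?href=".*?".*?>.*?<\/a>.<\/p>/i';

// preg_replace() returns the new string; it does not modify $text in place.
$cleaned = preg_replace($pattern, '', $text);

var_dump($cleaned === "<p>Body text.</p>\n");
```

Note that the result has to be assigned; calling `preg_replace()` without using its return value leaves `$text` untouched.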

The quantifier could be changed from `+` to `*`, which makes the whitespace after the `href` optional, but this is getting into writing a parser. — user3783243, Dec 24 '19 at 16:54
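Since both comments recommend a parser, here is one way that could look with PHP's bundled DOM extension. Treat it as a sketch under assumptions: the function name `removeAppearedFirstFooter`, the wrapper id, and the sample URLs are all hypothetical, and the XPath test (a paragraph that starts with "The post " and contains " appeared first on ") is just one plausible way to recognise the footer.

```php
<?php
// Hypothetical helper; requires the DOM extension (ext-dom).
function removeAppearedFirstFooter(string $html): string
{
    // Collect libxml parse errors internally instead of emitting warnings,
    // since feed markup is often not perfectly valid.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    // Wrap the fragment so its nodes can be found and serialized again later.
    $doc->loadHTML('<div id="footer-wrapper">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $query = '//p[starts-with(normalize-space(.), "The post ")'
           . ' and contains(., " appeared first on ")]';

    // Copy the matches into an array before removing them from the tree.
    foreach (iterator_to_array($xpath->query($query)) as $p) {
        $p->parentNode->removeChild($p);
    }

    // Serialize only the children of the wrapper, not the implied <html>/<body>.
    $wrapper = $xpath->query('//div[@id="footer-wrapper"]')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

// Sample input with made-up URLs.
$text = '<p>Body text.</p>'
      . '<p>The post <a rel="nofollow" href="https://example.com/post">title</a>'
      . ' appeared first on <a rel="nofollow" href="https://example.com">Website</a>.</p>';

echo removeAppearedFirstFooter($text), "\n";
```

This version does not care about attribute order, spacing, or quoting inside the `<a>` tags, which is where the regex attempts keep breaking. Posts containing non-ASCII text may need explicit encoding handling before `loadHTML()`; that is left out here.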

score 0 · Answer 2 · answered Dec 24 '19 at 16:05

This should work:

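// < and > need no escaping; the literal dot before </p> does.
$regex = '/<p>The post <a[^>]*>title<\/a> appeared first on <a[^>]*>Website<\/a>\.<\/p>/';

// preg_replace() returns the result, so assign it back.
$text = preg_replace($regex, '', $text);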

The pattern [^>]* matches everything up to the closing > of a tag, i.e. its attributes. Note that title and Website are matched literally here.
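If the link text differs from post to post, the same `[^>]*` idea can be kept while the literal link texts are replaced by a lazy `.*?`. The following is only a sketch: the `~` delimiter is my choice (so `/` needs no escaping), and the second sample footer with its title and URLs is invented for the example.

```php
<?php
// "~" as delimiter avoids escaping "/"; "<a\b[^>]*>" matches an opening <a> tag
// with any attributes; ".*?" matches the link text; "\." is a literal dot.
$pattern = '~<p>The post <a\b[^>]*>.*?</a> appeared first on <a\b[^>]*>.*?</a>\.</p>~i';

$posts = [
    // Footer from the question, verbatim.
    '<p>Body A.</p><p>The post <a rel="nofollow" href="link">title</a> appeared first on <a rel="nofollow" href="Website.com"">Website</a>.</p>',
    // Made-up footer with a different title and site name.
    '<p>Body B.</p><p>The post <a href="https://example.com/x">Another title</a> appeared first on <a href="https://example.com">Example</a>.</p>',
];

$cleaned = [];
foreach ($posts as $post) {
    // Assign the return value; preg_replace() does not change $post in place.
    $cleaned[] = preg_replace($pattern, '', $post);
}

var_dump($cleaned);
```

Because `.` does not match newlines by default, this only removes footers that sit on a single line, which is the case for the example in the question.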

codeit
