I want to use a regular expression to handle the <cite> tag in two different ways, depending on whether or not it appears inside a URL:
- Remove the tag if it is inside a URL (one starting with www or http(s));
- Leave the tag intact if it is not inside a URL.
This is the string I want to operate on:
www.os<cite>map</cite>s.ordnancesurvey.co.uk/os<cite>map</cite>s/
and the normal text <cite>map</cite> here and again <cite>map</cite> here
http://os<cite>map</cite>s.ordnancesurvey.co.uk/os<cite>map</cite>s/
and the normal text <cite>map</cite> here and again <cite>map</cite> here
I have been using this expression:
$this_record = preg_replace('/((www.)|(https?:\/\/))([^\s]*?)(<cite>([^\s]*?)<\/cite>)([^\s]*)/', '$2$3$4$6$7', $this_record);
This works, but only for the first set of tags in each URL, and results in:
www.osmaps.ordnancesurvey.co.uk/os<cite>map</cite>s/
and the normal text <cite>map</cite> here and again <cite>map</cite> here
http://osmaps.ordnancesurvey.co.uk/os<cite>map</cite>s/
and the normal text <cite>map</cite> here and again <cite>map</cite> here
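One likely cause can be checked directly. The trailing group ([^\s]*) in the expression is greedy, so it runs all the way to the next whitespace character and swallows the rest of the URL, second <cite> pair included. preg_replace resumes scanning after the end of each match, so that second pair is never visited. A minimal check, using the expression exactly as posted:

```php
<?php
// The expression from the question, unchanged.
$pattern = '/((www.)|(https?:\/\/))([^\s]*?)(<cite>([^\s]*?)<\/cite>)([^\s]*)/';
$url = 'www.os<cite>map</cite>s.ordnancesurvey.co.uk/os<cite>map</cite>s/';

preg_match($pattern, $url, $m);

// Group 7 ([^\s]*) is greedy: it captures everything after the first
// </cite> up to the next whitespace, i.e. the remainder of the URL.
echo $m[7], PHP_EOL;
```

This prints s.ordnancesurvey.co.uk/os<cite>map</cite>s/, showing that the second pair is consumed as plain text by the first match.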
Only the first set of tags is removed in each URL. How can I remove the subsequent ones as well?
Many thanks
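A minimal sketch of one alternative, assuming a URL is any run of non-whitespace characters that starts with www. or http:// / https://. Instead of matching each tag with one large pattern, it matches the whole URL with preg_replace_callback and strips every <cite> and </cite> inside it with str_replace. The function name stripCiteInUrls is hypothetical.

```php
<?php
// Hypothetical helper: removes <cite> wrappers only inside URL-like tokens.
// Assumption: a URL starts with "www." or "http(s)://" and contains no whitespace.
function stripCiteInUrls(string $text): string
{
    // Match the entire URL token, not just one tag, so every tag pair
    // inside it is handled in a single callback invocation.
    $urlPattern = '~(?:www\.|https?://)\S+~';

    return preg_replace_callback(
        $urlPattern,
        function (array $m): string {
            // Drop all opening and closing cite tags within this URL.
            return str_replace(['<cite>', '</cite>'], '', $m[0]);
        },
        $text
    );
}

$input = "www.os<cite>map</cite>s.ordnancesurvey.co.uk/os<cite>map</cite>s/\n"
    . "and the normal text <cite>map</cite> here and again <cite>map</cite> here\n"
    . "http://os<cite>map</cite>s.ordnancesurvey.co.uk/os<cite>map</cite>s/\n"
    . "and the normal text <cite>map</cite> here and again <cite>map</cite> here";

echo stripCiteInUrls($input), PHP_EOL;
```

Because the callback sees the entire URL at once, it does not matter how many tag pairs the URL contains; text outside URLs is returned unchanged, so the <cite> tags in the normal text are kept.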