I have a database of HTML content that contains text with links. Some links have a hash symbol (#) in their URLs and others do not.
I need to remove the links whose URLs contain a hash symbol (keeping their text) and leave the links without a hash symbol untouched.
Example:
Input:
<a href="http://example.com/books/1">The Lord of the Rings</a>
<ul>
<li><a href="http://example.com/books/1#c1" >Chapter 1</a></li>
<li><a name="name before href" href="http://example.com/books/1#c2">Chapter 2</a></li>
<li><a href="http://example.com/books/1#c3" name="name after href">Chapter 3</a></li>
<li><a href="http://example.com/books/1#cN" target="_blank">Chapter N</a></li>
</ul>
<br><br>
<a href="http://example.com/books/2">Harry Potter</a>
<ul>
<li><a href="http://example.com/books/2#c1" target="_self">Chapter 1</a></li>
<li><a href="http://example.com/books/2#c2" name="some have name" title="some others have title" >Chapter 2</a></li>
<li><a href="http://example.com/books/2#c3">Chapter 3</a></li>
<li><a href="http://example.com/books/2#cN" >Chapter N</a></li>
</ul>
Desired Output:
<a href="http://example.com/books/1">The Lord of the Rings</a>
<ul>
<li>Chapter 1</li>
<li>Chapter 2</li>
<li>Chapter 3</li>
<li>Chapter N</li>
</ul>
<br><br>
<a href="http://example.com/books/2">Harry Potter</a>
<ul>
<li>Chapter 1</li>
<li>Chapter 2</li>
<li>Chapter 3</li>
<li>Chapter N</li>
</ul>
I am trying this code, but it deletes all the links, and I want to keep those without a hash symbol.
$content = preg_replace('#<a.*?>([^>]*)</a>#i', '$1', $content);
So, currently I am getting this:
The Lord of the Rings
<ul>
<li>Chapter 1</li>
<li>Chapter 2</li>
<li>Chapter 3</li>
<li>Chapter N</li>
</ul>
<br><br>
Harry Potter
<ul>
<li>Chapter 1</li>
<li>Chapter 2</li>
<li>Chapter 3</li>
<li>Chapter N</li>
</ul>
More details:
- I am using PHP.
- The only way I can tell which links to delete is the # symbol in the URL.
- Some links contain a line break in their text.
Example:
<a href="http://example.com">
new line</a>
or
<a href="http://example.com">new
line</a>
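
For reference, here is a minimal sketch of the kind of replacement I am after. It assumes that the href value is always quoted and that the link text never contains another nested <a> tag. The function name stripHashLinks is only a placeholder of mine. The idea is to match an <a> tag only when its href value contains a #, and to use the s modifier so that link text spanning several lines is still matched:

```php
<?php
// Hypothetical helper: unwraps <a> tags whose quoted href contains '#',
// keeping the link text. Links without '#' in the href are left alone.
function stripHashLinks(string $html): string
{
    // <a\b[^>]*        opening tag, with optional attributes before href
    // \bhref\s*=\s*    the href attribute
    // (["\'])          opening quote, captured as group 1
    // [^"\'>]*#[^"\'>]*  a URL that contains at least one '#'
    // \1[^>]*>         closing quote, then any remaining attributes
    // (.*?)</a>        link text (group 2), lazy; 's' lets '.' match newlines
    $pattern = '~<a\b[^>]*\bhref\s*=\s*(["\'])[^"\'>]*#[^"\'>]*\1[^>]*>(.*?)</a>~is';

    // preg_replace returns null on a PCRE error; fall back to the input then.
    return preg_replace($pattern, '$2', $html) ?? $html;
}

echo stripHashLinks('<li><a href="http://example.com/books/1#c1" >Chapter 1</a></li>');
// prints: <li>Chapter 1</li>
```

Because [^>]* and [^"'>]* cannot cross a tag boundary or a quote, a link without a hash cannot be matched by borrowing the # from a later link. A DOM parser would probably be more robust than a regex for arbitrary HTML, but this is enough for the cases listed above.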