I want to remove strings like the one below from some HTML code:

<span style="font-size: 0.8px; letter-spacing: -0.8px; color: #ecf6f6">3</span>

So I came up with this regex:

$pattern = "/<span style=\"font-size: \\d(\\.\\d)?px; letter-spacing: -\\d(\\.\\d)?px; color: #\\w{6}\">\\w\\w?</span>/um";

However, the regex doesn't work. Can someone point out what I did wrong? I'm new to PHP.

When I tested with a simple regex it worked, so the problem must be in the regex itself. Here is my test code:

  $str = $_POST["txtarea"];
  $pattern = $_POST["regex"];
  echo preg_replace($pattern, "", $str);
Tuan Anh Tran

3 Answers

As much as I would advocate DOMDocument to do the job here, you would still need some regular expression down the line, so ...

The expression for the px numeric value can be simply [\d.-]+, since you're not trying to validate anything.

The contents of the span can be simplified to [^<]* (i.e. anything but an opening angle bracket):

$re = '/<span style="font-size: [\d.-]+px; letter-spacing: [\d.-]+px; color: #[0-9a-f]{3,6}">[^<]*<\/span>/';

echo preg_replace($re, '', $str);
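
For comparison, here is a minimal sketch of the DOMDocument route mentioned above. The function name remove_hidden_spans and the wrapper <div> are made up for illustration, and the style check still relies on a regular expression, as noted. It assumes the input is an HTML fragment in ASCII or Latin-1; other encodings would need extra handling.

```php
<?php
// Hypothetical helper: removes spans whose style attribute matches the
// "hidden text" pattern, using DOMDocument instead of a regex over raw HTML.
function remove_hidden_spans(string $html): string
{
    // Same idea as the expression above, applied to the attribute value only.
    $re = '/^font-size: [\d.-]+px; letter-spacing: [\d.-]+px; color: #[0-9a-f]{3,6}$/i';

    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // keep parser warnings out of the output
    // Wrap the fragment so its top-level nodes can be found again afterwards.
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();

    // The node list is live, so copy it before removing anything.
    $spans = iterator_to_array($doc->getElementsByTagName('span'));
    foreach ($spans as $span) {
        if (preg_match($re, $span->getAttribute('style'))) {
            $span->parentNode->removeChild($span);
        }
    }

    // The wrapper is the first <div> in document order; serialize its children.
    $wrap = $doc->getElementsByTagName('div')->item(0);
    $out = '';
    foreach ($wrap->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$str = 'before<span style="font-size: 0.8px; letter-spacing: -0.8px; color: #ecf6f6">3</span>after';
echo remove_hidden_spans($str), "\n";
```

Anchoring the expression with ^ and $ means only spans whose style is exactly these three declarations are removed; drop the anchors if the real markup carries extra declarations.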
Ja͢ck

Do not use regex for this problem. Use an HTML parser. Here is a solution in Python with BeautifulSoup, because I like this library for these tasks:

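import re

from bs4 import BeautifulSoup  # BeautifulSoup 4; the older BeautifulSoup 3 import only works on Python 2

with open('Path/to/file', 'r') as content_file:
    content = content_file.read()

soup = BeautifulSoup(content, 'html.parser')
# Match spans by their style attribute; the raw string keeps the regex escapes intact.
style_re = re.compile(r"font-size: \d(\.\d)?px; letter-spacing: -\d(\.\d)?px; color: #\w{6}")
for span in soup.find_all('span', {'style': style_re}):
    span.extract()  # detach the span and its contents from the tree

with open('Path/to/file.modified', 'w') as output_file:
    output_file.write(str(soup))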
Community
000

You have a slash (/) in your closing tag (</span>), and the slash is also your pattern delimiter, so PHP treats it as the end of the pattern.

You need to escape it or use a delimiter other than the slash.
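
Both fixes can be sketched as follows, using the sample span from the question. The variable names and the choice of ~ as the alternative delimiter are only illustrative; any character that does not occur in the pattern would do (# would not, since the pattern contains it).

```php
<?php
// Sample input taken from the question, with some surrounding text.
$str = 'before<span style="font-size: 0.8px; letter-spacing: -0.8px; color: #ecf6f6">3</span>after';

// Option 1: keep "/" as the delimiter and escape the slash in </span>.
// Backslashes are doubled because this is a double-quoted PHP string.
$escaped = "/<span style=\"font-size: \\d(\\.\\d)?px; letter-spacing: -\\d(\\.\\d)?px; color: #\\w{6}\">\\w\\w?<\\/span>/um";

// Option 2: use "~" as the delimiter so the slash needs no escaping.
// In a single-quoted string the backslashes can stay single.
$tilde = '~<span style="font-size: \d(\.\d)?px; letter-spacing: -\d(\.\d)?px; color: #\w{6}">\w\w?</span>~um';

echo preg_replace($escaped, '', $str), "\n"; // beforeafter
echo preg_replace($tilde, '', $str), "\n";   // beforeafter
```

If the pattern is typed into a form field, as in the test code from the question, it is not a PHP string literal, so the backslashes should be single there (\d rather than \\d); doubling them would make the regex look for a literal backslash.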

Sly