
A while ago I wrote a small PHP script that reads data from MySQL and outputs it as XML. Until now, the item description stored in MySQL looked something like this:

<p>
 the item description ...etc.</p>

So far so good. I used the following:

preg_match('#<p>(.*?)</p>#s',$stro, $disp);

and that worked as expected.

Today the admin added new items to the database like this:

<p style="font-family: Tahoma; font-size: 13px; line-height: 19.5px;">
 item description...etc</p>

and now my "trick" above doesn't work.

So I tried this (found on Stack Overflow):

 // first, strip the style attribute so that only <p> remains

$kulaka = preg_replace('/(<[^>]+) style=".*?"/i', '$1', $stro); 

 // then capture the text between <p> and </p>

 preg_match('#<p>(.*?)</p>#s',$kulaka, $disp);

It almost works, but it gives me

 "style=font-family: Tahoma; font-size: 13px; line-height: 19.5px;> item
 description "

Any suggestions are welcome. I want this to work for any style attribute, not only this particular one, since the admin can change the size, font, etc.

user2557930
  • It's not a good idea to use regex to parse HTML. Not only is it imprecise, it [summons unholy things](http://stackoverflow.com/a/1732454/2370483) – Machavity Feb 11 '15 at 17:52
  • ..and that is not XML or even xhtml any more - so you can't reliably use a DOM parser (http://php.net/manual/en/book.dom.php) – symcbean Feb 11 '15 at 20:51

1 Answer


This will drop the <p> starting tag (and any attributes within it) as well as the closing tag:

$stro = '<p style="font-family: Tahoma; font-size: 13px; line-height: 19.5px;">'
    . 'item description...etc</p>';

preg_match('#^<p.*?>(.*)</p>$#is', $stro, $disp);

echo $disp[1] . PHP_EOL;

Output:

item description...etc

It is not totally solid: it fails if any attribute on the paragraph tag contains a > in its value. It may still be enough for you in this case.
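If attribute values might contain a >, a parser sidesteps the problem. The following is only a sketch, assuming PHP's DOM extension is available; `DOMDocument::loadHTML` uses libxml's HTML parser, which tolerates markup that is not valid XML. The sample input with its `title` attribute and the `$texts` variable are made up for illustration.

```php
<?php
// Hypothetical input: the title attribute contains a '>' that would
// cut the <p.*?> pattern short.
$stro = '<p title="a > b" style="font-size: 13px;">item description...etc</p>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($stro);
libxml_clear_errors();

// Gather the text content of every <p> element, in document order.
$texts = [];
foreach ($doc->getElementsByTagName('p') as $p) {
    $texts[] = trim($p->textContent);
}

echo $texts[0] . PHP_EOL;
```

Note that `textContent` returns text only, so any nested markup inside the paragraph is dropped, whereas the regex keeps it.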

mhall
  • Hi, I think it's almost there. The problem is that my string is actually two such blocks in a row, like 'item description...etc' . 'second item description...etc'. The script above successfully removes the first tag, but the output is then 'item description...etc' . 'second item description...etc', and it should be just 'item description...etc'. – user2557930 Feb 11 '15 at 22:22
  • Using your tip, though, I managed to make it work with some tuning. That was exactly what I was looking for, thanks! – user2557930 Feb 11 '15 at 22:51
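
The tuning itself is not shown in the thread. One way it could look, as a sketch only: drop the `^` and `$` anchors and make the inner group non-greedy, so each `<p>...</p>` block matches on its own. The `<p\b[^>]*>` opening pattern and the second sample paragraph are assumptions, not the asker's code.

```php
<?php
$stro = '<p style="font-family: Tahoma; font-size: 13px; line-height: 19.5px;">'
    . 'item description...etc</p>'
    . '<p style="font-size: 11px;">'
    . 'second item description...etc</p>';

// <p\b[^>]*> matches <p> with or without attributes (but not <pre>);
// (.*?) stops at the first closing </p> rather than the last one.
preg_match_all('#<p\b[^>]*>(.*?)</p>#is', $stro, $matches);

// $matches[1] holds the inner text of each paragraph, in order.
$first = trim($matches[1][0]);

echo $first . PHP_EOL;
```

Like the answer's pattern, this still breaks on attribute values that contain a >.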