0

Possible Duplicate:
Robust, Mature HTML Parser for PHP

I'm trying to grab the first sentence of a string and the first <img> tag in it.

$description = preg_split('/<img/', $item->description,null,PREG_SPLIT_DELIM_CAPTURE);

I get an array back, but the <img delimiter is stripped from its values, and I need it kept. I've tried the flags but can't get a result that includes the delimiter itself. I know that to grab the first sentence I should be able to split on a period or &nbsp;.

String:

<p>First sentence here comes.&nbsp; Second sentence here it is.&nbsp; One more sentence.&nbsp;&nbsp;</p> <img alt="amj" src="https://domain.com/images7.jpg" /> <img alt="Ea" src="http://domain.com/images3.jpg" /> <img alt="amj" src="https://domain.com/images7.jpg" /> <img alt="amj" src="https://domain.com/images7.jpg" />
Codex73

3 Answers

0

Getting the first sentence is pretty simple. You just have to use a mixture of strpos and substr as shown below. As for getting the first image tag, you can do that with preg_match.

$first_sentence = substr($item->description, 0, strpos($item->description, '.'));
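A minimal sketch of the strpos/substr approach, with two additions that are assumptions rather than part of the answer: the markup is stripped and entities decoded first, and the case where no period exists is guarded. The `$description` value is a hypothetical stand-in for `$item->description`.

```php
<?php
// Hypothetical sample input standing in for $item->description.
$description = '<p>First sentence here comes.&nbsp; Second sentence here it is.&nbsp;</p> '
    . '<img alt="amj" src="https://domain.com/images7.jpg" />';

// Drop the tags and decode entities such as &nbsp; before searching.
$text = html_entity_decode(strip_tags($description), ENT_QUOTES | ENT_HTML5, 'UTF-8');

// strpos() returns false when there is no period, so compare strictly.
$end = strpos($text, '.');
$first_sentence = ($end === false) ? trim($text) : substr($text, 0, $end + 1);

echo $first_sentence, "\n"; // prints "First sentence here comes."
```

The `+ 1` keeps the period in the result; drop it if the sentence should end before the period.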
Garrett Hyde
Joseph Crawford
0

1) 1st sentence

echo substr($item->description, 0, strpos($item->description, '.'));

2) img

preg_match('#<img[^>]*>#',$item->description , $img);
echo $img[0];
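The preg_match call above can be made a little more defensive by checking its return value before reading the match. This is only a sketch: the `$description` value is a hypothetical stand-in for `$item->description`, and the `\b` and `i` additions to the pattern are my own assumptions, not part of the answer.

```php
<?php
// Hypothetical sample input standing in for $item->description.
$description = '<p>Intro.</p> <img alt="amj" src="https://domain.com/images7.jpg" /> '
    . '<img alt="Ea" src="http://domain.com/images3.jpg" />';

// preg_match() stops at the first match; it returns 1 on a match and 0 otherwise.
$first_img = null;
if (preg_match('#<img\b[^>]*>#i', $description, $m) === 1) {
    $first_img = $m[0];
}

echo $first_img !== null ? $first_img : 'no image found', "\n";
```

Without the check, `$img[0]` raises an undefined-offset notice when the string contains no image.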
mychalvlcek
0

If you use PREG_SPLIT_DELIM_CAPTURE, you need to provide a capturing group within the regular expression pattern passed to preg_split.

In your current pattern:

/<img/

There is nothing to capture, which is why you see it removed (Demo):

Array
(
    [0] => <p>First sentence here comes.&nbsp; Second sentence here it is.&nbsp; One more sentence.&nbsp;&nbsp;</p> 
    [1] =>  alt="amj" src="https://domain.com/images7.jpg" /> 
    [2] =>  alt="Ea" src="http://domain.com/images3.jpg" /> 
    [3] =>  alt="amj" src="https://domain.com/images7.jpg" /> 
    [4] =>  alt="amj" src="https://domain.com/images7.jpg" />
)

However, if you create a capture out of it, it will be captured:

/(<img)/

Result (Demo):

Array
(
    [0] => <p>First sentence here comes.&nbsp; Second sentence here it is.&nbsp; One more sentence.&nbsp;&nbsp;</p> 
    [1] => <img
    [2] =>  alt="amj" src="https://domain.com/images7.jpg" /> 
    [3] => <img
    [4] =>  alt="Ea" src="http://domain.com/images3.jpg" /> 
    [5] => <img
    [6] =>  alt="amj" src="https://domain.com/images7.jpg" /> 
    [7] => <img
    [8] =>  alt="amj" src="https://domain.com/images7.jpg" />
)

As you can see, preg_split does its documented job and adds each captured delimiter to the result as a separate element. You might then need to extend the pattern across the full tag, which has been outlined in other questions about regexes for HTML-like strings. For example (limited, as usual with regular expressions, so if you run into issues, blame the use of preg_* functions instead of an HTML parser, not the pattern itself):

/(<img [^>]*>)/

Result (Demo):

Array
(
    [0] => <p>First sentence here comes.&nbsp; Second sentence here it is.&nbsp; One more sentence.&nbsp;&nbsp;</p> 
    [1] => <img alt="amj" src="https://domain.com/images7.jpg" />
    [2] =>  
    [3] => <img alt="Ea" src="http://domain.com/images3.jpg" />
    [4] =>  
    [5] => <img alt="amj" src="https://domain.com/images7.jpg" />
    [6] =>  
    [7] => <img alt="amj" src="https://domain.com/images7.jpg" />
    [8] => 
)
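Putting the last pattern to use, the following sketch runs the split and then picks out the image tags. The shortened `$description` sample and the added PREG_SPLIT_NO_EMPTY flag are assumptions for illustration; the flag only drops the empty trailing element seen above.

```php
<?php
// Hypothetical, shortened sample input standing in for $item->description.
$description = '<p>First sentence.</p> <img alt="amj" src="https://domain.com/images7.jpg" /> '
    . '<img alt="Ea" src="http://domain.com/images3.jpg" />';

// The parentheses make the whole <img ...> tag a captured delimiter.
$parts = preg_split(
    '/(<img [^>]*>)/',
    $description,
    -1,
    PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
);

// Keep only the elements that are image tags, re-indexed from 0.
$images = array_values(array_filter($parts, function ($part) {
    return strpos($part, '<img') === 0;
}));

echo $images[0], "\n"; // the first image tag
```

If only the first image is needed, preg_match is the simpler tool; preg_split pays off when the text between the tags matters as well.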

You would make your code more stable by using a standard HTML parser.
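As one possible illustration of the parser route, here is a sketch using PHP's DOM extension. It assumes ext-dom is installed, and the `$description` value is a hypothetical stand-in for `$item->description`; other parsers would work equally well.

```php
<?php
// Hypothetical sample input standing in for $item->description.
$description = '<p>First sentence here comes. Second sentence here it is.</p> '
    . '<img alt="amj" src="https://domain.com/images7.jpg" /> '
    . '<img alt="Ea" src="http://domain.com/images3.jpg" />';

// Collect libxml warnings internally instead of emitting them for fragment input.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($description);
libxml_clear_errors();

// item(0) returns null when the document has no matching element.
$first_img = $doc->getElementsByTagName('img')->item(0);
$src = $first_img !== null ? $first_img->getAttribute('src') : null;

$first_p = $doc->getElementsByTagName('p')->item(0);
$text = $first_p !== null ? $first_p->textContent : '';

echo $src, "\n";
echo $text, "\n";
```

The parser copes with attribute order, quoting styles, and a `>` inside an attribute value, all of which can trip up the regex patterns.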

hakre