RegEx for &lt;title&gt; with leading, trailing, linebreak</a></h1> </div> <div class="grid fw-wrap pb8 mb16 bb bc-black-075"> <div class="grid--cell ws-nowrap mr16 mb8" title="2016-01-12 19:07:53Z"> <span class="fc-light mr2">Asked</span> <time itemprop="dateCreated" datetime="2012-05-06T07:23:30.127" class="fromnow">May 06 '12 at 07:23</time> </div> <div class="grid--cell ws-nowrap mr16 mb8"> <span class="fc-light mr2">Active</span> <time class="fromnow" title="2012-05-06T07:39:21.160" datetime="2012-05-06T07:39:21.160">May 06 '12 at 07:39</time> </div> <div class="grid--cell ws-nowrap mb8" title="Viewed 82 times"> <span class="fc-light mr2">Viewed</span> 82 times </div> </div> <div id="mainbar" role="main" aria-label="questions and answers"> <div id="question" class="question" data-questionid="10468903" data-ownerid="1377738" data-score="0"> <div class="post-layout"> <div class="votecell post-layout--left"> <div class="js-voting-container grid jc-center fd-column ai-stretch gs4 fc-black-200" data-post-id="10468903"> <button class="js-vote-up-btn grid--cell s-btn s-btn__unset c-pointer"><svg aria-hidden="true" class="m0 svg-icon iconArrowUpLg" width="36" height="36" viewBox="0 0 36 36"><path d="M2 26h32L18 10 2 26z"></path></svg></button> <div class="js-vote-count grid--cell fc-black-500 fs-title grid fd-column ai-center" itemprop="upvoteCount" data-value="0">0</div> <button class="js-bookmark-btn s-btn s-btn__unset c-pointer py4"> <svg aria-hidden="true" class="svg-icon iconBookmark" width="18" height="18" viewBox="0 0 18 18"><path d="M6 1a2 2 0 00-2 2v14l5-4 5 4V3a2 2 0 00-2-2H6zm3.9 3.83h2.9l-2.35 1.7.9 2.77L9 7.59l-2.35 1.7.9-2.76-2.35-1.7h2.9L9 2.06l.9 2.77z"></path></svg> <div class="js-bookmark-count mt4" data-value=""></div> </button> </div> </div> <div class="postcell post-layout--right"> <div class="s-prose js-post-body" itemprop="text"><p>For most websites I can parse the title easily with the RegEx <code>&lt;title&gt;(.*)&lt;/title&gt;</code> or <code>&lt;title&gt;\s*(.+?)\s*&lt;/title&gt;</code>. 
However, some sites format the title a bit differently, like <a class="external-link" href="http://www.youtube.com" rel="nofollow">http://www.youtube.com</a> (see below), and the expressions above do not work. Any help catching this kind of format, and any other HTML formats?</p> <p>Thanks -Tim.</p> <pre><code>&lt;title&gt;
  YouTube - Broadcast Yourself.
&lt;/title&gt;</code></pre> <p></p></div> <div class="mt24 mb12"> <div class="post-taglist grid gs4 gsy fd-column"> <div class="grid ps-relative"> <a href="../../questions/tagged/html" class="post-tag js-gps-track" title="show questions tagged 'html'" rel="tag">html</a> <a href="../../questions/tagged/regex" class="post-tag js-gps-track" title="show questions tagged 'regex'" rel="tag">regex</a> </div> </div> </div> <div class="mb0"> <div class="mt16 grid gs8 gsy fw-wrap jc-end ai-start pt4 mb16"> <div class="grid--cell mr16 fl1 w96"></div> <div class="post-signature grid--cell"> <div class="s-user-card s-user-card"> <time class="s-user-card--time" datetime="edited May 06 '12 at 07:39">edited May 06 '12 at 07:39</time> <a href="../../users/295264/starx" class="s-avatar s-avatar__32 s-user-card--avatar"> <img class="s-avatar--image" src="../../users/profiles/295264.webp" data-jdenticon-width="32" data-jdenticon-height="32" data-jdenticon-value="Starx" /> </a> <div class="s-user-card--info"> <a href="../../users/295264/starx" class="s-user-card--link">Starx</a> <ul class="s-user-card--awards"> <li class="s-user-card--rep" title="reputation score">77,474</li> <li class="s-award-bling s-award-bling__gold" title="47 gold badges">47</li> <li class="s-award-bling s-award-bling__silver" title="185 silver badges">185</li> <li class="s-award-bling s-award-bling__bronze" title="261 bronze badges">261</li> </ul> </div> </div> </div> <div class="post-signature owner grid--cell"> <div class="s-user-card s-user-card"> <time class="s-user-card--time" datetime="asked May 06 '12 at 07:23">asked May 06 '12 at 07:23</time> <a href="../../users/1377738/user1377738" 
class="s-avatar s-avatar__32 s-user-card--avatar"> <img class="s-avatar--image" src="../../users/profiles/1377738.webp" data-jdenticon-width="32" data-jdenticon-height="32" data-jdenticon-value="user1377738" /> </a> <div class="s-user-card--info"> <a href="../../users/1377738/user1377738" class="s-user-card--link">user1377738</a> <ul class="s-user-card--awards"> <li class="s-user-card--rep" title="reputation score">23</li> <li class="s-award-bling s-award-bling__bronze" title="2 bronze badges">2</li> </ul> </div> </div> </div> </div> </div> </div> <div class="post-layout--right js-post-comments-component"> <div id="comments-10468903" class="comments js-comments-container bt bc-black-075 mt12 " data-post-id="10468903" data-min-length="15"> <ul class="comments-list js-comments-list" data-remaining-comments-count="0" data-canpost="false" data-cansee="true" data-comments-unavailable="false" data-addlink-disabled="true"> <li id="comment-13523154" class="comment js-comment " data-comment-id="13523154" data-comment-owner-id="295264" data-comment-score="0"> <div class="js-comment-actions comment-actions"> <div class="comment-score js-comment-edit-hide"> </div> </div> <div class="comment-text js-comment-text-and-form"> <a name="comment13523154_10468903"></a> <div class="comment-body js-comment-edit-hide"> <span class="comment-copy">@Pumbaa80, I didn't notice there was no language, LOL.</span> – <a href="../../users/295264/starx" title="77,474 reputation" class="comment-user ">Starx</a> <span class="comment-date" dir="ltr"><a class="comment-link" href="../../questions/10468903/regex-for-title-with-leading-trailing-linebreak#comment13523154_10468903"><span title="2012-05-06T07:31:16.517 License: CC BY-SA 3.0" class="relativetime-clean">May 06 '12 at 07:31</span></a></span> </div> </div> </li> <li id="comment-13523245" class="comment js-comment " data-comment-id="13523245" data-comment-owner-id="109696" data-comment-score="1"> <div class="js-comment-actions comment-actions"> 
<div class="comment-score js-comment-edit-hide"> <span title="number of 'useful comment' votes received" class="warm">1</span> </div> </div> <div class="comment-text js-comment-text-and-form"> <a name="comment13523245_10468903"></a> <div class="comment-body js-comment-edit-hide"> <span class="comment-copy">http://stackoverflow.com/a/1732454/109696</span> – <a href="../../users/109696/etienne-perot" title="4,764 reputation" class="comment-user ">Etienne Perot</a> <span class="comment-date" dir="ltr"><a class="comment-link" href="../../questions/10468903/regex-for-title-with-leading-trailing-linebreak#comment13523245_10468903"><span title="2012-05-06T07:44:08.093 License: CC BY-SA 3.0" class="relativetime-clean">May 06 '12 at 07:44</span></a></span> </div> </div> </li> <li id="comment-13523511" class="comment js-comment " data-comment-id="13523511" data-comment-owner-id="1377738" data-comment-score="0"> <div class="js-comment-actions comment-actions"> <div class="comment-score js-comment-edit-hide"> </div> </div> <div class="comment-text js-comment-text-and-form"> <a name="comment13523511_10468903"></a> <div class="comment-body js-comment-edit-hide"> <span class="comment-copy">forgot to set the multiline option on the parser. Now everything works. 
Thanks all</span> – <a href="../../users/1377738/user1377738" title="23 reputation" class="comment-user owner">user1377738</a> <span class="comment-date" dir="ltr"><a class="comment-link" href="../../questions/10468903/regex-for-title-with-leading-trailing-linebreak#comment13523511_10468903"><span title="2012-05-06T08:19:22.330 License: CC BY-SA 3.0" class="relativetime-clean">May 06 '12 at 08:19</span></a></span> </div> </div> </li> </ul> </div> </div> </div> </div> <div id="answers"> <a name="tab-top"></a> <div id="answers-header"> <div class="answers-subheader grid ai-center mb8"> <div class="grid--cell fl1"> <h2 class="mb0" data-answercount="9">2 Answers<span style="display:none;" itemprop="answerCount">2</span></h2> </div> </div> </div> <a name="10468928"></a> <div id="answer-10468928" class="answer " data-answerid="10468928" data-ownerid="295264" data-score="2" itemprop="suggestedAnswer" itemscope="" itemtype="https://schema.org/Answer"> <div class="post-layout"> <div class="votecell post-layout--left"> <div class="js-voting-container grid jc-center fd-column ai-stretch gs4 fc-black-200" data-post-id="10468928"> <button class="js-vote-up-btn grid--cell s-btn s-btn__unset c-pointer"><svg aria-hidden="true" class="m0 svg-icon iconArrowUpLg" width="36" height="36" viewBox="0 0 36 36"><path d="M2 26h32L18 10 2 26z"></path></svg></button> <div class="js-vote-count grid--cell fc-black-500 fs-title grid fd-column ai-center" itemprop="upvoteCount" data-value="2">2</div> </div> </div> <div class="postcell post-layout--right"> <div class="s-prose js-post-body" itemprop="text"><p>There are various ways to get this done. 
If you only need the title, <a class="external-link" href="http://simplehtmldom.sourceforge.net/" rel="nofollow">Simple HTML DOM</a> is more than enough.</p> <pre><code>// Assumes simple_html_dom.php is on the include path
include 'simple_html_dom.php';
$html = file_get_html('http://www.youtube.com/');
// find() with an index returns a single element; innertext holds its contents
$title = $html-&gt;find('title', 0)-&gt;innertext;
echo trim($title); // trim() drops the leading/trailing whitespace and line breaks
</code></pre></div> <div class="mb0"> <div class="mt16 grid gs8 gsy fw-wrap jc-end ai-start pt4 mb16"> <div class="grid--cell mr16 fl1 w96"></div> <div class="post-signature grid--cell"> <div class="s-user-card s-user-card"> <time class="s-user-card--time" datetime="answered May 06 '12 at 07:29">answered May 06 '12 at 07:29</time> <a href="../../users/295264/starx" class="s-avatar s-avatar__32 s-user-card--avatar"> <img class="s-avatar--image" src="../../users/profiles/295264.webp" data-jdenticon-width="32" data-jdenticon-height="32" data-jdenticon-value="Starx" /> </a> <div class="s-user-card--info"> <a href="../../users/295264/starx" class="s-user-card--link">Starx</a> <ul class="s-user-card--awards"> <li class="s-user-card--rep" title="reputation score">77,474</li> <li class="s-award-bling s-award-bling__gold" title="47 gold badges">47</li> <li class="s-award-bling s-award-bling__silver" title="185 silver badges">185</li> <li class="s-award-bling s-award-bling__bronze" title="261 bronze badges">261</li> </ul> </div> </div> </div> </div> </div> </div> <div class="post-layout--right js-post-comments-component"> <div id="comments-10468928" class="comments js-comments-container bt bc-black-075 mt12 " data-post-id="10468928" data-min-length="15"> <ul class="comments-list js-comments-list" data-remaining-comments-count="0" data-canpost="false" data-cansee="true" data-comments-unavailable="false" data-addlink-disabled="true"> <li id="comment-13523146" class="comment js-comment " data-comment-id="13523146" data-comment-owner-id="17027" data-comment-score="2"> <div class="js-comment-actions comment-actions"> <div class="comment-score js-comment-edit-hide"> <span title="number of 'useful comment' votes received" 
class="warm">2</span> </div> </div> <div class="comment-text js-comment-text-and-form"> <a name="comment13523146_10468928"></a> <div class="comment-body js-comment-edit-hide"> <span class="comment-copy">+1. In other words: don't use regex for things it wasn't made for. Seriously, guys, it's not a hard rule.</span> – <a href="../../users/17027/mahmoud-al-qudsi" title="28,357 reputation" class="comment-user ">Mahmoud Al-Qudsi</a> <span class="comment-date" dir="ltr"><a class="comment-link" href="../../questions/10468903/regex-for-title-with-leading-trailing-linebreak#comment13523146_10468928"><span title="2012-05-06T07:29:42.570 License: CC BY-SA 3.0" class="relativetime-clean">May 06 '12 at 07:29</span></a></span> </div> </div> </li> </ul> </div> </div> </div> </div> <a name="10468957"></a> <div id="answer-10468957" class="answer accepted-answer" data-answerid="10468957" data-ownerid="1056693" data-score="0" itemprop="acceptedAnswer" itemscope="" itemtype="https://schema.org/Answer"> <div class="post-layout"> <div class="votecell post-layout--left"> <div class="js-voting-container grid jc-center fd-column ai-stretch gs4 fc-black-200" data-post-id="10468957"> <button class="js-vote-up-btn grid--cell s-btn s-btn__unset c-pointer"><svg aria-hidden="true" class="m0 svg-icon iconArrowUpLg" width="36" height="36" viewBox="0 0 36 36"><path d="M2 26h32L18 10 2 26z"></path></svg></button> <div class="js-vote-count grid--cell fc-black-500 fs-title grid fd-column ai-center" itemprop="upvoteCount" data-value="0">0</div> <div class="js-accepted-answer-indicator grid--cell fc-green-500 py6 mtn8"><div class="ta-center"><svg aria-hidden="true" class="svg-icon iconCheckmarkLg" width="36" height="36" viewBox="0 0 36 36"><path d="m6 14 8 8L30 6v8L14 30l-8-8v-8z"></path></svg></div></div> </div> </div> <div class="postcell post-layout--right"> <div class="s-prose js-post-body" itemprop="text"><p>If you want to include the line break to the regular expression, in most cases you would 
only need to use <code>\n</code> inside the expression. That said, which language/interpreter are you using? Some of them don't allow multiline expressions.</p> <p>If they are permitted, something like <code>(.|\n|\r)*</code> would suffice.</p> <p>In case your language or interpreter is not compatible with multiline regular expressions, you could always replace the newline characters with spaces, and then pass the resulting string to the regular expression parser. That, again, depends on your programming environment.</p> <p>Hope this helps!</p></div> <div class="mb0"> <div class="mt16 grid gs8 gsy fw-wrap jc-end ai-start pt4 mb16"> <div class="grid--cell mr16 fl1 w96"></div> <div class="post-signature grid--cell"> <div class="s-user-card s-user-card"> <time class="s-user-card--time" datetime="answered May 06 '12 at 07:38">answered May 06 '12 at 07:38</time> <a href="../../users/1056693/felixgaal" class="s-avatar s-avatar__32 s-user-card--avatar"> <img class="s-avatar--image" src="../../users/profiles/1056693.webp" data-jdenticon-width="32" data-jdenticon-height="32" data-jdenticon-value="felixgaal" /> </a> <div class="s-user-card--info"> <a href="../../users/1056693/felixgaal" class="s-user-card--link">felixgaal</a> <ul class="s-user-card--awards"> <li class="s-user-card--rep" title="reputation score">2,403</li> <li class="s-award-bling s-award-bling__silver" title="15 silver badges">15</li> <li class="s-award-bling s-award-bling__bronze" title="24 bronze badges">24</li> </ul> </div> </div> </div> </div> </div> </div> <div class="post-layout--right js-post-comments-component"> <div id="comments-10468957" class="comments js-comments-container bt bc-black-075 mt12 " data-post-id="10468957" data-min-length="15"> <ul class="comments-list js-comments-list" data-remaining-comments-count="0" data-canpost="false" data-cansee="true" data-comments-unavailable="false" data-addlink-disabled="true"> <li id="comment-13523257" class="comment js-comment " 
data-comment-id="13523257" data-comment-owner-id="109696" data-comment-score="0"> <div class="js-comment-actions comment-actions"> <div class="comment-score js-comment-edit-hide"> </div> </div> <div class="comment-text js-comment-text-and-form"> <a name="comment13523257_10468957"></a> <div class="comment-body js-comment-edit-hide"> <span class="comment-copy">I like `[\S\s]` better than `(.|\n|\r)` to match any single character no matter what the `DOTALL` flag is set to, but maybe that's just me</span> – <a href="../../users/109696/etienne-perot" title="4,764 reputation" class="comment-user ">Etienne Perot</a> <span class="comment-date" dir="ltr"><a class="comment-link" href="../../questions/10468903/regex-for-title-with-leading-trailing-linebreak#comment13523257_10468957"><span title="2012-05-06T07:45:27.877 License: CC BY-SA 3.0" class="relativetime-clean">May 06 '12 at 07:45</span></a></span> </div> </div> </li> <li id="comment-13523505" class="comment js-comment " data-comment-id="13523505" data-comment-owner-id="1377738" data-comment-score="0"> <div class="js-comment-actions comment-actions"> <div class="comment-score js-comment-edit-hide"> </div> </div> <div class="comment-text js-comment-text-and-form"> <a name="comment13523505_10468957"></a> <div class="comment-body js-comment-edit-hide"> <span class="comment-copy">Felix, thanks for reminding me to set the multi-line option. 
My original expression actually works when the multi-line option is set.</span> – <a href="../../users/1377738/user1377738" title="23 reputation" class="comment-user owner">user1377738</a> <span class="comment-date" dir="ltr"><a class="comment-link" href="../../questions/10468903/regex-for-title-with-leading-trailing-linebreak#comment13523505_10468957"><span title="2012-05-06T08:18:29.053 License: CC BY-SA 3.0" class="relativetime-clean">May 06 '12 at 08:18</span></a></span> </div> </div> </li> </ul> </div> </div> </div> </div> </div> </div> </div> </div> <script src="../../static/js/stack-icons.js"></script> <script src="../../static/js/fromnow.js"></script> </body> </html>