2

I am trying to extract the content of this paragraph, but the regex I am using does not work. When I remove the line break from the paragraph the regex works; otherwise it does not. What should I do?

Here is the paragraph:

<span class="st">My Paragraph - you can download free <b>drivers</b> for audio, video, chipset, Wi
Fi or USB, or a <b>driver</b> installation pack for <b>notebook</b>/(for&nbsp;...</span><br></div>

My Regex:

preg_match_all('/<span class="st">(.+?[^\n])<\/span><br><\/div>/i', $file_strings, $ti);

When I use this paragraph it works

<span class="st">My Paragraph - you can download free <b>drivers</b> for audio, video, chipset, WiFi or USB, or a <b>driver</b> installation pack for <b>notebook</b>/(for&nbsp;...</span><br></div>

Output should look like this

My Paragraph - you can download free <b>drivers</b> for audio, video, chipset, WiFi or USB, or a <b>driver</b> installation pack for <b>notebook</b>/(for&nbsp;...

As you can see, I just removed the line break inside "WiFi" and it works, but I need a regex that works without removing that line break.

I am testing my regex in an online regex tester (the screenshot and the tester are linked in the comments below).

Solution By: @jonny-5

Adding `is` instead of `i` after the closing forward slash solved the problem (the `s` modifier makes the dot match newlines):

 preg_match_all('/<span class="st">(.+?[^\n])<\/span><br><\/div>/is', $file_strings, $ti);
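To see the difference between the two modifier sets, here is a minimal sketch that runs both patterns against the paragraph from the question. The variable names `$without_s`, `$with_s`, `$ti_i`, and `$ti_is` are only illustrative and are not part of the original code.

```php
<?php
// Sample input copied from the question, including the line break inside "Wi\nFi".
$file_strings = "<span class=\"st\">My Paragraph - you can download free <b>drivers</b> for audio, video, chipset, Wi\n"
    . "Fi or USB, or a <b>driver</b> installation pack for <b>notebook</b>/(for&nbsp;...</span><br></div>";

// Original pattern: without the "s" modifier the dot cannot cross the newline.
$without_s = preg_match_all('/<span class="st">(.+?[^\n])<\/span><br><\/div>/i', $file_strings, $ti_i);

// Same pattern with "s" (PCRE_DOTALL): the dot now matches newlines as well.
$with_s = preg_match_all('/<span class="st">(.+?[^\n])<\/span><br><\/div>/is', $file_strings, $ti_is);

var_dump($without_s); // number of matches without "s"
var_dump($with_s);    // number of matches with "s"

// The captured paragraph content is in group 1 of the first match.
if ($with_s > 0) {
    echo $ti_is[1][0], PHP_EOL;
}
```

Note that the modifier has to be a lowercase `s`; an uppercase `S` is a different PCRE modifier in PHP and does not change what the dot matches.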
  • What language are you doing this in? And I would use a parser instead. – hwnd May 26 '14 at 17:10
  • 1
    possible duplicate of [RegEx match open tags except XHTML self-contained tags](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – The Paramagnetic Croissant May 26 '14 at 17:11
  • I am using the lang PHP – user3675088 May 26 '14 at 17:11
  • It's not the duplicate @user3477950 – user3675088 May 26 '14 at 17:15
  • 1
    @user3675088 it exactly is a duplicate. You are trying to parse HTML with regular expressions just like that guy. The solution is the same too: don't. Use an XML and/or HTML parser for parsing HTML. – The Paramagnetic Croissant May 26 '14 at 17:16
  • @user3477950 Brother I am using Php not HTML I just need correction in my regex that's it – user3675088 May 26 '14 at 17:19
  • Why is there an ending `</div>` but not a starting `<div>` tag? – hwnd May 26 '14 at 17:20
  • 1
    @user3675088 The code you have in the question is HTML. You are trying to parse it. With a regular expression. End of story. – The Paramagnetic Croissant May 26 '14 at 17:20
  • @hwnd because it's just an example – user3675088 May 26 '14 at 17:22
  • 2
    To make the dot also match newlines, you need to use the `s` (PCRE_DOTALL) [modifier](http://php.net/manual/en/reference.pcre.pattern.modifiers.php). You can also put it in the pattern at the start: `(?is)` – Jonny 5 May 26 '14 at 17:23
  • @user3477950 Now see the code I updated – user3675088 May 26 '14 at 17:24
  • 1
    It doesn't work because `.*?` matches any character except '\n'. So if you have a line break, it won't work. – miindlek May 26 '14 at 17:34
  • @hwnd brother see this screenshot where I am testing my regex [link](http://s12.postimg.org/v1ava8vdp/screenshot.jpg) I am testing it here [link](http://www.phpliveregex.com/) – user3675088 May 26 '14 at 17:38
  • **Don't use regular expressions to parse HTML. Use a proper HTML parsing module.** You cannot reliably parse HTML with regular expressions, and you will face sorrow and frustration down the road. As soon as the HTML changes from your expectations, your code will be broken. See http://htmlparsing.com/php or [this SO thread](http://stackoverflow.com/questions/3577641/how-do-you-parse-and-process-html-xml-in-php) for examples of how to properly parse HTML with PHP modules that have already been written, tested and debugged. – Andy Lester May 26 '14 at 20:19

2 Answers

1

I can see that you are trying to parse an HTML file to get some value. You should use an HTML parsing tool for this instead of a regular expression, for example `beautifulsoup` in Python.
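Since the question is about PHP, the same idea can be sketched with PHP's bundled DOM extension rather than BeautifulSoup. This is only an illustration under a few assumptions: the DOM extension is enabled, the snippet is wrapped in an opening `<div>` (the question shows only the closing tag), and the variable names are hypothetical.

```php
<?php
// Sample markup from the question; the opening <div> is an assumption.
$html = "<div><span class=\"st\">My Paragraph - you can download free <b>drivers</b> for audio, video, chipset, Wi\n"
    . "Fi or USB, or a <b>driver</b> installation pack for <b>notebook</b>/(for&nbsp;...</span><br></div>";

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Select every <span class="st"> and rebuild its inner HTML from its child nodes.
$xpath = new DOMXPath($doc);
$results = [];
foreach ($xpath->query('//span[@class="st"]') as $span) {
    $inner = '';
    foreach ($span->childNodes as $child) {
        $inner .= $doc->saveHTML($child);
    }
    $results[] = $inner;
}

print_r($results);
```

A parser treats the line break as ordinary text inside the element, so no modifier is needed, and the code keeps working if attributes or whitespace inside the tag change.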

just.another.programmer
  • @StephenOstermiller Er... an answer without a link can't ever be a link-only answer. Actually, I think this is a pretty decent answer, and although including an example would make it a lot better, I don't think it's bad as it is. –  May 26 '14 at 20:35
  • While this product recommendation may answer the question, it is better to include more information here. – Stephen Ostermiller May 26 '14 at 21:03
0

With all the disclaimers about using regex to parse html, here is a compact regex that matches your paragraph (see the online demo):

(?s)<span[^>]*>\K.*?.(?=</span>)

So in a preg_match_all, you would have something like:

$regex = '~(?s)<span[^>]*>\K.*?.(?=</span>)~';
$count = preg_match_all($regex, $string, $matches); // optional 4th argument: PREG_PATTERN_ORDER (the default)

How does it work?

  1. After matching the opening span tag, the \K drops it from the match to be returned.
  2. The .*?. lazily matches all characters (including newlines, because of (?s)) up to...
  3. A position where the (?=</span>) lookahead can assert that what follows is a closing span tag.
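The steps above can be tried out with a short script. This is a minimal sketch that uses the paragraph from the question as input; `$string`, `$matches`, and `$count` follow the snippet above.

```php
<?php
// Paragraph from the question, with the line break inside "Wi\nFi".
$string = "<span class=\"st\">My Paragraph - you can download free <b>drivers</b> for audio, video, chipset, Wi\n"
    . "Fi or USB, or a <b>driver</b> installation pack for <b>notebook</b>/(for&nbsp;...</span><br></div>";

// (?s)  : the dot also matches newlines
// \K    : discard the opening <span ...> tag from the reported match
// (?=…) : stop right before the closing </span> without consuming it
$regex = '~(?s)<span[^>]*>\K.*?.(?=</span>)~';

$count = preg_match_all($regex, $string, $matches);

var_dump($count);              // number of matches found
echo $matches[0][0], PHP_EOL;  // full match, without the surrounding span tags
```

Because of `\K` and the lookahead, the text is in `$matches[0]` (the full match) rather than in a capture group.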
zx81