Here is the HTML:

<!-- message --> 
<div><b><font size="6"><font color="Red">Bilim ve Teknik dergisi Mayıs 2019 Sayısı Pdf</font></font></b><br />
<br />
<img src="https://scontent-dus1-1.xx.fbcdn.net/v/t1.0-9/59069640_871111339894885_8805863518755618816_n.jpg?_nc_cat=109&amp;_nc_ht=scontent-dus1-1.xx&amp;oh=2a71d0bc34cda6b45404c30624c75046&amp;oe=5D6C1B30" border="0" alt="" /><br />
<br />
<b><font size="5"><a href="https://yadi.sk/i/oMnXUgBtTqKopg?fbclid=IwAR3KPXInlWCKFXuTKP1AU1VQGdsgvcDLdV9Px6YGOn3aU1tqAFz4Zo2J6PY" target="_blank">https://yadi.sk/i/oMnXUgBtTqKopg?fbc...1tqAFz4Zo2J6PY</a></font></b></div>
<!-- / message -->

How can I get the content between <!-- message --> and <!-- / message -->?

I'm using Python 3 and BeautifulSoup4. The following code produces an empty value for mess:

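import re
import requests
from bs4 import BeautifulSoup as bs

s = requests.Session()  # assumed: the original snippet does not show how s was created
tl = "58421"
topLink = "https://www.eskikitaplarim.com/showthread.php?t=" + tl
page = s.get(topLink)
psoup = bs(page.text, 'html.parser')
# Prints []: text= is tested against each text node on its own, so a pattern spanning both comments never matches
mess = psoup.find_all(text=re.compile(r"<!-- message -->(.*?)<!-- / message -->"))
print(mess)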
  • I'm not sure why this is marked as duplicate. The user is using BeautifulSoup, so it seems they may want to do more than **just** get the `div` between the comments. So using straight regular expression may not be desirable, as they may want to do additional parsing. As the question is now locked from posting new answers, here's one that works with BeautifulSoup. https://pastebin.com/kn4BtYpi. – facelessuser Jun 03 '19 at 04:03
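
The comment above mentions a BeautifulSoup-based approach. The linked paste is not reproduced here, so what follows is only a minimal sketch of one way to do it, assuming the two comments and the content between them are siblings in the parse tree. The helper name between_comments and the shortened sample HTML are made up for illustration.

```python
from bs4 import BeautifulSoup, Comment

# Shortened stand-in for the forum post markup (not the real page).
html = """<!-- message --> 
<div><b>Title</b><br />
<a href="https://example.com/file">link</a></div>
<!-- / message -->"""

soup = BeautifulSoup(html, "html.parser")


def between_comments(soup, start="message", end="/ message"):
    """Hypothetical helper: collect the nodes between two comment markers."""
    # Comments are their own node type, so look for a Comment whose text matches.
    opener = soup.find(
        string=lambda s: isinstance(s, Comment) and s.strip() == start
    )
    if opener is None:
        return []
    nodes = []
    # Walk the siblings after the opening comment until the closing one.
    for sibling in opener.next_siblings:
        if isinstance(sibling, Comment) and sibling.strip() == end:
            break
        nodes.append(sibling)
    return nodes


nodes = between_comments(soup)
inner_html = "".join(str(node) for node in nodes).strip()
print(inner_html)
```

Because the collected nodes are still BeautifulSoup objects, they can be searched further (for example, for the a tag and its href) instead of being re-parsed from a string.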

1 Answer

The desired output can be captured with an expression whose character class also matches newlines (by default, . does not), such as:

<!-- message -->([\s\S]*)<!-- \/ message -->

or:

<!-- message -->([\d\D]*)<!-- \/ message -->
<!-- message -->([\w\W]*)<!-- \/ message -->
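
One thing worth checking is how these patterns behave when a page holds several message blocks, as a forum thread usually does. The sketch below uses a made-up two-post string (not the real page) to compare the greedy `*` quantifier used above with a lazy `*?` variant. The lazy pattern with `re.DOTALL` is a suggested alternative here, not part of the patterns listed above.

```python
import re

# Made-up page fragment with two message blocks.
page = (
    "<!-- message -->\n<div>first post</div>\n<!-- / message -->\n"
    "<p>unrelated markup</p>\n"
    "<!-- message -->\n<div>second post</div>\n<!-- / message -->"
)

# Greedy: [\s\S]* runs to the LAST closing comment on the page.
greedy = re.compile(r"<!-- message -->([\s\S]*)<!-- / message -->")

# Lazy: .*? stops at the FIRST closing comment; re.DOTALL lets . match newlines.
lazy = re.compile(r"<!-- message -->(.*?)<!-- / message -->", re.DOTALL)

greedy_hits = [hit.strip() for hit in greedy.findall(page)]
lazy_hits = [hit.strip() for hit in lazy.findall(page)]

print(len(greedy_hits))  # one match that swallows both posts
print(lazy_hits)         # one entry per post
```

With a single message block, as in the question's sample, both forms capture the same text. The difference only shows up once there is more than one block.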

Test

# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r"<!-- message -->([\s\S]*)<!-- \/ message -->"

test_str = ("<!-- message --> \n"
    "<div><b><font size=\"6\"><font color=\"Red\">Bilim ve Teknik dergisi Mayıs 2019 Sayısı Pdf</font></font></b><br />\n"
    "<br />\n"
    "<img src=\"https://scontent-dus1-1.xx.fbcdn.net/v/t1.0-9/59069640_871111339894885_8805863518755618816_n.jpg?_nc_cat=109&amp;_nc_ht=scontent-dus1-1.xx&amp;oh=2a71d0bc34cda6b45404c30624c75046&amp;oe=5D6C1B30\" border=\"0\" alt=\"\" /><br />\n"
    "<br />\n"
    "<b><font size=\"5\"><a href=\"https://yadi.sk/i/oMnXUgBtTqKopg?fbclid=IwAR3KPXInlWCKFXuTKP1AU1VQGdsgvcDLdV9Px6YGOn3aU1tqAFz4Zo2J6PY\" target=\"_blank\">https://yadi.sk/i/oMnXUgBtTqKopg?fbc...1tqAFz4Zo2J6PY</a></font></b></div>\n"
    "<!-- / message -->")

# re.MULTILINE only changes how ^ and $ behave, and this pattern uses neither, so no flag is needed.
matches = re.finditer(regex, test_str)

for matchNum, match in enumerate(matches, start=1):
    print("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum=matchNum, start=match.start(), end=match.end(), match=match.group()))

    # Capture groups are numbered from 1; group 0 is the whole match.
    for groupNum in range(1, len(match.groups()) + 1):
        print("Group {groupNum} found at {start}-{end}: {group}".format(groupNum=groupNum, start=match.start(groupNum), end=match.end(groupNum), group=match.group(groupNum)))

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.

RegEx Circuit

jex.im visualizes regular expressions. (The diagram image is not preserved here.)
