I'm looking to find all occurrences of Php on a page (ignoring case) with BeautifulSoup in Python 3.

Php (regardless of case) could occur anywhere on the page, so I am basically trying to find the string itself, not restricted to a specific div or class.

I currently have:

from bs4 import BeautifulSoup  # Beautiful Soup 4 is imported from bs4 on Python 3
import requests

school_urls = ['somesite1.com', 'somesite2.com']
posting_keywords = ['PHP', 'Php', 'php']

for school in school_urls:
    pass  # search the markup for each keyword here

school contains HTML markup, obtained by requesting a URL, with words like php in it.

How does this look to you? Is there a way in Beautiful Soup to find all variations of php, ignoring case, instead of having to loop through posting_keywords?

Thanks

Jshee

2 Answers

Does posting_keywords.lower() work for you?
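
As written, posting_keywords is a list, and Python lists have no .lower() method, so the call would have to be applied to strings instead. A minimal sketch of one way to read this suggestion, assuming the goal is simply to count occurrences: lowercase the page text once and search for the single lowercase keyword. The sample HTML and the names html, page_text, and count are illustrative assumptions, not from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for a fetched page
html = "<html><body><p>PHP and Php</p><a href='#'>php</a></body></html>"

soup = BeautifulSoup(html, "html.parser")

# Lowercase the visible text once, so one lowercase keyword covers every casing
page_text = soup.get_text().lower()

# str.count returns the number of non-overlapping occurrences
count = page_text.count("php")
print(count)
```

This only counts matches in the extracted text; it does not tell you which tags they came from.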

thinkvitamin
import re
import bs4

# Sample markup containing "php" in several capitalizations
text = """
<html><head><title>The Dormouse's story php</title></head>
<body>
<p class="title"><b>The Dormouse's story PHP</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">php</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Php</a> and
<a href="http://example.com/tillie" class="sister" id="link3">php Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""
soup = bs4.BeautifulSoup(text, 'lxml')
# A regex compiled with re.IGNORECASE matches text nodes containing "php" in any case.
# string= is the current name of the older text= argument (Beautiful Soup 4.4+).
soup.find_all(string=re.compile(r'php', re.IGNORECASE))

out:

["The Dormouse's story php",
 "The Dormouse's story PHP",
 'php',
 'Php',
 'php Tillie']
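
The same approach could be wrapped in a small helper for use inside the loop over pages. This is only a sketch under assumptions: the function name find_keyword_strings and the sample markup are hypothetical, html.parser is used instead of lxml just to avoid the extra dependency, and string= is the current name of the text= argument in Beautiful Soup 4.4 and later.

```python
import re
from bs4 import BeautifulSoup


def find_keyword_strings(html, keyword):
    """Return (text, parent tag name) for each text node containing keyword, ignoring case."""
    # re.escape keeps the keyword literal even if it contains regex metacharacters
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    soup = BeautifulSoup(html, "html.parser")
    return [(str(node), node.parent.name) for node in soup.find_all(string=pattern)]


# Hypothetical sample markup standing in for one fetched page
sample_html = (
    "<html><head><title>Learn php</title></head>"
    "<body><p>PHP jobs</p><p>Python jobs</p><a href='#'>Php</a></body></html>"
)

matches = find_keyword_strings(sample_html, "php")
for text, tag in matches:
    print(tag, repr(text))
```

Returning the parent tag name alongside each string makes it easy to see where on the page a match occurred, while the search itself still covers the whole document.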

Document

宏杰李