Extracting something specific from BS4 after running a find_all

Question

I'm trying out some beginning web scraping using BS4. I went to a site called finviz.com since I have some stuff there I'd be interested in.

print(soup.find_all('a', class_='screener-link-primary'))

Here are two lines of the output when I print the above. How would I extract "AGO" and "AGM" from this? I tried pasting the output as text, but that stripped away all the HTML tags, so I pasted it as an image.

Output Image

score 0 · Accepted Answer · answered Jun 29 '18 at 14:33


Use the .text property to get the text between the <a> and </a> tags.

from bs4 import BeautifulSoup

# Two sample links shaped like the screener output
sample_html = '''
<a class="screener-link-primary" href="aych-ref">AGM</a>
<a class="screener-link-primary" href="aych-ref">AGO</a>
'''

soup = BeautifulSoup(sample_html, 'html.parser')
# find_all returns every matching <a> tag
links = soup.find_all('a', class_='screener-link-primary')
for link in links:
    print(link.text)  # prints AGM, then AGO
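
If you want the symbols collected in a list rather than printed, something like the following should work. This is only a sketch: the sample HTML and the `tickers` name are made up for illustration, and the real finviz markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical sample standing in for the real page source
sample_html = '''
<a class="screener-link-primary" href="aych-ref">AGM</a>
<a class="screener-link-primary" href="aych-ref">AGO</a>
'''

soup = BeautifulSoup(sample_html, 'html.parser')
links = soup.find_all('a', class_='screener-link-primary')

# get_text(strip=True) returns the tag's text with surrounding whitespace removed
tickers = [link.get_text(strip=True) for link in links]
print(tickers)
```

Here `get_text(strip=True)` does the same job as `.text`, with the addition of trimming any whitespace around the symbol.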


BenG


OK, I will try that. Can I ask another question: how does the for loop work here? I thought for loops in Python traditionally worked with a range, e.g. `for x in range(y, z)`. How does that loop work without a range? – Jed Bartlet Jun 29 '18 at 14:51
In Python, a loop written as `for x in y:` means "for each element, which I call x, in an iterable named y". In this case, `print(str(type(links)))` prints `<class 'bs4.element.ResultSet'>`, which is iterable. You can learn more about iterables [here](https://stackoverflow.com/questions/9884132/what-exactly-are-iterator-iterable-and-iteration). – BenG Jun 29 '18 at 15:13
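
To make the comment above concrete, here is a small sketch comparing an index-based loop over `range` with direct iteration. A plain list (the hypothetical `tickers`) stands in for the `ResultSet`, on the assumption that `ResultSet` behaves like a list when looped over.

```python
# Hypothetical list standing in for the ResultSet returned by find_all
tickers = ['AGM', 'AGO']

# Index-based style: range() yields 0, 1, ... and each item is looked up by index
by_index = []
for i in range(len(tickers)):
    by_index.append(tickers[i])

# Direct style: the loop hands over each element of the iterable in turn
direct = []
for ticker in tickers:
    direct.append(ticker)

print(by_index == direct)
```

Both loops visit the same elements in the same order; `range` is just one more iterable, one that happens to produce integers.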
