Scraping data with multiple same class name using BeautifulSoup

Question

I'm practicing scraping on a real-estate website, and I want to scrape all the addresses listed under recent sales. The page is https://www.compass.com/agents/irene-vuong/ and part of its HTML looks like this:

<div class="profile-active-listings" role="tabpanel" id="active-listings-sales">
    <div class="card-content">
      <a class="card-title" href="/listing" data-tn="label-address"> 111 East 35th </a>
                                            ........
<div class="textIntent-headline1"> Recent Sales</div>
    <div class="card-content">
      <a class="card-title" href="/morelisting" data-tn="label-address"> East 4th </a>

I'm trying to access all the addresses using the code below:

for i in range(0, 30):
    h = soup.findAll('a', {'class':'card-title'})[i]
    print(h)

However, I get this error:

IndexError: list index out of range

I get the first few addresses, but only the ones that appear before "Recent Sales". The code picks up addresses from the first part of the page, not the entire page. How do I get all the addresses?

It looks like you might be using the wrong `class`. There are currently 12 items on that page with the class `uc-listingCart-title`, not `card-title`. If you loop through those as suggested by @user2263572 (as opposed to hard-coding the `30`), that should give you all the items you're looking for. — Zachary Blackwood, Mar 03 '20 at 22:03
@ZacharyBlackwood Hi, I tried the suggestion but it still only gets part of it and not all.... :-( — Sarah, Mar 04 '20 at 15:11
Ah. Looks like the extra items are being added dynamically on the front-end. This answer might be helpful for getting the page contents after javascript has added the items. https://stackoverflow.com/a/26440563/5031672 — Zachary Blackwood, Mar 05 '20 at 15:41

score 0 · Answer 1 · answered Mar 03 '20 at 22:00

The findAll method returns a list of all elements that match your search criteria.

In your case, it returns a list of length 2.

You then iterate over the indexes 0 through 29 and look each one up in that list of length 2. As soon as the index reaches 2, the lookup goes past the end of the list.

Hence your error.

Your code should read something more like:

# Iterate over the matches directly instead of indexing a fixed range
for x in soup.findAll('a', {'class':'card-title'}):
    print(x)
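For a version you can run without hitting the site, here is a minimal sketch. It assumes markup shaped like the snippet in the question; `SAMPLE_HTML` and `extract_addresses` are hypothetical names made up for this illustration, and `find_all` is the current spelling of `findAll`. Because the loop runs over whatever the search returns, it cannot raise an IndexError, however many matches there are.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modeled on the snippet in the question (not fetched from the site).
SAMPLE_HTML = """
<div class="profile-active-listings" role="tabpanel" id="active-listings-sales">
  <div class="card-content">
    <a class="card-title" href="/listing" data-tn="label-address"> 111 East 35th </a>
  </div>
</div>
<div class="textIntent-headline1"> Recent Sales</div>
<div class="card-content">
  <a class="card-title" href="/morelisting" data-tn="label-address"> East 4th </a>
</div>
"""


def extract_addresses(html):
    """Return the stripped text of every <a class="card-title"> element."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all returns every match; iterate over it rather than a fixed range.
    links = soup.find_all("a", class_="card-title")
    return [link.get_text(strip=True) for link in links]


addresses = extract_addresses(SAMPLE_HTML)
print(len(addresses), addresses)  # 2 ['111 East 35th', 'East 4th']
```

This only finds elements that are present in the HTML handed to BeautifulSoup. If the page adds listings with JavaScript after it loads, they will not be in the response text, and no selector will find them.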

answered Mar 03 '20 at 22:00

user2263572


The answer above addresses your original question. This link addresses your current one: https://stackoverflow.com/questions/16322862/beautiful-soup-findall-doesnt-find-them-all – user2263572 Mar 04 '20 at 16:04
Hi I'm not sure how the post is relevant to my questions... I'm using 'html.parser' in my code. Can you please explain? It will be greatly appreciated. – Sarah Mar 04 '20 at 18:29
Your question is "Scraping data with multiple same class name using BeautifulSoup" and your issue is "findall not finding everything it should". I linked the question "Beautiful Soup findAll doesn't find them all". I think it's relevant. Either the link explains your issue, or you aren't using the correct CSS selectors. – user2263572 Mar 04 '20 at 18:56
Unfortunately, I don't think the post is helpful :-( I think my problem is different than the one in the post. – Sarah Mar 04 '20 at 18:58
