Visible and search URLs for webscraping

Question

When I apply filters on the website before web scraping, I end up at the following URL: https://www.marktplaats.nl/l/auto-s/p/2/#f:10898,10882

However, when I use that URL in my script to retrieve the href of every advertisement, I get the results of https://www.marktplaats.nl/l/auto-s/p/2 instead, and my 2 filters (namely #f:10898,10882) are ignored completely.

Can you please advise me what the problem is?

import requests
import bs4
import pandas as pd

frames = []

for pagenumber in range(0, 2):
    url = 'https://www.marktplaats.nl/l/auto-s/p/'
    add_url = '/#f:10898,10882'
    txt = requests.get(url + str(pagenumber) + add_url)
    soup = bs4.BeautifulSoup(txt.text, 'html.parser')
    soup_table = soup.find('ul', 'mp-Listings mp-Listings--list-view')

    for car in soup_table.findAll('li'):
        link = car.find('a')
        sub_url = 'https://www.marktplaats.nl/' + link.get('href')

        sub_soup = requests.get(sub_url)
        soup1 = bs4.BeautifulSoup(sub_soup.text, 'html.parser')

Does this answer your question? [Can I read the hash portion of the URL on my server-side application (PHP, Ruby, Python, etc.)?](https://stackoverflow.com/questions/940905/can-i-read-the-hash-portion-of-the-url-on-my-server-side-application-php-ruby) — αԋɱҽԃ αмєяιcαη, May 14 '20 at 11:40
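
To illustrate the point behind the linked question: everything after `#` in a URL is the fragment, which is kept by the client and is not sent to the server as part of the HTTP request. A minimal sketch, using only the standard library, shows how the filter part of the URL from the question is split off as a fragment:

```python
from urllib.parse import urlsplit

url = 'https://www.marktplaats.nl/l/auto-s/p/2/#f:10898,10882'
parts = urlsplit(url)

# The path is what the server is asked for ...
print(parts.path)      # /l/auto-s/p/2/
# ... while the fragment stays on the client side.
print(parts.fragment)  # f:10898,10882
```

This is presumably why the script receives the unfiltered page: in the browser the filters would be applied by JavaScript reading the fragment, and a plain HTTP request never runs that step.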

motionsickness · Answer 1 · 2020-05-14T15:05:48.140


I would suggest that you use their API instead, which seems to be open.

If you open the link below, you will see the same listings you are searching for, with the appropriate filters applied and no need to parse HTML (use something to format the JSON, since otherwise it looks like just a bunch of text). You can also modify the search easily in a request just by changing the query parameters.

https://www.marktplaats.nl/lrp/api/search?attributesById[]=10898&attributesById[]=10882&l1CategoryId=91&limit=30&offset=0

In code it would look something like this:

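import requests

def getcars():
    url = 'https://www.marktplaats.nl/lrp/api/search'

    # A dict cannot hold the same key twice, so the two filter ids are
    # passed as a list; requests sends them as repeated parameters.
    querystring = {
        'attributesById[]': [10898, 10882],
        'l1CategoryId': 91,
        'limit': 30,
        'offset': 0,
    }

    # No special headers are set here.
    headers = {}

    response = requests.get(url, headers=headers, params=querystring)
    # Parse the JSON body into a Python dictionary.
    return response.json()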

cars = getcars()
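
As a sketch of how the query string from the link above is put together, and how further pages might be requested: the helper `build_search_url` below is hypothetical, and it assumes that `offset` counts listings (so page `n` starts at `n * limit`), which the answer does not confirm. It uses only the standard library and makes no network request.

```python
from urllib.parse import urlencode

API_URL = 'https://www.marktplaats.nl/lrp/api/search'

def build_search_url(page, limit=30):
    """Hypothetical helper: build the search URL for a zero-based page."""
    params = {
        # A list value becomes one repeated parameter per element.
        'attributesById[]': [10898, 10882],
        'l1CategoryId': 91,
        'limit': limit,
        # Assumption: offset is the number of listings to skip.
        'offset': page * limit,
    }
    # doseq=True expands the list into repeated key=value pairs.
    return API_URL + '?' + urlencode(params, doseq=True)

for page in range(2):
    print(build_search_url(page))
```

The square brackets appear percent-encoded (`%5B%5D`) in the printed URLs, which is an equivalent spelling of the same query string.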

edited May 14 '20 at 15:05

answered May 14 '20 at 11:59

motionsickness


Hi motionsickness. Thank you for your feedback. However, I have never encountered an API before and do not know how to work with one. I now have a question: how do I iterate over the results and extract all links to the detailed advertisements ("vipUrl":"/a/auto-s/opel...), as I want the links to all detailed advertisements in the list? – Iva May 14 '20 at 18:14

@Iva that function will return a Python dictionary. I would recommend that you search for how to navigate those and how to work with JSON, but I'll help you with a quick example. The site's API returns a dictionary that has a key called `listings`. Inside it are the car listings with details. To see them you could print `cars["listings"]`. This will be a `list`, so to see the first listing you can print `cars["listings"][0]`. Now, each listing is another dictionary. Want to see the title? `cars["listings"][0]["title"]`. Price? `cars["listings"][0]["priceInfo"]`. And so on. – motionsickness May 15 '20 at 00:44
@motionsickness hi, please help me: how can I use this code when I am looking only for a specific auto/model/year? I searched a lot. I use BeautifulSoup. – Tornike Kharitonishvili Jul 05 '23 at 06:45
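
Following the explanation in the comments above, collecting the detail links could look roughly like the sketch below. It assumes the response has a `listings` list whose entries carry a `vipUrl` field, as the comments describe; the function name `listing_urls` and the sample data are made up for illustration, and no request is sent.

```python
from urllib.parse import urljoin

BASE_URL = 'https://www.marktplaats.nl'

def listing_urls(payload):
    """Return the absolute detail-page URL of every listing that has one."""
    urls = []
    for listing in payload.get('listings', []):
        vip_url = listing.get('vipUrl')
        if vip_url:
            # vipUrl is a site-relative path, so join it with the base URL.
            urls.append(urljoin(BASE_URL, vip_url))
    return urls

# Made-up sample shaped like the response described in the comments.
sample = {
    'listings': [
        {'title': 'Example car A', 'vipUrl': '/a/auto-s/example-a.html'},
        {'title': 'Example car B', 'vipUrl': '/a/auto-s/example-b.html'},
        {'title': 'Listing without a link'},
    ]
}

for url in listing_urls(sample):
    print(url)
```

With the function from the answer, `listing_urls(getcars())` would give the links for the first page of results, provided the real response has this shape.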
