I am trying to parse the following website to get the addresses of all stores (sorry, the site is in Russian):
http://magnit-info.ru/buyers/adds/1258/14/243795

The addresses for a single city are listed at the bottom of the page, inside the block .b-shops-list. This block is populated dynamically by a POST request. When I fetch the page with the requests module, I get no addresses, because the block is empty in the initial page source.

I am using Selenium right now, but it is really slow. Parsing all cities and regions takes about 2 hours, even with multiprocessing. I also have to use expected_conditions and wait about 4-5 seconds to be sure that the POST requests have completed.

Is there any way to speed this up? Can I send the POST requests myself using requests? If so, how do I figure out which POST requests I should send? The same question applies to websites that use Google Maps.

Thank you!

Trarbish

1 Answer

I had a look at the AJAX request that this page makes to load the addresses and came up with this small code snippet:

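import requests

# Form fields sent by the page's AJAX call; the three IDs match the page URL
data = {
    'op': 'get_shops',
    'SECTION_ID': 1258,
    'RID': 14,
    'CID': 243795,
}

res = requests.post('http://magnit-info.ru/functions/bmap/func.php', data=data)
res.raise_for_status()  # fail early on an HTTP error instead of parsing an error page
addresses = res.json()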

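If you compare the data dictionary with the URL you linked, you can see that its three numeric values (1258, 14 and 243795) are the last three path segments of that URL, so the dictionary is easy to generate from the URL.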
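As a rough sketch of that idea, the dictionary could be built straight from a page URL. The helper name payload_from_url is made up for illustration, and the mapping of the last three path segments to SECTION_ID, RID and CID is an assumption based only on the single URL in the question:

```python
from urllib.parse import urlparse


def payload_from_url(page_url):
    """Build the POST form data from a store-list page URL (hypothetical helper)."""
    # Assumed layout: .../buyers/adds/<SECTION_ID>/<RID>/<CID>
    parts = [p for p in urlparse(page_url).path.split('/') if p]
    # Take the last three path segments; raises ValueError if the URL
    # has fewer than three segments or they are not integers.
    section_id, rid, cid = (int(p) for p in parts[-3:])
    return {
        'op': 'get_shops',
        'SECTION_ID': section_id,
        'RID': rid,
        'CID': cid,
    }


payload = payload_from_url('http://magnit-info.ru/buyers/adds/1258/14/243795')
print(payload)
```

The result can be passed as data= to the same requests.post call as in the snippet above, once per city URL, which avoids starting a browser at all.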

Paco H.