I've created a Python script to get different links from a webpage. The data for each category is parsed from JSON and currently stored in my `links` variable, but I can't work out how to process it further to extract all the links available there.

This is my attempt:

import json
import requests
from bs4 import BeautifulSoup

url = 'https://www.afterpay.com/en-AU/categories'

r = requests.get(url)
soup = BeautifulSoup(r.text,"lxml")
item = soup.select_one("[data-react-class='SharedStateHydrator']")
categories = json.loads(item.get("data-react-props"))['categoriesResponse']['data']
for linklist in categories:
    links = linklist['relationships']
    print(links)

Output of an individual block out of several:

{'stores': {'links': {'related': 'https://store-directory-api.afterpay.com/api/v1/categories/jewellery/stores?locale=en-AU'}}, 'topStores': {'links': {'related': 'https://store-directory-api.afterpay.com/api/v1/categories/jewellery/stores?locale=en-AU'}}, 'featuredStores': {'links': {'related': 'https://store-directory-api.afterpay.com/api/v1/categories/jewellery/stores?featured=true&locale=en-AU'}}, 'children': {'data': [{'type': 'categories', 'id': '135'}, {'type': 'categories', 'id': '326'}, {'type': 'categories', 'id': '38'}]}}

All the links are stored under the `related` keys.

How can I fetch all the links?

MITHU
  • Possible duplicate of [Iterate over nested dictionary](https://stackoverflow.com/questions/8335096/iterate-over-nested-dictionary) – Markus Jun 14 '19 at 11:23
  • Take a look at how to loop through nested dicts like here: https://stackoverflow.com/questions/8335096/iterate-over-nested-dictionary or: https://stackoverflow.com/questions/10756427/loop-through-all-nested-dictionary-values – Markus Jun 14 '19 at 11:24
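
The nested-dictionary approach suggested in the comments can be sketched without touching the network. The following is a minimal sketch rather than a definitive solution: `iter_related` is a hypothetical helper name, and `sample` is one block copied from the output shown in the question. It assumes every link of interest is a string stored under a key named `related`, at any depth.

```python
def iter_related(node):
    """Yield every string stored under a 'related' key, at any depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == 'related' and isinstance(value, str):
                yield value
            else:
                # keep descending into nested dicts and lists
                yield from iter_related(value)
    elif isinstance(node, list):
        for element in node:
            yield from iter_related(element)


# One block copied from the output in the question
sample = {
    'stores': {'links': {'related': 'https://store-directory-api.afterpay.com/api/v1/categories/jewellery/stores?locale=en-AU'}},
    'topStores': {'links': {'related': 'https://store-directory-api.afterpay.com/api/v1/categories/jewellery/stores?locale=en-AU'}},
    'featuredStores': {'links': {'related': 'https://store-directory-api.afterpay.com/api/v1/categories/jewellery/stores?featured=true&locale=en-AU'}},
    'children': {'data': [{'type': 'categories', 'id': '135'}, {'type': 'categories', 'id': '326'}, {'type': 'categories', 'id': '38'}]},
}

related_links = list(iter_related(sample))
for link in related_links:
    print(link)
```

In the question's script, the same helper could be called on `linklist['relationships']` inside the loop, or on the whole `categories` list at once, since it also walks lists.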

3 Answers

1

Try this:

import json
import requests
from bs4 import BeautifulSoup

url = 'https://www.afterpay.com/en-AU/categories'
r = requests.get(url)
soup = BeautifulSoup(r.text,"lxml")
item = soup.select_one("[data-react-class='SharedStateHydrator']")
categories = json.loads(item.get("data-react-props"))['categoriesResponse']['data']

json_data = []

for linklist in categories:
    links = linklist['relationships']
    # iterate over every relationship that carries a related URL
    for sub_dict in links:
        # 'children' holds category ids, not a 'links' entry, so skip it
        if "children" == sub_dict:
            continue

        # fetch the related URL
        related_url = links[sub_dict]['links']['related']

        # request the related URL and store its JSON response next to it
        links[sub_dict]['links']['response_data'] = requests.get(related_url).json()

    json_data.append(links)

print(json_data)
bharatk
  • You got me wrong, dear. Just check out the JSON I've pasted above. You got only the links that are within `stores`. There are other categories, such as `topStores`, `featuredStores`, etc. However, all the categories have the `related` key in common. I want to get all the links out of all the categories. Thanks. – MITHU Jun 14 '19 at 11:16
  • @MITHU Answer Updated. – bharatk Jun 14 '19 at 11:27
1

Just iterate over the dictionary:

import json
import requests
from bs4 import BeautifulSoup
url = 'https://www.afterpay.com/en-AU/categories'
r = requests.get(url)
soup = BeautifulSoup(r.text,"lxml")
item = soup.select_one("[data-react-class='SharedStateHydrator']")
categories = json.loads(item.get("data-react-props"))['categoriesResponse']['data']
for linklist in categories:
    links = linklist['relationships']
    # each relationship is a dict; only some of them carry a 'links' entry
    for related in links.values():
        if 'links' in related:
            for link in related['links'].values():
                print(link)
Pavan Kumar T S
1

The following is quick, though it is worth confirming that it returns the required list:

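import html
import re
import requests

r = requests.get('https://www.afterpay.com/en-AU/categories')
# Assumes the JSON sits in an HTML attribute, so its quotes appear in r.text as &quot;
p = re.compile(r'related&quot;:&quot;(.*?)&quot;')
# Decode entities such as &amp; inside the captured URLs
links = [html.unescape(link) for link in p.findall(r.text)]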
QHarr
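
To check how a regex of this kind behaves without requesting the page, it can be run against a small HTML fragment. This is only a sketch: the `raw` string below is a hypothetical imitation of the page source, built on the assumption that the JSON in `data-react-props` is HTML-escaped (`&quot;` for quotes, `&amp;` for ampersands). Since the question's output shows `stores` and `topStores` sharing one URL, the sketch also drops duplicates while keeping order.

```python
import html
import re

# Hypothetical fragment imitating the raw page source (an assumption, not copied from the site)
raw = (
    '<div data-react-class="SharedStateHydrator" data-react-props="{'
    '&quot;stores&quot;:{&quot;links&quot;:{&quot;related&quot;:&quot;'
    'https://store-directory-api.afterpay.com/api/v1/categories/jewellery/stores?locale=en-AU&quot;}},'
    '&quot;topStores&quot;:{&quot;links&quot;:{&quot;related&quot;:&quot;'
    'https://store-directory-api.afterpay.com/api/v1/categories/jewellery/stores?locale=en-AU&quot;}},'
    '&quot;featuredStores&quot;:{&quot;links&quot;:{&quot;related&quot;:&quot;'
    'https://store-directory-api.afterpay.com/api/v1/categories/jewellery/stores?featured=true&amp;locale=en-AU&quot;}}'
    '}"></div>'
)

# Capture everything between the escaped quotes that follow "related"
pattern = re.compile(r'related&quot;:&quot;(.*?)&quot;')

# Decode HTML entities so &amp; becomes & again
found = [html.unescape(match) for match in pattern.findall(raw)]

# Remove duplicates while preserving first-seen order
unique_links = list(dict.fromkeys(found))
for link in unique_links:
    print(link)
```

A regex is quick but depends on the exact escaping of the page, so parsing the attribute with `json.loads`, as the question already does, is likely the more robust route.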