I have pulled data from an API and am looping through the records. One of the key/value pairs holds a URL, so I am building a separate list of those URLs. What I need to do is follow each link, grab the contents of the page (it will just be a paragraph of text), pull that text back into my array/list, and repeat for the remaining URLs. Do I need to use Selenium or BS4, and how do I loop through the URLs and pull the page contents into my array/list?
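Whether Selenium is needed depends on the page. If the paragraph is already present in the HTML the server returns (check the page source), a plain HTTP request plus an HTML parser is enough. Selenium is only needed when the text is rendered by JavaScript after the page loads. Below is a minimal sketch of the parsing step. It assumes the wanted text sits in `<p>` elements and uses the standard library's `html.parser` instead of BS4 so it runs without extra installs. `ParagraphCollector` and `paragraphs_from_html` are hypothetical names. With BS4 the rough equivalent is `[p.get_text(" ", strip=True) for p in BeautifulSoup(html, "html.parser").find_all("p")]`.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collect the text inside <p> elements, one string per paragraph."""

    def __init__(self):
        super().__init__()
        self._in_p = False    # True while reading inside a <p> element
        self._current = []    # text fragments of the paragraph being read
        self.paragraphs = []  # finished paragraphs, whitespace-normalised

    def _flush(self):
        # Join the fragments and collapse runs of whitespace/newlines.
        text = " ".join("".join(self._current).split())
        if text:
            self.paragraphs.append(text)
        self._current = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            if self._in_p:  # previous <p> was left unclosed (legal in HTML)
                self._flush()
            self._in_p = True

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            self._flush()
            self._in_p = False

    def handle_data(self, data):
        # Text inside nested inline tags such as <b> or <a> is kept too.
        if self._in_p:
            self._current.append(data)

    def close(self):
        super().close()  # processes any buffered trailing text first
        if self._in_p:   # document ended inside an unclosed <p>
            self._flush()
            self._in_p = False


def paragraphs_from_html(html):
    """Return the text of every <p> element in an HTML string."""
    parser = ParagraphCollector()
    parser.feed(html)
    parser.close()
    return parser.paragraphs
```

For example, `paragraphs_from_html("<h1>Title</h1><p>Hello <b>world</b>.</p>")` returns `["Hello world."]`; the heading is ignored.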
The JSON looks like this:
{
    "merchandiseData": [
        {
            "clientID": 3003,
            "name": "Yasir Carter",
            "phone": "(758) 564-5345",
            "email": "leo.vivamus@pedenec.net",
            "address": "P.O. Box 881, 2723 Elementum, St.",
            "postalZip": "DX2I 2LD",
            "numberrange": 10,
            "name1": "Harlan Mccarty",
            "constant": ".com",
            "text": "deserunt",
            "url": "https://www.deserunt.com",
            "text": "https://www."
        }
    ]
}
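Pulling just the URLs into their own list, as described in the question, takes one list comprehension over `merchandiseData`. Below is a minimal sketch that uses an inline, trimmed copy of the sample record. In the real script, `file_contents` from `json.load` plays the role of `data`. It also shows what happens to the duplicated `"text"` key in the sample: Python's `json` module keeps only the last value.

```python
import json

# Trimmed copy of the sample record above; in practice, load the file instead.
SAMPLE = """
{
  "merchandiseData": [
    {
      "clientID": 3003,
      "text": "deserunt",
      "url": "https://www.deserunt.com",
      "text": "https://www."
    }
  ]
}
"""

data = json.loads(SAMPLE)
records = data["merchandiseData"]

# The separate list of URLs, one per record (records without "url" are skipped).
urls = [rec["url"] for rec in records if "url" in rec]
print(urls)  # ['https://www.deserunt.com']

# The sample repeats the "text" key; json.loads keeps only the last
# occurrence, so the earlier "deserunt" value is silently dropped.
print(records[0]["text"])  # https://www.
```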
Code thus far:
import requests
import json
import pandas as pd
import sqlalchemy as sq
import time
from datetime import datetime, timedelta
from flatten_json import flatten

# read file
with open('_files/TestFile2.json', 'r') as f:
    file_contents = json.load(f)

allThis = []
for x in file_contents['merchandiseData']:
    holdAllThis = {
        'client_id': x['clientID'],
        'client_description_link': x['url']
    }
    allThis.append(holdAllThis)
    # client_id / client_description_link are dict keys, not variables
    print(holdAllThis['client_id'], holdAllThis['client_description_link'])

print(allThis)
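One way to finish the loop is sketched below. After `allThis` is built, it follows each `client_description_link`, pulls the paragraph text, and stores it back in the same dictionary. The sketch assumes the pages are static HTML (so no Selenium) and that the wanted text sits in `<p>` elements. It uses the standard library's `urllib.request` and `html.parser` rather than requests/BS4 so it has no dependencies. `add_descriptions`, `fetch_paragraph_text`, `paragraph_text` and the `client_description` key are hypothetical names. The `fetch` parameter lets you swap in a requests + BeautifulSoup function, or a Selenium-based one, without changing the loop.

```python
import time
import urllib.request
from html.parser import HTMLParser


class _ParagraphText(HTMLParser):
    """Collect text found inside <p> elements (simplified)."""

    def __init__(self):
        super().__init__()
        self.in_p = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_p = True
            self.chunks.append(" ")  # keep paragraphs from running together

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_p = False

    def handle_data(self, data):
        if self.in_p:
            self.chunks.append(data)


def paragraph_text(html):
    """Return all <p> text of an HTML string as one whitespace-normalised string."""
    parser = _ParagraphText()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.chunks).split())


def fetch_paragraph_text(url, timeout=10):
    """Download one static page and return its paragraph text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        html = resp.read().decode(charset, errors="replace")
    return paragraph_text(html)


def add_descriptions(rows, fetch=fetch_paragraph_text, delay=1.0):
    """Follow each row's 'client_description_link' and store the page text
    under a new 'client_description' key (None if the fetch fails)."""
    for row in rows:
        url = row.get("client_description_link")
        if not url:
            row["client_description"] = None
            continue
        try:
            row["client_description"] = fetch(url)
        except (OSError, ValueError, LookupError) as exc:
            # URLError/HTTPError/timeouts are OSError subclasses; a malformed
            # URL raises ValueError; an unknown charset raises LookupError.
            print(f"Could not fetch {url}: {exc}")
            row["client_description"] = None
        time.sleep(delay)  # pause between requests to go easy on the servers
    return rows
```

With the question's code, calling `add_descriptions(allThis)` after the loop leaves each entry with a `client_description` string, or `None` if that page could not be fetched. The enriched list can then go into `pd.DataFrame(allThis)` if a table is needed.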