I have been trying to learn regex and once again I got stuck.
What I am trying to scrape is the value of:
var preloadedItems = [
{
"id": "8971",
"permalink": "https://www.randomsite1.com"
},
{
"id": "8943",
"permalink": "https://www.randomsit2e.com"
},
{
"id": "8944",
"permalink": "https://www.randoms3ite.com"
},
{
"id": "8950",
"permalink": "https://www.random4site.com"
},
{
"id": "8910",
"permalink": "https://www.random5site.com"
},
{
"id": "8915",
"permalink": "https://www.rando6msite.com"
}
];
# The full array is pretty long, so I have not posted everything here.
which I get by doing:
p = re.compile(r'var preloadedItems = \[(.*?)\];', re.DOTALL)
data = p.findall(req.text)[0]
This returns the whole JSON value I posted above. However, I only want to collect all the permalink values into a list, so I tried:
p = re.compile(r'var preloadedItems = \[(.*?)\];', re.DOTALL)
data = json.loads(p.findall(req.text)[0]).items()
but I get the error Extra data: line 1 column 2657 (char 2656)
and I wonder how I can scrape all the permalinks into a list.
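For reference, here is a minimal reproduction of the error. It assumes a shortened two-item stand-in for req.text (sample_html is a made-up name, not from my real code). As far as I can tell, the captured group is the text between the brackets, so json.loads sees several objects separated by commas rather than one JSON value:

```python
import json
import re

# Hypothetical stand-in for req.text, shortened to two items.
sample_html = '''
var preloadedItems = [
{
"id": "8971",
"permalink": "https://www.randomsite1.com"
},
{
"id": "8943",
"permalink": "https://www.randomsit2e.com"
}
];
'''

# Same pattern as above: the group captures only what is INSIDE the [ ].
p = re.compile(r'var preloadedItems = \[(.*?)\];', re.DOTALL)
captured = p.findall(sample_html)[0]

# The captured text looks like "{...},{...}", which is not a single JSON value,
# so the parser stops after the first object and complains about the rest.
try:
    json.loads(captured)
    error_message = None
except json.JSONDecodeError as exc:
    error_message = exc.msg

print(error_message)
```

The line and column in the real message differ because the real page is much longer.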
Update:
My idea was to grab the JSON value with regex first, so that I can pass it to json.loads(regexValue) afterwards.
- Meaning that I use regex to grab the value Regexjson = {....} and after that call json.loads(Regexjson)...
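A rough sketch of that idea, under a few assumptions: the capture group is moved so it includes the surrounding brackets, sample_html is a made-up stand-in for req.text, and extract_permalinks is a hypothetical helper name. The non-greedy match would also stop too early if any string value happened to contain "];", so this is only a starting point:

```python
import json
import re

# Hypothetical stand-in for req.text with two of the items from above.
sample_html = '''<script>
var preloadedItems = [
{"id": "8971", "permalink": "https://www.randomsite1.com"},
{"id": "8943", "permalink": "https://www.randomsit2e.com"}
];
</script>'''


def extract_permalinks(page_text):
    """Return every "permalink" value from the preloadedItems array, or [] if absent."""
    # The group now includes the [ ], so the captured text is a valid JSON array.
    pattern = re.compile(r'var preloadedItems = (\[.*?\]);', re.DOTALL)
    match = pattern.search(page_text)
    if match is None:
        return []
    items = json.loads(match.group(1))  # a list of dicts
    return [item["permalink"] for item in items]


permalinks = extract_permalinks(sample_html)
print(permalinks)
```

Since json.loads returns a list here, there is no .items() to call; a list comprehension over the dicts is enough to collect the permalinks.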