When extracting data over multiple pages, each page's results are placed in a separate column. Saved as CSV/Excel, for example, it looks like this:
urls urls urls urls page2_urls page2_urls page2_urls page3_urls page3_urls page4_urls
and so on.
Saving as JSON also gives me weird results. This is not the full JSON result, so it won't render properly in a JSON viewer (not that the full result did anyway).
{
"FibberShows": [
{
"mp3URL": "url"
},
],
"pages": [
{
},
{
},
{
},
{
},
{
"FibberShows": [
{
"mp3URL": "url"
},
],
"pages": [
{
},
{
},
{
},
{
},
{
},
{
"FibberShows": [
{
"mp3URL": "url"
}
],
"pages": [
{
},
{
},
{
},
{
},
{
},
{
"FibberShows": [
{
"mp3URL": "url"
},
pattern repeats for the remaining pages.
My ParseHub extraction commands:
The above commands reside in the template FibberShows, so after clicking next page it reloads the template to extract the next page's URLs.
The above extracts the URLs just fine; I'm just not happy with how it formats the results. With 5 pages it is an easy fix.
On 20+ pages, it takes some time to manually go through and delete empty columns to get all the mp3 urls to show up in one long list.
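As a stopgap for the CSV export, a small Python script could do the column clean-up for me. This is only a rough sketch: it assumes the first row is the header row (urls, page2_urls, and so on) and that every non-empty cell below it is an mp3 URL. The function names collapse_columns and collapse_csv are my own, not anything from ParseHub.

```python
import csv


def collapse_columns(rows):
    """Return every non-empty cell below the header, read column by column.

    Assumption: rows[0] is the header row and each column holds one page's URLs,
    so reading the columns left to right keeps the pages in order.
    """
    if not rows:
        return []
    body = rows[1:]
    width = max(len(row) for row in rows)
    urls = []
    for col in range(width):
        for row in body:
            # Rows may be shorter than the widest row, so guard the index.
            if col < len(row) and row[col].strip():
                urls.append(row[col].strip())
    return urls


def collapse_csv(path):
    """Read a CSV export from `path` and return one flat list of URLs."""
    with open(path, newline="", encoding="utf-8") as handle:
        return collapse_columns(list(csv.reader(handle)))
```

Calling collapse_csv on the downloaded file should then give one list that can be written back out one URL per line, but it is still an extra step after every run.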
The ultimate goal is a single long list of URLs:
url url url url url etc...
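For the JSON export, the same list could probably be pulled out after the fact with a short recursive walk. The sketch below assumes the structure is as in my excerpt above, with "mp3URL" entries buried inside nested "FibberShows" and "pages" arrays. The function name collect_urls is made up for illustration.

```python
import json


def collect_urls(node, key="mp3URL"):
    """Walk nested dicts/lists and gather every string stored under `key`.

    Values are returned in document order, so page 1 URLs come before page 2
    as long as "FibberShows" appears before "pages" in each object.
    """
    urls = []
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key and isinstance(value, str):
                urls.append(value)
            else:
                urls.extend(collect_urls(value, key))
    elif isinstance(node, list):
        for item in node:
            urls.extend(collect_urls(item, key))
    return urls


def collect_urls_from_file(path):
    """Load a JSON export from `path` and flatten it into one list of URLs."""
    with open(path, encoding="utf-8") as handle:
        return collect_urls(json.load(handle))
```

That works around the nesting, but I would rather have the project produce a flat list in the first place.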
Any advice?