
I have a list of multiple URLs that I use with a Kimono desktop API I created, but for the life of me I can't figure out how to make it clear in the data output (CSV) which rows of results come from which source URL.

Is there a way to pull in the source URL as another column to easily distinguish rows of data when there are 100+ URLs? Thanks!

saijay

1 Answer


The output is based on the HTML and CSS in the page's generated source code, so unless the source contains a dependable value that explicitly states the URL (such as Wikipedia's canonical link tags), you are left with using the scrape index values.

If a scrape is unsuccessful for one page, that page isn't skipped; it still gets a row with an index number. The rows are also in the order the pages were entered, so if you're using a predetermined list of URLs, you can number the URL list yourself and then match the two indexes together like IDs.
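As a rough sketch of that matching step, something like the following Python could be run on the exported CSV afterwards. It is not part of Kimono. The column name `index`, the assumption that it starts at 1, the `source_url` column name, and the helper `add_source_urls` are all made up for illustration, so adjust them to whatever your export actually contains.

```python
import csv
import io


def add_source_urls(csv_text, urls, index_column="index", first_index=1):
    """Return CSV text with a source_url column appended to every row.

    index_column and first_index are assumptions about the export;
    change them to match the real data.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames or []) + ["source_url"]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Convert the scrape index into a position in the URL list.
        position = int(row[index_column]) - first_index
        # Leave the cell empty if the index falls outside the URL list.
        row["source_url"] = urls[position] if 0 <= position < len(urls) else ""
        writer.writerow(row)
    return out.getvalue()


# Hypothetical data: two source URLs, three scraped rows.
urls = ["http://example.com/a", "http://example.com/b"]
scraped = "index,title\n1,First\n1,Second\n2,Third\n"
print(add_source_urls(scraped, urls))
```

The URL list has to be in the same order as the one you entered for the crawl, otherwise the indexes won't line up.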

Otherwise, scrape a value from the page that you already know, such as an ID number, product number, or any other identifying data, and use it to confirm which page each row came from.

Bradley