It's based on the HTML and CSS in the generated source code. Unless the source contains a dependable value that explicitly states the URL (such as the canonical link tags Wikipedia uses), you're left with using the scrape index values.
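If you do have access to the raw page source, pulling the URL out of a `<link rel="canonical">` tag is straightforward. This is a minimal sketch using Python's standard `html.parser`. It assumes you already have each page's HTML as a string; the function name `find_canonical_url` and the sample markup are made up for illustration, and your scraping tool may not expose the raw source at all.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> tag it sees."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # Only look at <link> tags, and stop once a canonical URL is found.
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        # rel may hold several space-separated values, and may be missing.
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_values and attr_map.get("href"):
            self.canonical = attr_map["href"]


def find_canonical_url(html):
    """Return the canonical URL declared in the page source, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.canonical


# Hypothetical sample page source.
sample = (
    "<html><head>"
    '<link rel="canonical" href="https://en.wikipedia.org/wiki/Web_scraping">'
    "</head><body>...</body></html>"
)
print(find_canonical_url(sample))
```

Pages without a canonical tag return `None`, which is exactly the case where you fall back on the index values.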
If the scrape fails for a page, that page isn't skipped: a row is still created with an index number. The rows are also kept in the same order as the pages you entered. So if you're working from a predetermined list of URLs, you can number that list yourself and then match the two indexes against each other like IDs.
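Matching the two lists by position could look roughly like this. It's a sketch that assumes the export has a field called `index` and that numbering starts at 1; both are assumptions, so check your own export (the field name and starting number may differ). `attach_urls` and the example data are hypothetical.

```python
def attach_urls(rows, urls, start=1):
    """Pair each scraped row with the URL entered at the same position.

    Assumes rows come out in the same order the URLs were entered and are
    numbered from `start` (check whether your export starts at 0 or 1).
    """
    by_index = {row["index"]: row for row in rows}
    merged = []
    for offset, url in enumerate(urls):
        # Failed pages still get a row, so every position should be present;
        # fall back to an empty row just in case one is missing.
        row = by_index.get(start + offset, {})
        merged.append({**row, "url": url})
    return merged


# Hypothetical data: your own ordered URL list and the scraper's export.
urls = [
    "https://example.com/item/1",
    "https://example.com/item/2",
    "https://example.com/item/3",
]
scraped_rows = [
    {"index": 1, "title": "First item"},
    {"index": 2, "title": None},  # failed scrape: the row exists but is empty
    {"index": 3, "title": "Third item"},
]

for row in attach_urls(scraped_rows, urls):
    print(row)
```

Because failed pages still produce a row, the positions never drift, and an empty row tells you exactly which URL needs a re-scrape.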
Otherwise, scrape a value from the page that you already know, such as an ID number, a product number or any other unique data, and use it to confirm which content belongs to which page.
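If you already know an ID for each URL, you can match on that value instead of on row order. This is a rough sketch that assumes each scraped row carries the ID read from the page in a field named `product_id`; that field name, the `match_by_known_id` helper and the sample data are all hypothetical.

```python
def match_by_known_id(rows, expected, id_field="product_id"):
    """Map each known URL to the scraped row whose on-page ID matches.

    expected: {known_id: url} built from data you already have.
    rows: scraped rows, each holding the ID found on the page itself.
    Returns (matched, missing): matched maps url -> row, and missing lists
    URLs whose ID never showed up in the scraped data.
    """
    matched = {}
    for row in rows:
        url = expected.get(row.get(id_field))
        if url is not None:
            matched[url] = row
    missing = [url for url in expected.values() if url not in matched]
    return matched, missing


# Hypothetical data: IDs you already know, and two scraped rows.
expected = {
    "A100": "https://example.com/p/a100",
    "B200": "https://example.com/p/b200",
}
rows = [
    {"product_id": "B200", "price": "9.99"},
    {"product_id": None, "price": None},  # failed scrape, no ID found
]

matched, missing = match_by_known_id(rows, expected)
print(matched)
print(missing)
```

This approach doesn't depend on row order at all, so it still works if the scraper reorders or drops rows.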