0

Trying to automate scraping by finding exact string matches

I'm trying to scrape image links and their unique product numbers (SKUs) from jewellery websites hosted on platforms such as Shopify, WooCommerce and Magento. The class names of the div tags change from one website to the next, but the way the image link and SKU strings start and end is almost the same. So I have to find a match in the string of the whole HTML document, get its index position, move some positions past that index, and grab the string. My problem is matching the exact string in the whole HTML page and getting the exact index position of the match.

This is the string I have to find a match for. The first set of characters and the last 3 characters are the same for every link, so I need to match on those in order to extract this link: "//cdn.shopify.com/s/files/1/2237/1833/products/16_b903a3fc-5529-4937-91ef-98568f965182_490x@3x"

I need to find a match for the above string in the HTML code of the webpage below.

Or is there any way to automatically download all the images and their SKUs from the aforementioned jewellery websites? If so, do let me know!

  • Please visit [help], take [tour] to see what and [ask]. Do some research, [search for related topics on SO](https://www.google.com/search?q=python+match+string+webpage+site:stackoverflow.com); if you get stuck, post a [mcve] of your attempt, noting input and expected output. – mplungjan Dec 18 '21 at 12:56
  • `find the match for see the first set of characters and last 3 characters are the same` ??? In the string you posted? – mplungjan Dec 18 '21 at 12:58
  • Please provide enough code so others can better understand or reproduce the problem. – Community Dec 25 '21 at 01:16

1 Answer

0

You can definitely write a script that automatically downloads images from a source (if you have the permission to do so).

Such a script might include a regular expression to perform the string matching that you're looking for. Maybe this Stack Overflow answer will help: RegEx to select everything between two characters?
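As a rough illustration of the regex idea, here is a minimal sketch that matches on the fixed start (`//cdn.shopify.com/s/files/`) and the fixed end (`@3x`) of the link from the question, and also reports the index positions of each match. The HTML fragment, the `.jpg` extension and the function name `find_cdn_links` are assumptions made up for the example, since the real page markup was not posted.

```python
import re

# Fixed prefix, a lazily matched middle of any length, then the fixed "@3x" suffix.
# The character class stops the match at quotes, whitespace or angle brackets,
# so it cannot run past the end of the attribute value.
CDN_LINK = re.compile(r'//cdn\.shopify\.com/s/files/[^\s"\'<>]+?@3x')


def find_cdn_links(html):
    """Return a list of (start_index, end_index, link) for every match in html."""
    return [(m.start(), m.end(), m.group(0)) for m in CDN_LINK.finditer(html)]


if __name__ == "__main__":
    # Hypothetical markup: only the link itself comes from the question.
    sample_html = (
        '<div class="some-theme-specific-class">'
        '<img src="//cdn.shopify.com/s/files/1/2237/1833/products/'
        '16_b903a3fc-5529-4937-91ef-98568f965182_490x@3x.jpg" alt="ring">'
        '</div>'
    )
    for start, end, link in find_cdn_links(sample_html):
        print(start, end, link)
```

Because the middle part is matched with `+?` rather than a fixed width, the length of the link does not matter; only the start and end have to stay the same.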

Daniel
  • I'm running into issues using regex. We use regex for exact pattern matching, but for each image the pattern varies and the length of the link also varies. So we can't possibly match an image link by finding a pattern, because there is no pattern for these links. The image link strings are longer for some images and shorter for others. – lokesh_crucifix Dec 20 '21 at 08:06
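If the links really share no usable pattern, one alternative to string matching (not something proposed in the answer above) is to parse the HTML and collect the `src` of every `<img>` tag, which works regardless of div class names or link length. The sketch below uses only Python's standard `html.parser`. The class name `ImageLinkCollector`, the sample markup, and the extra check for a `data-src` attribute (sometimes used for lazy-loaded images) are assumptions for illustration. Where the SKU lives in the markup is not stated in the question, so it is not handled here.

```python
from html.parser import HTMLParser


class ImageLinkCollector(HTMLParser):
    """Collect the src / data-src value of every <img> tag, ignoring class names."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Called for <img ...> and, by default, also for self-closing <img ... />.
        if tag != "img":
            return
        for name, value in attrs:
            if name in ("src", "data-src") and value:
                self.links.append(value)


def collect_image_links(html):
    """Return all image links found in html, in document order."""
    parser = ImageLinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    # Hypothetical markup with links of different lengths and unrelated class names.
    sample_html = """
    <div class="abc"><img src="//cdn.example.com/a/short.jpg"></div>
    <div class="xyz-123"><img class="lazy" data-src="//cdn.example.com/products/a-much-longer-name_490x@3x.jpg" /></div>
    """
    for link in collect_image_links(sample_html):
        print(link)
```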