With the following code I'm adding a script from a webpage to a list and then trying to find a certain string in it and get the 13 characters that follow it:

for link in productlinks:
    try:
        s = HTMLSession()
        r = s.get(link)
        response = urllib3.PoolManager().request(
            "GET", link, headers={'User-Agent': "python"})
        soup = BeautifulSoup(response.data.decode('utf-8'), 'html.parser')
        title = r.html.find('h1.rd-title', first=True).text
        script2 = []
        script1 = soup.findAll("script")[2]
        script2.append(script1)
        special_string = '"ean",values:[{text:"'
        x_letters_afterwards = 13
        result = re.findall(re.escape(special_string) + ".{" + x_letters_afterwards + "}", script2)
        print(result)
    except:
        print("...")

The problem is that something inside the try block seems to raise an exception on every iteration of the for loop, because it always just prints "..." instead of the string I am trying to extract (or anything else).

An example of the output where the string should be found: https://pastebin.com/xvzQ456P

I don't know what to do...

1 Answer

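Your script2 is a list of objects, not a string, so it cannot be passed to re directly. You need to get the script's text as a string and pass that to the re method. Also, x_letters_afterwards is an int, so concatenating it with a string raises a TypeError, which the bare except hides; convert it with str() first.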

In addition, you are using re.findall, which returns a list of all matches. Use re.search to get only the first match:

special_string = '"ean",values:[{text:"'
x_letters_afterwards = 13
match = re.search(re.escape(special_string) + "(.{" + str(x_letters_afterwards) + "})", ' '.join([x for x in script1]))
result = ''
if match:
    result = match.group(1)

Notes:

  • re.escape(special_string) - re.escape inserts a \ before each special regex char in special_string, so the string is matched literally in the regex
  • "(.{" + str(x_letters_afterwards) + "})" forms a capturing group (with ID = 1) that looks like (.{13}) and captures any 13 chars other than line break chars into Group 1; the int has to be converted with str() before it can be concatenated
  • You need to check that there is a match before accessing the group value, hence the if match: check
  • Once the check is positive, i.e. there is a match, the Group 1 value from match.group(1) is assigned to result.
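
To try the pattern without fetching any page, here is a minimal, self-contained sketch of the same idea. The helper name extract_after and the sample string are made up for illustration; the real script content on the page may look different.

```python
import re


def extract_after(text, marker, length=13):
    """Return the `length` chars that follow `marker` in `text`, or '' if not found.

    Hypothetical helper, not part of the original code.
    """
    # Escape the marker so it is matched literally, then capture `length` chars
    pattern = re.escape(marker) + "(.{" + str(length) + "})"
    match = re.search(pattern, text)
    return match.group(1) if match else ""


# Made-up script content, for illustration only
sample = 'x={name:"ean",values:[{text:"4006381333931"}]}'
marker = '"ean",values:[{text:"'

print(extract_after(sample, marker))
print(repr(extract_after("no marker here", marker)))
```

Inside the loop, replacing the bare except: with except Exception as exc: print(repr(exc)) would show the actual error instead of "...", which makes problems like this much easier to spot.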
Wiktor Stribiżew