I'm trying to extract a value from an object embedded in HTML. I've worked out how to match the value, but the result includes extra text that I don't want.

I've tried using .split() and match groups, but neither has helped.

html = r.text
checkouttoken = re.search('DF_CHECKOUT_TOKEN = (.*?);', html, re.S)

print(checkouttoken.group(0))

Expected:

27f37949bb8a76ede81508c8c1b750c8

Actual:

< iframe srcdoc="&lt;script&gt;!function(){var e=function(e){var t={exports:{}};return e.call(t.exports,t,t.exports),t.exports},r=function(){fun
DF_CHECKOUT_TOKEN = "27f37949bb8a76ede81508c8c1b750c8";
Emma
cabatchi

2 Answers

Use group(1) instead. group(0) is the entire matched text, while group(1) is the contents of the first capture group.

Also, if you don't want the quotation marks in the result, add them to the regex outside of the capture group: 'DF_CHECKOUT_TOKEN = "(.*?)";'
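
To make the difference concrete, here is a minimal sketch. The html string is a shortened, hypothetical stand-in for r.text from the question, and the names whole_match and token are only illustrative.

```python
import re

# Hypothetical stand-in for r.text, shortened from the output in the question.
html = 'r=function(){fun\nDF_CHECKOUT_TOKEN = "27f37949bb8a76ede81508c8c1b750c8";'

# The quotation marks sit outside the parentheses, so they are matched
# but not captured.
match = re.search(r'DF_CHECKOUT_TOKEN = "(.*?)";', html, re.S)

# re.search returns None when nothing matches, so guard before calling group().
whole_match = match.group(0) if match else None  # everything the pattern matched
token = match.group(1) if match else None        # only the text inside the parentheses

print(whole_match)
print(token)
```

With this input, group(0) still contains the DF_CHECKOUT_TOKEN prefix, the quotation marks and the semicolon, while group(1) contains only the token.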

mapeters

The expression we might want here can be as simple as:

DF_CHECKOUT_TOKEN = \"(.+?)\"

Test

# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r"DF_CHECKOUT_TOKEN = \"(.+?)\""

test_str = "< iframe srcdoc=\"<script>!function(){var e=function(e){var t={exports:{}};return e.call(t.exports,t,t.exports),t.exports},r=function(){fun DF_CHECKOUT_TOKEN = \"27f37949bb8a76ede81508c8c1b750c8\";"

matches = re.finditer(regex, test_str, re.MULTILINE)

for matchNum, match in enumerate(matches, start=1):
    # Report the full match (group 0) and where it was found.
    print("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum=matchNum, start=match.start(), end=match.end(), match=match.group()))

    # Capture groups are numbered from 1; group 1 holds the token itself.
    for groupNum in range(1, len(match.groups()) + 1):
        print("Group {groupNum} found at {start}-{end}: {group}".format(groupNum=groupNum, start=match.start(groupNum), end=match.end(groupNum), group=match.group(groupNum)))

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.

Emma