
In Python 3, I am trying to get a list of all substrings of a given string a that start after a delimiter x and end right before a delimiter y. I have found solutions that only return the first occurrence, but the result needs to be a list of all occurrences.

start = '>'
end = '</'
s = '<script>a=eval;b=alert;a(b(/XSS/.source));</script><script>a=eval;b=alert;a(b(/XSS/.source));</script>\'"><marquee><h1>XSS by Xylitol</h1></marquee>'
print((s.split(start))[1].split(end)[0])

The above example is what I've got so far, but I am looking for a more elegant and robust way to get all the occurrences.

So the expected result is a list containing the JavaScript code as the following entries:

a=eval;b=alert;a(b(/XSS/.source));
a=eval;b=alert;a(b(/XSS/.source));
mkrieger1
marcels93
  • Does this answer your question? [Parsing HTML using Python](https://stackoverflow.com/questions/11709079/parsing-html-using-python) – mkrieger1 Apr 22 '20 at 22:46
  • Sadly not... I am actually working with Beautiful Soup and Esprima. The input strings, on the other hand, don't necessarily contain a full HTML structure that could be parsed. They will rather be URLs that contain XSS payloads and can therefore contain JavaScript. I need to manually extract all tags out of the URL. – marcels93 Apr 22 '20 at 22:49

1 Answer


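Looking for patterns in strings seems like a decent job for regular expressions. This should return a list of anything between a pair of <script> and </script>. The non-greedy group (.*?) stops at the nearest closing tag, so each pair is captured separately, and findall returns only the captured text: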

import re
pattern = re.compile(r'<script>(.*?)</script>')
s = '<script>a=eval;b=alert;a(b(/XSS/.source));</script><script>a=eval;b=alert;a(b(/XSS/.source));</script>\'"><marquee><h1>XSS by Xylitol</h1></marquee>'
print(pattern.findall(s))

Result:

['a=eval;b=alert;a(b(/XSS/.source));', 'a=eval;b=alert;a(b(/XSS/.source));']
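
The question asks about arbitrary delimiters x and y rather than script tags specifically, so the same idea can be sketched as a small helper. The function name substrings_between is hypothetical, not part of the re module. It assumes the delimiters should be matched literally (hence re.escape) and that the text between them may span several lines (hence re.DOTALL). Note that very short delimiters such as '>' and '</' would also match between a closing tag and the next opening tag, so the full tags are used here:

```python
import re

def substrings_between(text, start, end):
    """Return all non-overlapping substrings found between start and end.

    Hypothetical helper for illustration; start and end are treated as
    literal strings, not as regular expressions.
    """
    # re.escape neutralises regex metacharacters in the delimiters.
    # (.*?) is non-greedy, so each match stops at the nearest end delimiter.
    # re.DOTALL lets '.' match newlines in case the payload spans lines.
    pattern = re.compile(re.escape(start) + r'(.*?)' + re.escape(end), re.DOTALL)
    return pattern.findall(text)

s = '<script>a=eval;b=alert;a(b(/XSS/.source));</script><script>a=eval;b=alert;a(b(/XSS/.source));</script>\'"><marquee><h1>XSS by Xylitol</h1></marquee>'
print(substrings_between(s, '<script>', '</script>'))
```

For the sample string this prints the same two entries as above. If no delimiter pair is present, findall returns an empty list rather than raising an error, which avoids the IndexError that the split-based attempt can hit.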
orKach