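This seems to work fine here. The \n and \t in the pastebin you provided are literal two-character sequences (a backslash followed by n or t), not actual newline and tab characters, so the backslashes need to be escaped in the pattern: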
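```python
import re

# Read the whole file; "with" closes it afterwards.
with open('data.html') as f:
    x = f.read()

# Each \\ in the raw pattern matches one literal backslash in the file.
# re.MULTILINE is not needed: it only changes ^ and $, which this pattern doesn't use.
m = re.findall(
    r'\\n\\t\\t\\t\\t\\t(.*)\\n\\t\\t\\t\\t\\t\\n\\t\\t\\t\\t\\t</a>',
    x)
print(m)
```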
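Since data.html isn't reproduced here, this sketch runs the same pattern against a small hypothetical string that mimics the pastebin's format (the hrefs and link text are made up). The doubled backslashes in the ordinary string literal produce literal backslash-n and backslash-t text, as in the file, while the real newline at the end of each line keeps `(.*)` from running into the next link.

```python
import re

# Hypothetical sample: "\\n" and "\\t" here are literal two-character
# sequences (backslash + letter); the trailing "\n" is a real newline.
sample = (
    '<a href="/a">\\n\\t\\t\\t\\t\\tFirst link\\n\\t\\t\\t\\t\\t\\n\\t\\t\\t\\t\\t</a>\n'
    '<a href="/b">\\n\\t\\t\\t\\t\\tSecond link\\n\\t\\t\\t\\t\\t\\n\\t\\t\\t\\t\\t</a>\n'
)

pattern = r'\\n\\t\\t\\t\\t\\t(.*)\\n\\t\\t\\t\\t\\t\\n\\t\\t\\t\\t\\t</a>'
print(re.findall(pattern, sample))  # ['First link', 'Second link']
```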
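As Jeff Mandell suggested, you can shorten the regex with a {5} repetition count. Wrapping the repeated \\t in a non-capturing group (?:...) keeps re.findall returning just the (.*) text; with a plain capturing group (\\t), findall would return a tuple of every group for each match:

```python
r'\\n(?:\\t){5}(.*)\\n(?:\\t){5}\\n(?:\\t){5}</a>'
```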
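To see how the group style affects re.findall, here is a small comparison on a hypothetical one-link sample (the link text is made up). A repeated capturing group only keeps its last repetition, so each tuple slot for (\\t){5} holds a single backslash-t.

```python
import re

# Hypothetical sample with literal backslash-n / backslash-t text.
sample = '<a>\\n\\t\\t\\t\\t\\tFirst link\\n\\t\\t\\t\\t\\t\\n\\t\\t\\t\\t\\t</a>'

# Capturing groups: findall returns one tuple of all four groups per match.
capturing = r'\\n(\\t){5}(.*)\\n(\\t){5}\\n(\\t){5}</a>'
# Non-capturing groups: only (.*) is captured, so findall returns strings.
non_capturing = r'\\n(?:\\t){5}(.*)\\n(?:\\t){5}\\n(?:\\t){5}</a>'

print(re.findall(capturing, sample))      # [('\\t', 'First link', '\\t', '\\t')]
print(re.findall(non_capturing, sample))  # ['First link']
```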
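Conversely, if your file contains actual newline characters, the pattern r'\n' will match them: the re module itself interprets the two-character sequence \n in a pattern as a newline, while r'\\n' matches a literal backslash followed by n. For example: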
```python
import re

v = '\n'               # one actual newline character
print(v)               # prints a blank line
print(len(v))          # outputs 1
m = re.match(r'\n', v)
print(m)               # a match object
m = re.match(r'\\n', v)
print(m)               # None: no match

v = '\\n'              # a backslash followed by n; appears as \n in your text editor
print(v)               # prints the two characters \ and n
print(len(v))          # outputs 2
m = re.match(r'\n', v)
print(m)               # None: no match
m = re.match(r'\\n', v)
print(m)               # a match object
```
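If you would rather not count backslashes by hand, re.escape can build the pattern from the literal text you want to find. A small sketch (the variable names are just for illustration):

```python
import re

# The text to find, as it appears in the file: a backslash followed by "n".
literal = '\\n'

# re.escape backslash-escapes regex-special characters (including "\"),
# giving the same pattern you would otherwise write by hand as r'\\n'.
pattern = re.escape(literal)
print(pattern == r'\\n')           # True
print(re.match(pattern, literal))  # a match object for the two characters \ and n
print(re.match(pattern, '\n'))     # None: an actual newline does not match

# The prefix of the longer pattern can be built the same way.
prefix = re.escape('\\n' + '\\t' * 5)
print(prefix == r'\\n\\t\\t\\t\\t\\t')  # True
```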