Strip \r\n from python variable

Question

I managed to get the page source DOM of an external website, but it came with \r\n and lots of whitespace.

import urllib.request

request = urllib.request.Request('http://example.com')
response = urllib.request.urlopen(request)
# read() returns bytes in Python 3; decode first (UTF-8 is assumed here)
page = response.read().decode('utf-8')
page = page.strip('\r\n')
print(page)

I tried stripping them, but no luck. How can I get just the HTML?

Secondly, what is the logic for manipulating the returned DOM with JavaScript/jQuery? I was hoping to do something like:

alert(document.getElementsByTagName('h1')[0].innerHTML);

This should alert "Example Domain" when run against the generated DOM.

Not sure if you're aware of this or not, but `strip` removes characters only from the beginning or the end of a string. For example, `"\na\nb\n".strip("\n")` returns `'a\nb'`. — Kevin, Nov 06 '14 at 19:39
possible duplicate of [RegEx match open tags except XHTML self-contained tags](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) — ivan_pozdeev, Nov 06 '14 at 20:27

score 2 · Answer 1 · answered Nov 06 '14 at 19:41

'foo \r\n bar\r\n'.strip()

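only removes whitespace at the start and end of the string, so here just the trailing '\r\n' goes. If these characters appear throughout your text, try chaining .replace() like this (note that the last call also removes every space, including the ones between words):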

'foo \r\n bar\r\n'.replace('\r', '').replace('\n', '').replace(' ', '')
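If the spaces between words need to survive, one alternative is to collapse each run of whitespace into a single space instead of deleting every space. The following is only a rough sketch: the helper names `collapse_whitespace` and `drop_line_breaks` and the sample markup are made up for illustration, and UTF-8 is assumed as the page encoding.

```python
import re


def collapse_whitespace(raw, encoding="utf-8"):
    """Decode bytes if needed, then squeeze each whitespace run into one space."""
    # urlopen(...).read() returns bytes in Python 3, so accept both bytes and str.
    text = raw.decode(encoding) if isinstance(raw, bytes) else raw
    # \s matches spaces, tabs, \r and \n; strip() trims what is left at the ends.
    return re.sub(r"\s+", " ", text).strip()


def drop_line_breaks(raw, encoding="utf-8"):
    """Remove only \\r and \\n, leaving every space in place."""
    text = raw.decode(encoding) if isinstance(raw, bytes) else raw
    return text.replace("\r", "").replace("\n", "")


# Hypothetical sample standing in for the bytes returned by response.read().
sample = b"<h1>Example Domain</h1>\r\n\r\n    <p>Some text.</p>\r\n"
print(collapse_whitespace(sample))  # <h1>Example Domain</h1> <p>Some text.</p>
```

Collapsing whitespace this way is normally harmless for HTML, but it would also alter the contents of elements such as `<pre>`, where whitespace is significant.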

vikramls
