Using the following code, I end up with one or more newlines between each and every line in my file when running the code on windows (in jupyter notebook on python3) but NOT when running on mac or Linux?
I assume it's some kind of encoding issue? something to do with window's "/r/n
" shenanigans? doing a ;str(page.content)instead leaves me with a file full of
/r/n` as expected but I'm not sure why it's chalk full of newlines to begin with?
note: I have commented out a quick way to remove whitespace but it's a bit of a hack and not really what I'm after, i'm more looking for why the whitespace is being added to begin with.
import requests
url = 'https://stackoverflow.com/questions/3030487/is-there-a-way-to-get-the-xpath-in-google-chrome'
page=requests.get(url)
newhtml = page.text
# import re
# newhtml = re.sub(r'\s\s+', ' ', page.text)
f = open('webpage.html', 'w', encoding='utf-8')
f.write(newhtml)
f.close()
Result Sample:
<html itemscope itemtype="http://schema.org/QAPage" class="html__responsive">
<head>
<title>Is there a way to get the xpath in google chrome? - Stack Overflow</title>
<link rel="shortcut icon" href="https://cdn.sstatic.net/Sites/stackoverflow/img/favicon.ico?v=4f32ecc8f43d">
<link rel="apple-touch-icon image_src" href="https://cdn.sstatic.net/Sites/stackoverflow/img/apple-touch-icon.png?v=c78bd457575a">
<link rel="search" type="application/opensearchdescription+xml" title="Stack Overflow" href="/opensearch.xml">
<meta name="viewport" content="width=device-width, height=device-height, initial-scale=1.0, minimum-scale=1.0">
<meta property="og:type" content= "website" />
<meta property="og:url" content="https://stackoverflow.com/questions/3030487/is-there-a-way-to-get-the-xpath-in-google-chrome"/>
<meta property="og:site_name" content="Stack Overflow" />