
I am getting HTML data with a requests get(url) call, which returns raw HTML data that contains "\n" characters. When I run replace("\n", "") against it, the "\n" characters are not removed. Could someone explain how to remove them either at the simple_get stage or at the raw_html stage? Code below (the main script, followed by the CodeB module that provides simple_get):

from CodeB import simple_get

htmlPath = "https://en.wikipedia.org/wiki/Terminalia_nigrovenulosa"
raw_html = simple_get(htmlPath)
if raw_html is None:
    print("not found")
else:
    tmpHtml = str(raw_html)
    tmpHtmlB = tmpHtml.replace("\n","")
    print("tmpHtmlB:=", tmpHtmlB)


from requests import get
from requests.exceptions import RequestException
from contextlib import closing
from bs4 import BeautifulSoup

def simple_get(url):
    try:
        with closing(get(url, stream=True)) as resp:
            if is_good_response(resp):
                return resp.content
            else:
                return None
    except RequestException as e:
        log_error('Error during requests to {0} : {1}'.format(url, str(e)))
        return None

def is_good_response(resp):
    # .get() returns None instead of raising KeyError if the header is missing
    content_type = resp.headers.get('Content-Type')
    return (resp.status_code == 200
        and content_type is not None
        and content_type.lower().find('html') > -1)

def log_error(e):
    print(e)
Shaun
  • Python string literals support backslash-escaped characters. There are many answers on SO already, such as https://stackoverflow.com/a/4369166/1531971 – Sep 18 '18 at 17:45
  • Thanks for all the replies. As you may have guessed, I am new to the wacky world of Python; this question had been driving me up the wall, and in the end the answer turned out to be so simple. Thanks again. – Shaun Sep 19 '18 at 09:49

4 Answers


I think simply adding a space between your double quotes should do the trick.

shivam thakur

Use a raw string, .replace(r'\n', ''), or remember that '\n' stands for a newline character and you need to escape the backslash: .replace('\\n', '')
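Why the original replace("\n", "") did nothing: in the question, simple_get returns resp.content, which is bytes, and str() on a bytes object returns its repr ("b'...'"), where each newline byte is written as the two characters backslash and n. A minimal offline sketch, using a short hypothetical byte string as a stand-in for resp.content:

```python
# Hypothetical stand-in for resp.content (requests returns bytes there)
raw_html = b"<p>first line</p>\n<p>second line</p>\n"

# str() on bytes gives the repr: "b'...'" with each newline shown as backslash + n
as_repr = str(raw_html)
print("\n" in as_repr)     # False: there are no real newline characters
print("\\n" in as_repr)    # True: there are literal backslash-n pairs

# Replacing a real newline therefore changes nothing...
unchanged = as_repr.replace("\n", "")

# ...while replacing the escaped two-character sequence removes them.
# The raw string r"\n" is the same two characters as "\\n".
escaped_removed = as_repr.replace("\\n", "")
raw_removed = as_repr.replace(r"\n", "")
print(escaped_removed)     # b'<p>first line</p><p>second line</p>'
```

Note that the b'...' wrapper from the repr is still there afterwards, so decoding the bytes to a real string first is usually cleaner than escaping the backslash.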

Huang_d

I believe you need to add another backslash (\) to \n in order to search for the literal two-character string \n; the extra backslash escapes the first one.

Quick example:

string = '\\n foo'
print(string.replace('\n', ''))

Prints:

\n foo

While:

print(string.replace('\\n', ''))

Prints just:

 foo
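The newlines can also be removed at the simple_get stage by working with text instead of bytes. In requests, resp.content is the raw body as bytes and resp.text is the body decoded to str, so returning resp.text from simple_get (or calling .decode() on the bytes) gives a string with real newline characters, which replace("\n", "") then removes. The sketch below runs offline; the helper name html_to_single_line is hypothetical, UTF-8 is an assumed encoding, and a byte literal stands in for resp.content:

```python
def html_to_single_line(raw_html: bytes, encoding: str = "utf-8") -> str:
    """Hypothetical helper: decode bytes (e.g. resp.content) and drop line breaks.

    The UTF-8 default is an assumption; resp.text performs this decoding
    itself, using the encoding requests determines for the response.
    """
    text = raw_html.decode(encoding)  # a real str containing real "\n" characters
    # Drop "\r" first so Windows-style "\r\n" pairs disappear completely
    return text.replace("\r", "").replace("\n", "")


page_bytes = b"<html>\n<body>\r\n<p>Terminalia</p>\n</body>\n</html>\n"
print(html_to_single_line(page_bytes))
# <html><body><p>Terminalia</p></body></html>
```

Inside simple_get itself, the equivalent change would be return resp.text instead of return resp.content; the str(raw_html) call in the main script is then no longer needed.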
Márcio Coelho

It should be pretty straightforward: use rstrip to chop the trailing \n characters off the string.

>>> tmpHtmlB = "my string\n"
>>> tmpHtmlB.rstrip()
'my string'

In your case it would be:

tmpHtmlB = tmpHtml.rstrip()

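This also works when there are several trailing newline characters. The canonical way to strip end-of-line (EOL) characters is the string rstrip() method: with no argument it removes all trailing whitespace, and with "\r\n" as the argument it removes any trailing \r or \n characters. Line-ending conventions differ by platform:

\r\n - on Windows
\r - on classic Mac OS (before OS X)
\n - on Linux and modern macOS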

>>> tmpHtmlB = "Test String\n\n\n"
>>> tmpHtmlB.rstrip("\r\n")
'Test String'

OR

>>> tmpHtmlB.rstrip("\n")
'Test String'
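Keep in mind that rstrip() only removes characters at the end of the string; newlines in the middle of the HTML are left in place. A short comparison on a hypothetical sample string, assuming the HTML is already a real str (for example resp.text) rather than the str() of a bytes object. str.translate with a deletion table is used here as one option for removing every \r and \n wherever it occurs; chained replace() calls would work equally well:

```python
sample = "<p>one</p>\n<p>two</p>\r\n"

# rstrip("\r\n") only trims the trailing line ending
trimmed = sample.rstrip("\r\n")
print(repr(trimmed))      # '<p>one</p>\n<p>two</p>'

# The third argument of str.maketrans lists characters to delete
remove_eol = str.maketrans("", "", "\r\n")
flattened = sample.translate(remove_eol)
print(repr(flattened))    # '<p>one</p><p>two</p>'
```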
Karn Kumar