How to remove \n and \r from a string

Question

I am trying to fetch the code from this website: http://netherkingdom.netai.net/pycake.html. A Python script then parses out everything inside the HTML div tags and writes the text between the tags to a file. The problem is that the file ends up full of \r and \n. How can I either avoid this or remove the \r and \n? Here is my code:

import urllib.request
from html.parser import HTMLParser
import re
page = urllib.request.urlopen('http://netherkingdom.netai.net/pycake.html')
t = page.read()
class MyHTMLParser(HTMLParser):
    def handle_data(self, data):
        print(data)
        f = open('/Users/austinhitt/Desktop/Test.py', 'r')
        t = f.read()
        f = open('/Users/austinhitt/Desktop/Test.py', 'w')
        f.write(t + '\n' + data)
        f.close()
parser = MyHTMLParser()
t = t.decode()
parser.feed(t)

And here is the resulting file it makes:

b'
import time as t\r\n
from os import path\r\n
import os\r\n
\r\n
\r\n
\r\n
\r\n
\r\n'

Preferably I would also like to have the beginning b' and last ' removed. I am using Python 3.5.1 on a Mac.

Just convert `t` to a string instead of a byte array; `t = str(page.read(), 'UTF-8')`. (optionally replacing UTF-8 with the encoding you want, of course) — Joachim Isaksson, Mar 06 '16 at 18:53
@JoachimIsaksson that seems to remove everything except the first line. — HittmanA, Mar 06 '16 at 18:56

cdarke · Accepted Answer · 2016-03-06T19:06:07.870

59

A simple solution is to strip trailing whitespace:

with open('gash.txt', 'r') as var:
    for line in var:
        line = line.rstrip()
        print(line)

The advantage of rstrip() over a [:-2] slice is that it is also safe for UNIX-style files, whose lines end in \n alone rather than \r\n.
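A minimal sketch of that difference, using made-up sample lines rather than a real file (the variable names are only for illustration):

```python
# Hypothetical sample lines: one with a Windows ending, one with a UNIX ending.
windows_line = "import os\r\n"
unix_line = "import os\n"

# rstrip() with no argument removes all trailing whitespace, including \r and \n.
stripped_windows = windows_line.rstrip()
stripped_unix = unix_line.rstrip()

# A fixed two-character slice is only right when the ending is exactly \r\n.
sliced_windows = windows_line[:-2]
sliced_unix = unix_line[:-2]  # also cuts off the final "s" of "os"

print(repr(stripped_windows), repr(stripped_unix))
print(repr(sliced_windows), repr(sliced_unix))
```

Note that rstrip() with no argument also drops trailing spaces and tabs; rstrip('\r\n') limits it to line-ending characters.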

However, if you only want to get rid of \r and they might not be at the end-of-line, then str.replace() is your friend:

line = line.replace('\r', '')
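As a rough illustration, with a made-up sample string standing in for the downloaded text, the result of replace() has to be captured because Python strings are immutable:

```python
# Hypothetical sample text with carriage returns and newlines in the middle.
text = "import time as t\r\nfrom os import path\r\n\r\n"

# str.replace() returns a new string and leaves the original unchanged,
# so a bare call like this one has no visible effect on text.
text.replace("\r", "")

# Assign the result to keep it.
no_cr = text.replace("\r", "")

# Chain the calls to drop both \r and \n wherever they occur.
no_cr_or_lf = text.replace("\r", "").replace("\n", "")

print(repr(text))
print(repr(no_cr))
print(repr(no_cr_or_lf))
```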

If you have a bytes object (that's the leading b'), then you can convert it to a native Python 3 string using:

line = line.decode()
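A short sketch of the bytes case, assuming the page is UTF-8 encoded and using a made-up bytes value in place of the real download. It also shows that calling str() on a bytes object yields its repr, in which \r and \n are a literal backslash followed by a letter, so replace('\r', '') would not match them:

```python
# Hypothetical bytes value, similar in shape to what urlopen(...).read() returns.
raw = b"import time as t\r\nfrom os import path\r\n"

# str(raw) gives the repr: the b'...' wrapper plus literal backslash sequences.
as_repr = str(raw)

# decode() turns the bytes into real text; UTF-8 is an assumption about the page.
decoded = raw.decode("utf-8")

# splitlines() recognizes \r\n, \r and \n, so rejoining normalizes the endings.
cleaned = "\n".join(decoded.splitlines())

print(as_repr)
print(cleaned)
```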

edited Mar 06 '16 at 19:06

answered Mar 06 '16 at 18:59

This doesn't seem to work. I tried it and it doesn't change it. – HittmanA Mar 06 '16 at 19:05
There was a typo (OS X corrective text), `strip` should have been `rstrip`. – cdarke Mar 06 '16 at 19:06
It says that str object has no attribute decode. I do not convert the byte data to a string anywhere in the code so why do I get this error? – HittmanA Mar 07 '16 at 03:13
Also @cdarke I tried the replace code on \r and it doesn't work. All the \r's remain. I even tried it on other characters to ensure it works and it removed other characters it just won't remove \r or \n. – HittmanA Mar 07 '16 at 03:28
You should be using `decode()` on bytes objects, not string objects - you asked how to get rid of the `b'` - that indicates a bytes object. I don't understand why the `replace()` would not work, you are capturing the returned value I hope. Remember that none of the Python string methods alter the string, they all return a new string (because strings are immutable). – cdarke Mar 07 '16 at 07:26
oh, ok. Thanks! Let me fix that. – HittmanA Mar 07 '16 at 13:30
This saved my life. Thanks. – Abhimanyu Shekhawat Oct 27 '20 at 21:01

score 3 · Answer 2 · edited Nov 07 '21 at 02:27

To remove carriage returns:

line = line.replace('\r', '')

To remove tabs:

line = line.replace('\t', '')
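When several characters need to go at once, one alternative to chaining replace() calls is str.translate(). This is a sketch with a made-up sample line, not something the answer above prescribes:

```python
# Hypothetical sample line containing a tab and a Windows line ending.
line = "name\tvalue\r\n"

# With three arguments, str.maketrans maps every character in the third
# string to None, and translate() deletes characters mapped to None.
remove_table = str.maketrans("", "", "\r\n\t")
cleaned = line.translate(remove_table)

print(repr(cleaned))
```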

edited Nov 07 '21 at 02:27

wisbucky

answered Jul 08 '20 at 08:03

Vikram Mahapatra
