I want to read the content of a webpage as a string and remove all the line breaks. To make my script platform independent, I thought it would be a good idea to look for os.linesep instead of '\n' or "\r\n". To replace the unwanted characters with other characters, I use str.replace. It did not work with a webpage, so I used a text file for testing. The content of the file is straightforward:
This is line one
this is line two
why does linsep not work?
I don't get it!
Strangely, when I read the file as a binary stream and then decode it, the replacement finds all the line breaks. When I read it as text, it does not. I checked with type() that both results, the one read as text and the one decoded from the binary stream, really are strings, and both appear to be. This really bugs me. Can someone please explain what I'm misunderstanding here?
Here's my test code:
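import os

# Text mode: look for os.linesep (this is the case that replaces nothing)
with open(r"C:\Users\path\LinebreakTest.txt", "r") as file:
    data = file.read().replace(os.linesep, "REPLACEMENT")
print(type(data))
print(data)

# Binary mode: decode manually, then look for "\n"
with open(r"C:\Users\path\LinebreakTest.txt", "rb") as file:
    dataBin = file.read().decode("utf-8").replace("\n", "REPLACEMENT")
print(type(dataBin))
print(dataBin)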
This is my output:
<class 'str'>
This is line one
this is line two
why does linsep not work?
I don't get it!
<class 'str'>
This is line one
REPLACEMENTthis is line two
REPLACEMENTwhy does linsep not work?
REPLACEMENTI don't get it!
REPLACEMENT
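To rule out anything specific to that one file, the comparison can be reproduced with a small self-contained sketch. It assumes a temporary file with hypothetical two-line content written with explicit "\r\n" endings; the newline="" case is an extra variant that is not part of the test above. According to the documentation for open(), default text mode translates line endings to "\n" while reading, which may be what makes the two reads differ.

```python
import os
import tempfile

# Hypothetical sample content with explicit Windows-style "\r\n" line endings.
raw = b"This is line one\r\nthis is line two\r\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "LinebreakTest.txt")
    with open(path, "wb") as f:
        f.write(raw)

    # Default text mode (newline=None): line endings are translated to "\n".
    with open(path, "r", encoding="utf-8") as f:
        text_default = f.read()

    # newline="" switches the translation off, so "\r\n" comes back untouched.
    with open(path, "r", encoding="utf-8", newline="") as f:
        text_untranslated = f.read()

    # Binary mode never translates; decoding keeps the original bytes' endings.
    with open(path, "rb") as f:
        text_binary = f.read().decode("utf-8")

# repr() makes the otherwise invisible line-ending characters visible.
print(repr(text_default))
print(repr(text_untranslated))
print(repr(text_binary))
```

Comparing the three repr() lines shows which characters each read mode actually hands back, and therefore which search string replace can find in each case.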
Thanks in advance!