Alright, so this is the code for a web scraper I've built. Right now it scrapes everything I've selected with soup. But when I view the source of the page, the data includes <br> tags, which are line breaks.
When I scrape and save everything to the file, these get dropped, which puts all the data on one line without the <br> tags. I want a <br> to be there after each piece of data written to the file, as follows:
Data<br>Data<br>Data<br>Data<br>
And not:
DataDataDataDataData
Is there any way to modify my current code to do this? I think it's the g = item.text.encode('utf-8') line that removes the <br>. I would be happy if I could keep the <br> in the output, because then I can just regex it.
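try:
    t_data = soup.find_all("div", {"class": "blockrow restore"})
    for item in t_data:
        # Note: mode 'w' truncates the file on every iteration
        f = open('test.txt', 'w')
        # .text returns only the text nodes, so the <br> tags are dropped here
        g = item.text.encode('utf-8')
        f.write(g)
        f.close()
finally:
    pass  # remainder of the script is not shown in the post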
Thanks.
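A minimal sketch of one way to keep the tags, assuming the markup looks roughly like the sample string below (the HTML here is made up for illustration, not taken from the real page). It compares .text, which returns only the text nodes, with decode_contents(), which returns the element's inner markup as a string. Note that Beautiful Soup normally serializes the tag as <br/>, so a later regex should probably allow for the slash.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page
html = '<div class="blockrow restore">Data<br>Data<br>Data<br></div>'
soup = BeautifulSoup(html, "html.parser")

item = soup.find("div", {"class": "blockrow restore"})

# .text concatenates the text nodes only, so the <br> tags disappear
text_only = item.text

# decode_contents() serializes everything inside the div, tags included
with_markup = item.decode_contents()

print(text_only)
print(with_markup)
```

Writing with_markup to the file instead of item.text should leave the line-break tags in place for later processing.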
tags within them? – Jon Winsley Nov 28 '16 at 19:39
Data
Data
Data
Data
Data
Data
Data
Data
Data
Data
Data
Data
Data
Data
Data
Data
Data
Data
Data
Data
Data
Data
The output becomes: DataDataDataDataDataDataDataDataData
instead of:
Data
Data
Data
Data
Data
– alexanderjoe Nov 28 '16 at 19:43
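For the one-item-per-line output described in the comment above, a possible sketch uses get_text() with a separator, so each text node ends up on its own line. The sample HTML and its contents are assumptions for illustration; only the div selector and the test.txt filename come from the question. The file is opened once, outside the loop, so later blocks do not overwrite earlier ones.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page
html = (
    '<div class="blockrow restore">Data<br>Data<br>Data<br></div>'
    '<div class="blockrow restore">More<br>More<br></div>'
)
soup = BeautifulSoup(html, "html.parser")

blocks = soup.find_all("div", {"class": "blockrow restore"})

# Open the file once; mode "w" inside the loop would truncate it each time
with open("test.txt", "w", encoding="utf-8") as f:
    for item in blocks:
        # Join the text nodes with newlines instead of gluing them together
        lines = item.get_text(separator="\n", strip=True)
        f.write(lines + "\n")
```

This targets Python 3, where the file is opened with an explicit encoding rather than calling .encode('utf-8') on the text.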