Python scraper and saving data to file, I need it to include the `<br>` tag in the file

Question

Alright, so this is my code for a web scraper I've built. Right now it scrapes everything that I've selected with soup. But when I view the source code of my page, the data includes a `<br>`, which is a line break.

When I scrape and save everything to the file, the `<br>` gets excluded, which puts all the data on one line. I want a `<br>` to be there after each piece of data written to the file, as follows:

Data<br>Data<br>Data<br>Data<br>

And not:

DataDataDataDataData

Is there any way to modify my current code? I think it's the `g = item.text.encode('utf-8')` line that removes the `<br>`. I would be happy if I could keep the `<br>` in the output, because then I can just regex it.

    try:
        t_data = soup.find_all("div", {"class": "blockrow restore"})
        for item in t_data:
            f = open('test.txt', 'w')
            g = item.text.encode('utf-8')
            f.write(g)
            f.close()
    finally:
        pass  # cleanup code omitted in the original post

Thanks.

Could you post an abbreviated sample of the HTML you're scraping, showing the relationship between the `div`s you're searching for and the `<br>` tags within them? – Jon Winsley, Nov 28 '16 at 19:39
In other news, it looks like your `for` loop might be overwriting "test.txt" on each iteration. You probably want to open it for [a]ppend instead of [w]rite. — Jon Winsley, Nov 28 '16 at 19:41
Data
Data
Data
Data
Data
Data
Data
Data
Data
Data
Data
Data
Data
Data
Data
Data
Data
Data
Data
Data
Data
Data
Data The output becomes: DataDataDataDataDataDataDataDataData instead of: Data
Data
Data
Data
Data — alexanderjoe, Nov 28 '16 at 19:43

score 0 · Answer 1 · answered Nov 28 '16 at 20:02

If you just want to capture the `<br>` line breaks, you can replace each `<br>` tag in the item with a newline character before extracting the text:

for br in item.find_all("br"):
    br.replace_with("\n")
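Put together with the file handling from the question, the replace loop might look like the following. This is a minimal sketch, assuming Python 3 and the `bs4` package; `SAMPLE_HTML`, `extract_blocks`, and `save_blocks` are hypothetical names standing in for the real page and code. The file is opened once, outside the loop, so earlier blocks are not overwritten.

```python
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

# Hypothetical markup standing in for the scraped page.
SAMPLE_HTML = """
<div class="blockrow restore">Data1<br>Data2<br>Data3</div>
<div class="blockrow restore">Data4<br>Data5</div>
"""


def extract_blocks(html):
    """Return the text of each matching div, with <br> turned into newlines."""
    soup = BeautifulSoup(html, "html.parser")
    blocks = []
    for item in soup.find_all("div", {"class": "blockrow restore"}):
        # Swap every <br> tag for a newline before extracting the text.
        for br in item.find_all("br"):
            br.replace_with("\n")
        blocks.append(item.get_text())
    return blocks


def save_blocks(blocks, path):
    """Write all blocks to one file, one block after another."""
    # Open the file once; opening with 'w' inside the loop would
    # truncate it on every iteration and keep only the last block.
    with open(path, "w", encoding="utf-8") as f:
        for block in blocks:
            f.write(block + "\n")


if __name__ == "__main__":
    save_blocks(extract_blocks(SAMPLE_HTML), "test.txt")
```

In Python 3 there is no need to call `.encode('utf-8')` before writing; passing `encoding="utf-8"` to `open` handles it.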

If you actually want to preserve the internal HTML of the tag, you can just convert the BeautifulSoup item back to a string and print that:

g = unicode(item)
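Note that `unicode` exists only in Python 2; in Python 3 the equivalent call is `str(item)`. As a rough sketch of the HTML-preserving route, assuming Python 3 and the `bs4` package (the sample markup and the function names are made up for illustration):

```python
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

# Hypothetical markup standing in for the scraped page.
SAMPLE_HTML = '<div class="blockrow restore">Data1<br>Data2<br>Data3</div>'


def outer_html(html):
    """Return the matching div as markup, including the <div> itself."""
    soup = BeautifulSoup(html, "html.parser")
    item = soup.find("div", {"class": "blockrow restore"})
    return str(item)


def inner_html(html):
    """Return only the markup inside the matching div."""
    soup = BeautifulSoup(html, "html.parser")
    item = soup.find("div", {"class": "blockrow restore"})
    # decode_contents() renders the tag's children without the enclosing tag.
    return item.decode_contents()


if __name__ == "__main__":
    print(outer_html(SAMPLE_HTML))
    print(inner_html(SAMPLE_HTML))
```

Beautiful Soup writes the tag back out in self-closing form (`<br/>`), so a regex applied afterwards should allow for the optional slash.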

answered Nov 28 '16 at 20:02

Jon Winsley

Thank you very much I did the replace loop and it worked! Thanks! – alexanderjoe Nov 28 '16 at 20:13
