How do I write the output of gathering links via BeautifulSoup to a .txt file

Question

I am trying to write the output to a .txt file, but the file only contains the last link that was scraped.

This is the code:

from bs4 import BeautifulSoup
import requests


base_url = 'https://www.youtube.com/'
page = requests.get(base_url)
soup = BeautifulSoup(page.content, 'html.parser')


for tag in soup.find_all('a'):
    link = tag.get('href')

    if str(link).startswith("/"):
        print(base_url + link)

    if str(link).startswith("http"):
        print(link)

    f = open("queue.txt", "w")
    f.write(link)

This is the output written to the .txt file:

/news://www.youtube.com/howyoutubeworks?utm_campaign=ytgen&utm_source=ythp&utm_medium=LeftNav&utm_content=txt&u=https%3A%2F%2Fwww.youtube.com%2Fhowyoutubeworks%3Futm_source%3Dythp%26utm_medium%3DLeftNav%26utm_campaign%3Dytgen

Does this answer your question? [Difference between modes a, a+, w, w+, and r+ in built-in open function?](https://stackoverflow.com/questions/1466000/difference-between-modes-a-a-w-w-and-r-in-built-in-open-function) — Pranav Hosangadi, Nov 01 '20 at 19:35

mlang · Answer 1 · 2020-11-01T19:37:34.070

Because open("queue.txt", "w") is called inside the loop, the file is truncated on every iteration, so in the end only the last link remains. One possible fix is to open the file only once, before the loop, e.g. in the following way.

from bs4 import BeautifulSoup
import requests


base_url = 'https://www.youtube.com/'
page = requests.get(base_url)
soup = BeautifulSoup(page.content, 'html.parser')


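# Open the file once, before the loop, so "w" truncates it only once.
with open("queue.txt", "w") as f:
    for tag in soup.find_all('a'):
        link = tag.get('href')

        # <a> tags without an href give None, which cannot be written.
        if link is None:
            continue

        if link.startswith("/"):
            print(base_url + link)

        if link.startswith("http"):
            print(link)

        # One link per line.
        f.write(link + "\n")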

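Another solution is to keep the open call inside the loop but use append mode, f = open("queue.txt", "a"), which adds to the end of the file instead of truncating it. Note that append mode also keeps whatever earlier runs wrote, so the file grows each time the script is run.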
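To see the difference between the two modes in isolation, here is a minimal sketch that needs no network access. The `links` list and the `write_links` and `read_lines` helpers are made up for illustration; only `open` and its "w" and "a" modes come from the answer above. The helper deliberately reopens the file for every link, as the loop in the question does.

```python
import os
import tempfile

# Hypothetical sample data standing in for the scraped hrefs.
links = ["/about", "https://example.com/a", "/news"]


def write_links(path, links, mode):
    """Reopen the file for every link, mimicking the loop in the question."""
    for link in links:
        with open(path, mode) as f:
            f.write(link + "\n")


def read_lines(path):
    """Return the file's content as a list of lines without newlines."""
    with open(path) as f:
        return f.read().splitlines()


tmp_dir = tempfile.mkdtemp()

# "w" truncates the file on every open, so only the last link survives.
w_path = os.path.join(tmp_dir, "queue_w.txt")
write_links(w_path, links, "w")
w_result = read_lines(w_path)

# "a" appends on every open, so every link is kept.
a_path = os.path.join(tmp_dir, "queue_a.txt")
write_links(a_path, links, "a")
a_result = read_lines(a_path)

print(w_result)  # ['/news']
print(a_result)  # ['/about', 'https://example.com/a', '/news']
```

Opening the file once with "w" gives the same full list as the append version while still starting from an empty file on each run, which is usually what a fresh scrape wants.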
