
I am trying to write the scraped links to a .txt file, but the file only contains the last link that was scraped.

This is the code:

from bs4 import BeautifulSoup
import requests


base_url = 'https://www.youtube.com/'
page = requests.get(base_url)
soup = BeautifulSoup(page.content, 'html.parser')


for tag in soup.find_all('a'):
    link = tag.get('href')

    if str(link).startswith("/"):
        print(base_url + link)

    if str(link).startswith("http"):
        print(link)

    f = open("queue.txt", "w")
    f.write(link)

This is the content of the .txt file after running it:

/news://www.youtube.com/howyoutubeworks?utm_campaign=ytgen&utm_source=ythp&utm_medium=LeftNav&utm_content=txt&u=https%3A%2F%2Fwww.youtube.com%2Fhowyoutubeworks%3Futm_source%3Dythp%26utm_medium%3DLeftNav%26utm_campaign%3Dytgen
Koomcravet
  • Does this answer your question? [Difference between modes a, a+, w, w+, and r+ in built-in open function?](https://stackoverflow.com/questions/1466000/difference-between-modes-a-a-w-w-and-r-in-built-in-open-function) – Pranav Hosangadi Nov 01 '20 at 19:35

1 Answer


You open the file with mode "w" inside the loop, which truncates it on every iteration, so in the end only the last link remains. One possible solution is to open the file only once, before the loop, e.g. in the following way.

from bs4 import BeautifulSoup
import requests


base_url = 'https://www.youtube.com/'
page = requests.get(base_url)
soup = BeautifulSoup(page.content, 'html.parser')


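# Open the file once, before the loop, so earlier links are not overwritten.
with open("queue.txt", "w") as f:
    for tag in soup.find_all('a'):
        link = tag.get('href')

        # tag.get returns None for <a> tags without an href; skip those,
        # otherwise f.write(link + "\n") raises a TypeError.
        if link is None:
            continue

        if link.startswith("/"):
            print(base_url + link)

        if link.startswith("http"):
            print(link)

        # One link per line.
        f.write(link + "\n")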

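Another solution is to open the file in append mode with f = open("queue.txt", "a"). Keep in mind that append mode also keeps whatever previous runs wrote to the file, and that you still need to write a newline after each link.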
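To see the difference between the two modes without hitting the network, here is a minimal sketch that reopens a file for every item, the way the question's loop does. The sample links, the temporary file names and the helper functions write_in_loop and read_lines are made up for illustration and are not part of the original code.

```python
import os
import tempfile

# Hypothetical sample data standing in for the scraped hrefs.
links = ["/about", "/news", "https://example.com/"]


def write_in_loop(path, mode):
    """Reopen the file with the given mode for every link, as in the question."""
    for link in links:
        with open(path, mode) as f:
            f.write(link + "\n")


def read_lines(path):
    """Return the file's content as a list of lines without newlines."""
    with open(path) as f:
        return f.read().splitlines()


tmp_dir = tempfile.mkdtemp()
w_path = os.path.join(tmp_dir, "queue_w.txt")
a_path = os.path.join(tmp_dir, "queue_a.txt")

write_in_loop(w_path, "w")  # every open() truncates the file first
write_in_loop(a_path, "a")  # every open() continues at the end of the file

w_result = read_lines(w_path)
a_result = read_lines(a_path)

print(w_result)  # ['https://example.com/']
print(a_result)  # ['/about', '/news', 'https://example.com/']
```

With "w" only the last link survives, with "a" all three do. Running the append variant a second time against the same file would add the three links again, which is why opening the file once in "w" mode before the loop is usually the cleaner choice.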

mlang