
I'm trying to build a web crawler that generates a text file for each of several different websites. After it crawls a page, it is supposed to collect all the links on that page. However, I have run into a problem while crawling Wikipedia. The Python script gives me this error:

Traceback (most recent call last):
  File "/home/banana/Desktop/Search engine/data/crawler?.py", line 22, in <module>
    urlwaitinglist.write(link.get('href'))
TypeError: write() argument must be str, not None

I looked into it further by printing the discovered links, and the first one printed is None. I'm wondering if there is a way to check whether the variable actually has a value before writing it.
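
Something like the snippet below is what I have in mind, but I don't know if checking against None like this is the right approach:

href = link.get('href')
if href is not None:  # skip <a> tags that have no href attribute
    urlwaitinglist.write('\n')
    urlwaitinglist.write(href)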

Here is the code I have written so far:

from bs4 import BeautifulSoup
import os
import requests
import random
import re

toscan = "https://en.wikipedia.org/wiki/Wikipedia:Contents"
url = toscan
source_code = requests.get(url)
plain_text = source_code.text

removal_list = ["http://", "https://", "/"]

for word in removal_list:
    toscan = toscan.replace(word, "")

soup = BeautifulSoup(plain_text, 'html.parser')
for link in soup.find_all('a'):
    print(link.get('href'))
    urlwaitinglist = open("/home/banana/Desktop/Search engine/data/toscan", "a")
    urlwaitinglist.write('\n')
    urlwaitinglist.write(link.get('href'))
    urlwaitinglist.close()
    
print(soup.get_text())

directory = "/home/banana/Desktop/Search engine/data/Crawled Data/"

results = soup.get_text()

results = results.strip()

f = open("/home/banana/Desktop/Search engine/data/Crawled Data/" + toscan + ".txt", "w")
f.write(url)
f.write('\n')
f.write(results)
f.close()

1 Answer


It looks like not every <a> tag you are grabbing has an href value, so link.get('href') can return None. I would suggest checking that each link is not None before writing it. It is also bad practice to open a file without using the with statement. Below is an example, built on some of your code, that grabs every http/https link and writes it to a file:

from bs4 import BeautifulSoup
import requests
import re

file_directory = './'  # your specified directory location
filename = 'urls.txt'  # your specified filename

url = "https://en.wikipedia.org/wiki/Wikipedia:Contents"
res = requests.get(url)
html = res.text

soup = BeautifulSoup(html, 'html.parser')
links = []

for link in soup.find_all('a'):
    link = link.get('href')  # may be None if the <a> tag has no href attribute
    print(link)
    # keep only absolute http/https links; str() guards against None
    match = re.search('^(http|https)://', str(link))
    if match:
        links.append(str(link))

with open(file_directory + filename, 'w') as file:
    for link in links:
        file.write(link + '\n')
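
If you would rather keep your original approach of appending each link to the toscan file as it is found, a minimal sketch of the None check might look like this (the file path is just the one from your question):

with open("/home/banana/Desktop/Search engine/data/toscan", "a") as urlwaitinglist:
    for link in soup.find_all('a'):
        href = link.get('href')
        if href is not None:  # skip <a> tags without an href attribute
            urlwaitinglist.write('\n')
            urlwaitinglist.write(href)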