I am NOT new to programming in general, but I am new to larger programs that require creating and importing my own modules. I've done it in C before, but that was years ago, and this is Python.
I'm looking for guidance on organizing my code. I finally figured out HOW to import a .py file into my project (and roughly what it should look like), as well as how to add paths to the Windows environment variables, but now I'm curious whether I'm doing things correctly and what the best practice is. Below, I've included a list of links I've already read that didn't answer my questions. I'd like this thread to be a one-stop shop, since this has been a bit of a hot topic over the years.
I'm trying to make a kind of all-in-one module full of scraping functions so that, as in the test file, I can write ONE line to do what I need, e.g. pass in a URL and get back a sorted list of all HTML tags and their frequency in the page. (This is just something to experiment with while I learn about organization and external files.) It's a pain, because if something goes wrong, I have to change all kinds of files.
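For context, here is a minimal sketch of the kind of setup described above: a folder with an `__init__.py`, a module inside it, and the parent folder made importable. It is only an illustration under assumptions: `demo_tools`, `helpers`, and `greet` are hypothetical names, and a temporary directory stands in for the real project folder. Inserting into `sys.path` is used here because it is similar in effect to adding the folder to the `PYTHONPATH` environment variable.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package on disk (hypothetical names throughout).
root = Path(tempfile.mkdtemp())
pkg = root / "demo_tools"
pkg.mkdir()
(pkg / "__init__.py").write_text("")  # marks the folder as a regular package
(pkg / "helpers.py").write_text(
    "def greet(name):\n"
    "    return 'hello ' + name\n"
)

# Make the folder that CONTAINS the package importable.
sys.path.insert(0, str(root))
importlib.invalidate_caches()

from demo_tools import helpers  # binds only the name 'helpers'

result = helpers.greet("world")
print(result)
```

Note that the folder added to the path is the one containing the package, not the package folder itself.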
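To make the goal concrete, here is a rough sketch of the "pass in a page, get back tags sorted by frequency" idea. It deliberately swaps in the standard library's `html.parser` and `collections.Counter` instead of `requests` and `bs4` so it runs without third-party packages or network access. `TagCounter`, `count_tags`, and the sample HTML are hypothetical names and data made up for illustration.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts every opening tag seen while parsing."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        self.counts[tag] += 1


def count_tags(html):
    """Return (tag, count) pairs, most frequent first."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return parser.counts.most_common()


# Made-up sample page; a real script would pass in downloaded HTML instead.
sample = "<html><body><p>one</p><p>two</p><a href='#'>x</a><p>three</p></body></html>"
result = count_tags(sample)
print(result[0])
```

Unlike my version, this counts whatever tags actually appear in the page rather than looking up a list of known tag names first.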
I'm getting errors like:
request = scraper_tools.get_request(url, data=None, headers=scraper.reg_header)
NameError: name 'scraper' is not defined
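As far as I can tell, this kind of NameError comes from using a name that the import never bound: an import of `scraper_tools` makes only `scraper_tools` available, not `scraper`. The sketch below reproduces that without any real files, assuming a stand-in module object built with `types.ModuleType`. The header value and the two `lookup_*` functions are hypothetical.

```python
import types

# Stand-in for the real scraper_tools module (hypothetical contents).
scraper_tools = types.ModuleType("scraper_tools")
scraper_tools.reg_header = {"user-agent": "example"}


def lookup_ok():
    # Uses the name that actually exists in this namespace.
    return scraper_tools.reg_header


def lookup_broken():
    # 'scraper' was never imported or assigned, so this raises NameError.
    return scraper.reg_header


try:
    lookup_broken()
except NameError as exc:
    message = str(exc)

print(message)
print(lookup_ok())
```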
Am I doing it wrong, and is there a better way? (I assume there is) :)
My code goes like this:
scraper_tools.py
#!my_modules/python
# Filename: scraper_tools.py
import os  # needed by open_file_write

import requests
import bs4 as bs

phone_header = {'user-agent': 'Mozilla/5.0 (iPhone; CPU iPhone OS 9_2 like mac OS X)'}
reg_header = {'user-agent': 'Mozilla/5.0 (Windows NT 6.1; rv:52.0) Gecko/20100101 Firefox/52.0'}


def make_soup(request, parser):
    # Make soup
    return bs.BeautifulSoup(request.text, parser)


def get_request (url, data, headers, **kwargs):
    if kwargs and not headers:
        try:
            return requests.get(url, data=data)
        except Exception as e:
            print(e)
    elif headers and kwargs:
        return requests.get(url, headers=headers, data=data)


def get_all_items(soup, tag):
    return soup.find_all(tag)


def open_file_write(path, filename):
    save_path = path
    return open(os.path.join(save_path, filename), 'w')


def get_all_links(soup):
    # Collect every href that starts with 'http'
    href_tags = soup.find_all(href=True)
    link_list = []
    for tag in href_tags:
        if 'http' in tag['href'][0:4]:
            link_list.append(tag['href'])
    return link_list
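One thing I noticed about `get_request` as posted: when it is called with `headers` set and no extra keyword arguments, `kwargs` is empty, so neither branch matches and the function returns `None`. Below is a rough sketch of a simpler shape, not a definitive fix. The `fetch` parameter is a hypothetical addition that lets the function be tried without network access, and it falls back to `requests.get` when omitted.

```python
def get_request(url, data=None, headers=None, fetch=None):
    """Fetch url, passing headers and data only when they are provided."""
    if fetch is None:
        # Third-party dependency, imported lazily so the sketch runs offline
        # when a fake fetch function is supplied.
        import requests
        fetch = requests.get

    kwargs = {}
    if headers:
        kwargs["headers"] = headers
    if data is not None:
        kwargs["data"] = data
    return fetch(url, **kwargs)


# Offline demonstration with a fake fetch function (hypothetical).
calls = []


def fake_fetch(url, **kwargs):
    calls.append((url, kwargs))
    return "response"


result = get_request("http://example.com", headers={"user-agent": "x"}, fetch=fake_fetch)
print(result, calls)
```

With optional arguments defaulting to `None`, the caller no longer has to pass `data=None` explicitly.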
get_all_tags.py
from stevens_tools import scraper_tools
import operator
import requests

'''
Author: Steven Smith
Email: StevenSmithCIS@gmail.com
Date: 9/1/2017
Description: This file uses an online resource website to dynamically get all
common HTML tags in a list to be used to count list elements inside a specific
web page (and therefore know something about the quantity of each particular tag).
'''

html_tag_website_url = 'https://www.quackit.com/html/tags'
soup = None
tag_qnty_dict = None
request_object = None


def get_html_tags():
    # Get all the HTML tags currently listed on the website
    all_tags = []
    ul_lists = soup.find_all('ul', {'class': 'col-3 taglist'})
    for li in ul_lists:
        for item in li.find_all('a'):
            all_tags.append(item.text)
    return all_tags


def get_all_tags_from(url):
    # Returns a dictionary of all tags from the HTML tag website found in the
    # passed-in URL, with each tag and its quantity listed
    request = scraper_tools.get_request(url, data=None, headers=scraper_tools.reg_header)
    soup = scraper_tools.make_soup(request, 'lxml')
    tag_qnty_dict = {}
    tags = get_html_tags_from_file()
    if tags:
        for tag in tags:
            # If there is more than 0 items, add to list
            item_qnty = len(scraper_tools.get_all_items(soup, tag))
            if item_qnty > 0:
                tag_qnty_dict.update({tag: item_qnty})
    return tag_qnty_dict


def sort_items(reverse):
    # Sorts items in tag dictionary by quantity. In reverse (largest first)
    # if reverse is True
    return sorted(tag_qnty_dict.items(), key=operator.itemgetter(1), reverse=reverse)


def print_all():
    for item in sort_items(True):
        print('Tag = ' + item[0] + " Quantity: = " + str(item[1]))
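Another thing that may be part of my problem: the assignments to `soup` and `tag_qnty_dict` inside `get_all_tags_from` create local variables, so the module-level ones stay `None` and `sort_items` would end up calling `.items()` on `None`. Also, `print_all` is defined without parameters, but the test file passes it an argument. A rough sketch of the alternative, assuming the dictionary is simply passed in instead of shared through globals, is below. `format_all` and the sample counts are hypothetical.

```python
import operator


def sort_items(tag_counts, reverse=True):
    """Sort (tag, quantity) pairs by quantity, largest first by default."""
    return sorted(tag_counts.items(), key=operator.itemgetter(1), reverse=reverse)


def format_all(tag_counts):
    """Build one display line per tag instead of printing directly."""
    return [
        "Tag = {} Quantity = {}".format(tag, qty)
        for tag, qty in sort_items(tag_counts)
    ]


def print_all(tag_counts):
    for line in format_all(tag_counts):
        print(line)


# Made-up counts standing in for the result of get_all_tags_from(url).
sample_counts = {"a": 2, "div": 5, "p": 1}
print_all(sample_counts)
```

Because every function takes its input as an argument, each one can be tried on its own without running the scraper first.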
test_tag_counter.py
from stevens_tools import get_all_tags
get_all_tags.print_all(get_all_tags.get_all_tags_from('https://www.goodreads.com/list/tag/best'))
^^^^^^^^^^^^^^^^^^ Not too crazy about those names, but.. they're descriptive! lol
**Other topics I've visited**
- Python Packages and Modules (new to importing modules/packages in Python)
- http://mikegrouchy.com/blog/2012/05/be-pythonic-init__py.html (using __init__.py as the module/package identifier)
- create Python package and import modules (import each file vs one time)
- Why installing package and module not same in Python? (import version problem: Python 3.4 vs 2.x)
- What's the difference between a Python module and a Python package? (<-- see name lol)
- What's the difference between "package" and "module" (<-- see name)
- Remove package and module name from sphinx function (removing module name)
- importing package and modules from another directory in python (<-- using sys)
- Best practices when importing in IPython
- http://docs.python-guide.org/en/latest/writing/structure/#modules