
I have a Python file in my Django project which scrapes 10 names from a website. I want to store these 10 names in a PostgreSQL database.

Below is the Python file.

import requests
import urllib3
from bs4 import BeautifulSoup
import psycopg2


urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
session = requests.Session()
session.headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.117 Safari/537.36"}
url = 'https://www.smitegame.com/'
content = session.get(url, verify=False).content
soup = BeautifulSoup(content, "html.parser")
allgods = soup.find_all('div', {'class': 'god'})

allitem = []

for god in allgods:
    godName = god.find('p')
    godFoto = god.find('img').get('src')
    allitem.append((godName, godFoto))
    print(godName.text)

How should I approach this? I've made a class in models.py named GodList, but as soon as I try to import it, I can no longer run the scrape script.

Am I approaching this wrong?

I have the PostgreSQL database connected to Django and it works. I can add models and I see them being saved in the database.

Leendert

1 Answer


Django is a framework designed for building web applications. The web browser sends a request to a server, the server processes the request and produces a response with the appropriate data, and that data is sent back and displayed in the browser. This also means that most of the processing done in Django happens in a request context, while the application is running.

If Django is not running and you try to use its models in a standalone script, the script crashes because Django has not loaded its configuration. So what you are really trying to do is use the Django database layer outside of Django. To achieve this, load the settings and set up Django before you use its models.

The question of how to use the Django ORM outside of Django has already been answered in "Using Django database layer outside of Django?" and in "How to use Django models outside of Django?".

For the sake of making your code work, suppose we have a project 'football' containing an app called 'list' with a model 'Player' in its models module, and your script is in the same folder as manage.py. Then your code could look like the following:

import requests
import urllib3
from bs4 import BeautifulSoup
import os
import django

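# Tell Django which settings module to load before anything touches the ORM.
os.environ['DJANGO_SETTINGS_MODULE'] = 'football.settings'
django.setup()

# Models can only be imported after django.setup() has loaded the installed apps.
from list.models import Player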

urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
session = requests.Session()
session.headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.117 Safari/537.36"}
url = 'https://www.smitegame.com/'
content = session.get(url, verify=False).content
soup = BeautifulSoup(content, 'html.parser')
allgods = soup.find_all('div', {'class': 'god'})

allitem = []

for god in allgods:
    godName = god.find('p')
    godFoto = god.find('img').get('src')
    allitem.append((godName, godFoto))
    # The keyword argument must match a field name on the model (here: Player.name).
    Player.objects.create(name=godName.text)

The lines added at the top of the script tell Django where its settings module is and initialise it with django.setup(); only then are the models imported.

When you use Django, you have to tell it which settings you’re using. Do this by using an environment variable, DJANGO_SETTINGS_MODULE.
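
As a small illustration of designating the settings, the sketch below uses os.environ.setdefault, the same call that the generated manage.py uses, so that a value already present in the environment is not overwritten. The helper name designate_settings is hypothetical, and 'football.settings' is the assumed project from the example above.

```python
import os

# Assumed settings path, matching the 'football' project used in this answer.
SETTINGS = "football.settings"


def designate_settings(default=SETTINGS):
    """Set DJANGO_SETTINGS_MODULE only if the environment has not set it already.

    Returns the value that is in effect afterwards.
    """
    return os.environ.setdefault("DJANGO_SETTINGS_MODULE", default)


# Call this before django.setup(); the printed value depends on your environment.
print(designate_settings())
```

Compared with assigning to os.environ directly, this lets you point the same script at a different settings module from the shell without editing the file.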

See the documentation: https://docs.djangoproject.com/en/3.0/topics/settings/#designating-the-settings
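
One further note on the loop in the script above: because it calls create(), running the scraper twice stores every name twice. A minimal sketch of one way around that, assuming the model has a name field as Player does here, is to use the manager's get_or_create() instead. The helper save_names is hypothetical and takes the model class as a parameter, so it does not depend on a particular app.

```python
def save_names(names, model):
    """Store each non-empty name once; return how many rows were newly created.

    `model` is assumed to be a Django model class with a `name` field,
    for example the Player model imported in the script above.
    """
    created_count = 0
    for raw in names:
        name = raw.strip()
        if not name:
            continue  # skip empty strings coming from malformed markup
        # get_or_create returns (object, created); created is False if the row existed.
        _, created = model.objects.get_or_create(name=name)
        created_count += created
    return created_count
```

In the script this could be called as save_names((god.find('p').text for god in allgods), Player) after django.setup(). Note that get_or_create() only guards against duplicates reliably under concurrent writes if the field also has a unique constraint; for a script run by hand that is usually not a concern.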

  • It took me a while to figure out that the argument name had to be the name of the field in my model, but after that it worked! I can see in pgAdmin that something is saved in the database. The next step is to view this and/or send it to my front end. But you answered my question perfectly! – Leendert Mar 05 '20 at 21:55