-1

I am writing a python program, because I am lazy, that checks a website for a job opening I have been told about and returns all the jobs the companies web page.

Here is my code so far (yes I know the code is jancky however I am just trying to get it working)

import requests
from bs4 import BeautifulSoup
import sys
import os
import hashlib

reload(sys)
sys.setdefaultencoding('utf8')

res = requests.get('WEBSITE URL', verify=False)
res.raise_for_status()

filename = "JobWebsite.txt"

def StartUp():
    if not os.path.isfile(filename):
        try:
            jobfile = open(filename, 'a')
            jobfile = open(filename, 'r+')
            print("[*] Succesfully Created output file")
            return jobfile
        except:
            print("[*] Error creating output file!")
            sys.exit(0)
    else:
         try:
             jobfile = open(filename, 'r+')
             print("[*] Succesfully Opened output file")
             return jobfile
         except:
             print("[*] Error opening output file!")
             sys.exit(0)

 def AnyChange(htmlFile):
    fileCont = htmlFile.read()
    FileHash = hasher(fileCont, "File Code Hashed")
    WebHash = hasher(res.text, "Webpage Code Hashed")
    !!!!! Here is the Problem
    print ("[*] File hash is " + str(FileHash))
    print ("[*] Website hash is " + str(WebHash))
    if FileHash == WebHash:
        print ("[*] Jobs being read from file!")
        num_of_jobs(fileCont)
    else:
        print("[*] Jobs being read from website!")
        num_of_jobs(res.text)
        deleteContent(htmlFile)
        writeWebContent(htmlFile, res.text)

def hasher(content, message):
    content = hashlib.md5(content.encode('utf-8'))
    return content

def num_of_jobs(htmlFile):
    content = BeautifulSoup(htmlFile, "html.parser")
    elems = content.select('.search-result-inner')
    print("[*] There are " + str(len(elems)) + " jobs available!")

def deleteContent(htmlFile):
    print("[*] Deleting Contents of local file! ")
    htmlFile.seek(0)
    htmlFile.truncate()

def writeWebContent(htmlFile, content):
    htmlFile = open(filename, 'r+')
    print("[*] Writing Contents of website to file! ")
    htmlFile.write(content.encode('utf-8'))

jobfile = StartUp()
AnyChange(jobfile)

The problem I currently have is that I hash both of the websites html code and the files html code. However both of the hashes don't match, like ever, I am not sure and can only guess that it might be something with the contents being save in a file. The hashes aren't too far apart but it still causes the If statement to fail each time

Breakpoint in Program with hashes

Neil Boyd
  • 1
  • 2

1 Answers1

1

The screenshot you have attached is showing the location of the two hash objects fileHash and webHash. They should be in different locations.

What you really want to compare is the hexdigest() of the two hash objects. Change your if statement to:

if FileHash.hexdigest() == WebHash.hexdigest():
        print ("[*] Jobs being read from file!")
        num_of_jobs(fileCont)

Take a look at this other StackOverflow answer for some more how-to.

Antimony
  • 2,230
  • 3
  • 28
  • 38
  • I see what you mean with the assembly addresses, was working on this at like 1am and just flew over my head. However the hashes still differ in value after changing . I am going to look more into it and I will report back if I fix it. – Neil Boyd Oct 12 '17 at 10:24