I'm trying to work out how to compare two dictionaries. The first request does a GET and scrapes the data into a dictionary; I then want to compare that with the result of the next request, made the same way, to see whether anything on the webpage has changed. This is what I have so far:

import random
import threading
import time
from concurrent.futures import as_completed
from concurrent.futures.thread import ThreadPoolExecutor

import requests
from bs4 import BeautifulSoup

URLS = [
    'https://github.com/search?q=hello+world',
    'https://github.com/search?q=python+3',
    'https://github.com/search?q=world',
    'https://github.com/search?q=i+love+python',
    'https://github.com/search?q=sport+today',
    'https://github.com/search?q=how+to+code',
    'https://github.com/search?q=banana',
    'https://github.com/search?q=android+vs+iphone',
    'https://github.com/search?q=please+help+me',
    'https://github.com/search?q=batman',
]


def doRequest(url):
    response = requests.get(url)
    time.sleep(random.randint(10, 30))
    return response, url


def doScrape(response):
    soup = BeautifulSoup(response.text, 'html.parser')
    return {
        'title': soup.find("input", {"name": "q"})['value'],
        'repo_count': soup.find("span", {"data-search-type": "Repositories"}).text.strip()
    }


def checkDifference(parsed, url):
    pass  # not implemented yet: compare parsed with the previous result for this url


def threadPoolLoop():
    with ThreadPoolExecutor(max_workers=1) as executor:
        future_tasks = [
            executor.submit(
                doRequest,
                url
            ) for url in URLS]

        for future in as_completed(future_tasks):
            response, url = future.result()
            if response.status_code == 200:
                checkDifference(doScrape(response), url)


while True:
    t = threading.Thread(target=threadPoolLoop, )
    t.start()
    print('Joining thread and waiting for it to finish...')
    t.join()

My problem is that I do not know how to print out whenever there has been a change to title and/or repo_count. (The whole point is that I will run this script 24/7, and I always want it to print whenever there has been a change.)

PythonNewbie

1 Answer

If you're looking for a simple method to compare two dictionaries, there are a few different options.

Let's start with two dictionaries to compare: some elements added, some removed, some changed, some the same.

dict1 = {
    "value_2": 2,
    "value_3": 3,
    "value_4": 4,
    "value_5": "five",
    "value_6": "six",
}

dict2 = {
    "value_1": 1,
    "value_2": 3,
    "value_4": 4,
}

You could probably use the unittest library. Like this:

>>> from unittest import TestCase
>>> TestCase().assertDictEqual(dict1, dict1)  # <-- No output, because they are the same
>>> TestCase().assertDictEqual(dict1, dict2)  # <-- Will raise error and display elements which are different
AssertionError: {'value_2': 2, 'value_3': 3, 'value_4': 4, 'value_5': 'five', 'value_6': 'six'} != {'value_1': 1, 'value_2': 3, 'value_4': 4}
- {'value_2': 2, 'value_3': 3, 'value_4': 4, 'value_5': 'five', 'value_6': 'six'}
+ {'value_1': 1, 'value_2': 3, 'value_4': 4}

The challenge there is that it raises an error when they are different, which is probably not what you're looking for. You simply want to see when they differ.
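If a plain, non-raising check is enough, ordinary dictionary equality may already do the job. The sketch below is only an illustration: `previous`, `current` and the helper `changed_keys` are made-up names, and the sample values merely imitate what the scraper in the question might return.

```python
# Sentinel so that a missing key is not confused with a stored None value.
_MISSING = object()

# Hypothetical results of two consecutive scrapes of the same URL.
previous = {"title": "hello world", "repo_count": "1M"}
current = {"title": "hello world", "repo_count": "2M"}


def changed_keys(old, new):
    """Return the sorted keys whose values differ or that exist in only one dict."""
    return sorted(
        key
        for key in old.keys() | new.keys()
        if old.get(key, _MISSING) != new.get(key, _MISSING)
    )


# Dictionaries compare equal when they hold the same key/value pairs,
# so != is enough to detect that something changed, without any exception.
if previous != current:
    print("changed:", changed_keys(previous, current))
```

With the sample values above, only `repo_count` is reported, and nothing is printed when both dictionaries are identical.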

Another method is the deepdiff library. Like this:

>>> from deepdiff import DeepDiff
>>> from pprint import pprint
>>> pprint(DeepDiff(dict1, dict2))
{'dictionary_item_added': [root['value_1']],
 'dictionary_item_removed': [root['value_3'], root['value_5'], root['value_6']],
 'values_changed': {"root['value_2']": {'new_value': 3, 'old_value': 2}}}

Or you could easily write your own functions, like this (the functions below are copied from another source):

>>> from pprint import pprint
>>> def compare_dict(d1, d2):
...    return {k: d1[k] for k in d1 if k in d2 and d1[k] == d2[k]}
>>> pprint(compare_dict(dict1, dict2))
{'value_4': 4}
>>> def dict_compare(d1, d2):
...     d1_keys = set(d1.keys())
...     d2_keys = set(d2.keys())
...     shared_keys = d1_keys.intersection(d2_keys)
...     added = d2_keys - d1_keys      # keys only in the new dict (d2)
...     removed = d1_keys - d2_keys    # keys only in the old dict (d1)
...     modified = {o: {"old": d1[o], "new": d2[o]} for o in shared_keys if d1[o] != d2[o]}
...     same = set(o for o in shared_keys if d1[o] == d2[o])
...     return {"added": added, "removed": removed, "modified": modified, "same": same}
>>> pprint(dict_compare(dict1, dict2))
{'added': {'value_1'},
 'modified': {'value_2': {'new': 3, 'old': 2}},
 'removed': {'value_6', 'value_3', 'value_5'},
 'same': {'value_4'}}
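To tie this back to the question, one possible way to fill in `checkDifference` is to remember the last scraped dictionary for each URL and compare every new result against it. This is a rough sketch rather than a finished solution: the module-level `last_seen` dictionary is an assumption of mine, and it only lives in memory, so the history is lost whenever the script restarts.

```python
# Hypothetical module-level store: the last scraped result for each URL.
last_seen = {}


def checkDifference(parsed, url):
    """Print and return the fields that changed since the previous scrape of url."""
    previous = last_seen.get(url)
    last_seen[url] = parsed  # remember this result for the next round

    if previous is None:
        return {}  # first scrape of this URL: nothing to compare against yet

    # Collect every key whose value differs between the two scrapes.
    changes = {
        key: {"old": previous.get(key), "new": parsed.get(key)}
        for key in previous.keys() | parsed.keys()
        if previous.get(key) != parsed.get(key)
    }
    for key, change in sorted(changes.items()):
        print(f"{url}: {key} changed from {change['old']!r} to {change['new']!r}")
    return changes
```

Because `threadPoolLoop` in the question calls `checkDifference` from a single loop, no locking is used here; if results were ever processed from several threads at once, guarding `last_seen` with a `threading.Lock` would be the safer choice.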
chrimaho