
I was wondering if anyone knew a faster / more efficient way to do this, and how to speed it up a bit more, because this is just a test run and I will mostly be dealing with 123 cities across 20 countries.

import json
import time
import requests

# cities, urls, _job_title, max_results_per_city, get_soup,
# extract_job_title and unique are defined elsewhere in the class.
for country in cities:
    for city in cities[country]:
        for job_title in _job_title:
            for start in range(0, max_results_per_city, 10):
                url = urls[country] + \
                    "/jobs?q={}&l={}&sort=date&start={}".format(
                        job_title, city, start)
                print(url)
                time.sleep(1)
                response = requests.get(url)
                soup = get_soup(response.text)
                rows = soup.find_all(name="div", attrs={"class": "row"})
                for page in rows:
                    job = extract_job_title(page)
                    # Keep only results whose title contains one of the
                    # requested job-title prefixes.
                    if any(job_prefix in job for job_prefix in _job_title):
                        with open(self.file, 'w') as outfile:
                            json.dump(unique(self.data_extracted), outfile,
                                      indent=4)

`time` output:

real    0m45.970s
user    0m1.657s
sys     0m0.090s
  • Remove the line `time.sleep(1)` in the inner loop. – ekhumoro Oct 26 '19 at 16:37
  • if I remove that line, the timing becomes the following; is there a way to improve it a bit more? ``` real 0m30.864s user 0m2.110s sys 0m0.098s ``` – jzz_joker Oct 26 '19 at 16:42
  • Sure: get a faster internet connection (i.e. the performance bottleneck is entirely network-related). The only other way to speed things up further would be to use a parallel downloader. – ekhumoro Oct 26 '19 at 16:46
  • so this code is more related to network than code? – jzz_joker Oct 26 '19 at 16:49
  • Yes - obviously the code following the network request cannot execute until the page has been downloaded. If you want to improve performance, you will need to download multiple pages at once (hint: search for `python parallel download` or take a look at [this question](https://stackoverflow.com/q/16181121/984421)). – ekhumoro Oct 26 '19 at 16:53
  • the above you mention is download the page in memory as I am doing right now? – jzz_joker Oct 26 '19 at 17:13
  • No - download in parallel (using multiple threads/processes). – ekhumoro Oct 26 '19 at 18:11
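Following up on the parallel-download suggestion in the comments, here is a minimal sketch using `ThreadPoolExecutor` from the standard library's `concurrent.futures` module. It assumes the same `cities`, `urls`, `_job_title`, `max_results_per_city`, and `get_soup` from the question's code; the helper names `build_urls` and `fetch` are hypothetical and only for illustration.

import requests
from concurrent.futures import ThreadPoolExecutor, as_completed

def build_urls(cities, urls, job_titles, max_results_per_city):
    # Flatten the nested loops from the question into a flat list of
    # URLs so they can be fetched concurrently.
    return [
        urls[country] + "/jobs?q={}&l={}&sort=date&start={}".format(
            job_title, city, start)
        for country in cities
        for city in cities[country]
        for job_title in job_titles
        for start in range(0, max_results_per_city, 10)
    ]

def fetch(url):
    # Each worker downloads one page; the HTML is returned for parsing.
    response = requests.get(url, timeout=30)
    return url, response.text

all_urls = build_urls(cities, urls, _job_title, max_results_per_city)

# 10 workers is an arbitrary starting point; tune it to what the
# target site tolerates.
with ThreadPoolExecutor(max_workers=10) as executor:
    futures = [executor.submit(fetch, url) for url in all_urls]
    for future in as_completed(futures):
        url, html = future.result()
        # Parse and filter here exactly as in the original loop body.
        soup = get_soup(html)

Because the downloads now overlap instead of running one at a time, total wall-clock time should drop roughly in proportion to the worker count, up to whatever the server and network connection allow.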

0 Answers