
Below is my code. It scrapes around 9000 records that should all be saved, but because many of the bid numbers are duplicates, my dictionary ends up with only about 3000 entries. I want to keep all of the records, including the duplicates. Please help — I'm new to coding.

import requests
from collections import defaultdict
from bs4 import BeautifulSoup as bs

end_number = 800
current_page = 1
pdf_links = {}
path = r"C:\Users\deepak jain\Desktop\BID"

with requests.Session() as s:
    while True:
        r = s.get(f'https://bidplus.gem.gov.in/bidlists?bidlists&page_no={current_page}')
        soup = bs(r.content, 'lxml')
        for i in soup.select('.bid_no > a'):
            pdf_links[i.text.strip().replace('/', '_')] = 'https://bidplus.gem.gov.in' + i['href']
        # print(pdf_links)
        if current_page == 1:
            num_pages = int(soup.select_one('.pagination li:last-of-type > a')['data-ci-pagination-page'])
            print(num_pages)
        if current_page == num_pages or current_page > end_number:
            break
        current_page += 1
result = [key for key, values in pdf_links.items()
                              if len(values) > 1]
print("duplicate values", str(result))
for k, v in pdf_links.items():
    with open(f'{path}/{k}.pdf', 'wb') as f:
        r = s.get(v)
        f.write(r.content)
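
The ~6000 "missing" records are collapsed by the dictionary itself: a plain dict stores exactly one value per key, so every repeated bid number overwrites the previous link. A minimal, self-contained sketch (the bid numbers and URLs are made up for illustration) of how `defaultdict(list)` keeps every link instead:

```python
from collections import defaultdict

# Illustrative data: 'BID_1' appears twice, like a duplicated bid number.
links = [('BID_1', 'a.pdf'), ('BID_1', 'b.pdf'), ('BID_2', 'c.pdf')]

plain = {}
for bid, url in links:
    plain[bid] = url              # the second 'BID_1' overwrites the first

grouped = defaultdict(list)
for bid, url in links:
    grouped[bid].append(url)      # both 'BID_1' links are kept

print(len(plain))                              # 2 -> one record lost
print(sum(len(v) for v in grouped.values()))   # 3 -> all records kept
```

When writing the files out, you could then enumerate each key's list and suffix an index (e.g. `f'{bid}_{n}.pdf'`) so duplicate bid numbers don't clobber each other on disk.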
Deepak Jain
    you should simplify your question to just keep the dictionary relevant part, ideally using a minimal abstract example. – mozway Jan 16 '22 at 10:46
  • `if len(values) > 1` doesn't test what you intend: every value is a concatenated URL string, so its length is always greater than 1 – pppig Jan 16 '22 at 11:19
  • If you want to store multiple values for each key, you can use `defaultdict(list)` https://docs.python.org/3/library/collections.html#collections.defaultdict – Simon Crowe Jan 16 '22 at 11:34
  • @SimonCrowe Please help — I don't know how to do it – Deepak Jain Jan 16 '22 at 11:37
  • My comment was based on your question title. Looking at your code a bit more, I'm not sure if you're trying to do what I thought you were. – Simon Crowe Jan 16 '22 at 11:42
  • There's also at least one problem with your code: the requests session object `s` is used outside of the `with` block. Once the block has been run, the session is closed, so I don't think your code will work. – Simon Crowe Jan 16 '22 at 11:44
  • If you want to count the number of times an item is found in a collection, you could use the `collections.Counter`. This would give you the number of instances of each value in `your dict`, which you could use to work out how many duplicates there are, (the keys would not contain duplicates because of the way dictionaries fundamentally work). https://docs.python.org/3/library/collections.html#collections.Counter – Simon Crowe Jan 16 '22 at 11:48
  • 1
    As most people can't get to the website to see what's being returned, I suggest you print the contents of *pdf_links* then give an example of what output you're trying to generate. https://stackoverflow.com/questions/70530968/how-to-download-all-the-href-pdf-inside-a-class-with-python-beautiful-soup – DarkKnight Jan 16 '22 at 12:03
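
The `collections.Counter` idea from the comments can be sketched as follows (the bid numbers are invented for illustration): count how many times each bid number was scraped, then report the ones that occur more than once.

```python
from collections import Counter

# Illustrative bid numbers; one of them was scraped twice.
bid_numbers = ['GEM_2022_B_1', 'GEM_2022_B_2', 'GEM_2022_B_1']

counts = Counter(bid_numbers)                        # bid number -> occurrences
duplicates = [bid for bid, n in counts.items() if n > 1]
print(duplicates)   # ['GEM_2022_B_1']
```

This only diagnoses how many duplicates there are; to actually keep them you still need a container that allows repeated keys, such as a `defaultdict(list)` or a list of `(bid, url)` tuples.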

0 Answers