
My question is how to get unique values from the second for loop.

Each result is repeated as many times as there are matching tags in the first for loop, and I need unique values (just one of each result).

for menu in soup.findAll(
    ['span', 'div', 'li', 'ul', 'nav', 'p'],
    {'class': [
        re.compile("item", re.IGNORECASE),
        re.compile("menu", re.IGNORECASE),
        re.compile("catego", re.IGNORECASE),
        re.compile("mega", re.IGNORECASE),
        re.compile("main", re.IGNORECASE),
        re.compile("search", re.IGNORECASE),
        re.compile("rela", re.IGNORECASE),
        re.compile("nav", re.IGNORECASE),
        re.compile("prim", re.IGNORECASE)
    ]}
):
    for link in menu.find_all('a'):
        print(link.text)
        print(link['href'])

My result is:

میز پینگ پنگ

https://www.needmode.com/product-category/%d9%84%d9%88%d8%a7%d8%b2%d9%85-%d9%88%d8%b1%d8%b2%d8%b4%db%8c/%d8%b1%d8%a7%da%a9%d8%aa%db%8c/%d9%be%db%8c%d9%86%da%af-%d9%be%d9%86%da%af/%d9%85%db%8c%d8%b2-%d9%be%db%8c%d9%86%da%af-%d9%be%d9%86%da%af/

چوب راکت پینگ پنگ

https://www.needmode.com/product-category/%d9%84%d9%88%d8%a7%d8%b2%d9%85-%d9%88%d8%b1%d8%b2%d8%b4%db%8c/%d8%b1%d8%a7%da%a9%d8%aa%db%8c/%d9%be%db%8c%d9%86%da%af-%d9%be%d9%86%da%af/%da%86%d9%88%d8%a8-%d8%b1%d8%a7%da%a9%d8%aa-%d9%be%db%8c%d9%86%da%af-%d9%be%d9%86%da%af/

رویه راکت پینگ پنگ

https://www.needmode.com/product-category/%d9%84%d9%88%d8%a7%d8%b2%d9%85-%d9%88%d8%b1%d8%b2%d8%b4%db%8c/%d8%b1%d8%a7%da%a9%d8%aa%db%8c/%d9%be%db%8c%d9%86%da%af-%d9%be%d9%86%da%af/%d8%b1%d9%88%db%8c%d9%87-%d8%b1%d8%a7%da%a9%d8%aa-%d9%be%db%8c%d9%86%da%af-%d9%be%d9%86%da%af/

Many other links follow, but the same ones start repeating after some iterations.

Thanks in advance for any help.

1 Answer


To get unique values from the second loop, collect the results in a Python set. A set discards duplicate entries automatically, so you can add each result to it inside the loop and print the contents afterward. Here's how to modify your code to do that:

import re

# `soup` is assumed to be the BeautifulSoup object from the question
# Create an empty set to store unique values
unique_links = set()

for menu in soup.find_all(
    ['span', 'div', 'li', 'ul', 'nav', 'p'],
    {'class': [
        re.compile("item", re.IGNORECASE),
        re.compile("menu", re.IGNORECASE),
        re.compile("catego", re.IGNORECASE),
        re.compile("mega", re.IGNORECASE),
        re.compile("main", re.IGNORECASE),
        re.compile("search", re.IGNORECASE),
        re.compile("rela", re.IGNORECASE),
        re.compile("nav", re.IGNORECASE),
        re.compile("prim", re.IGNORECASE)
    ]}
):
    # href=True skips <a> tags without an href, which would raise KeyError below
    for link in menu.find_all('a', href=True):
        # Add the link's text and href to the set as one tuple
        unique_links.add((link.text, link['href']))

# Print the unique values
for text, href in unique_links:
    print(text)
    print(href)

With a set, each (text, href) tuple is stored only once, which gives you the unique values you need. Keep in mind that a set is unordered, so the links may print in a different order than they appear on the page.
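To keep the links in page order while still dropping duplicates, one option is to remember the pairs already seen and emit each pair only the first time it shows up. The snippet below is a minimal sketch of that idea, not part of the original answer: the helper name `unique_in_order` and the sample `collected` list are hypothetical stand-ins for the (text, href) tuples gathered in the nested loops.

```python
from typing import Iterable, Iterator, Tuple

Link = Tuple[str, str]  # (link text, href)


def unique_in_order(pairs: Iterable[Link]) -> Iterator[Link]:
    """Yield each (text, href) pair once, in first-seen order."""
    seen = set()
    for pair in pairs:
        if pair not in seen:
            seen.add(pair)
            yield pair


# Hypothetical sample data standing in for the pairs collected from the page
collected = [
    ("Table", "https://example.com/table/"),
    ("Racket", "https://example.com/racket/"),
    ("Table", "https://example.com/table/"),  # duplicate from a nested tag
]

result = list(unique_in_order(collected))
for text, href in result:
    print(text)
    print(href)
```

In the scraping code, you would append each `(link.text, link['href'])` tuple to a list instead of a set and pass that list through the helper before printing.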

  • Shashintha Janadari, you saved my life...:D Thanks a lot ... – Alireza Mirhabibi - IRAN Aug 03 '23 at 14:30
  • 1
    Just in addition - In newer code avoid old syntax `findAll()` instead use `find_all()` or `select()` with `css selectors` - For more take a minute to [check docs](https://www.crummy.com/software/BeautifulSoup/bs4/doc/#method-names) – HedgeHog Aug 03 '23 at 15:04
  • Shashintha, it looks like this answer and your other one were generated by ChatGPT; please delete them. – mozway Aug 05 '23 at 05:22