I have a question. I'm parsing a website with Beautiful Soup and adding some HTML tags and their contents to two different lists, depending on the conditions they satisfy. I end up with two lists,

name = [<a class="name-link" href="/shop/tops-sweaters/wxyvjbwed/emon78ji2">Vertical Logo Baseball Jersey</a>,
        <a class="name-link" href="/shop/tops-sweaters/wxyvjbwed/q2j1gm57b">Vertical Logo Baseball Jersey</a>,
        <a class="name-link" href="/shop/tops-sweaters/wxyvjbwed/ulovwdkr3">Vertical Logo Baseball Jersey</a>]

and

color = [<a class="name-link" href="/shop/tops-sweaters/wxyvjbwed/emon78ji2">Red</a>,
         <a class="name-link" href="/shop/tops-sweaters/noh7spfz2/kg3lseuzf">Red</a>,
         <a class="name-link" href="/shop/tops-sweaters/p98rptfuw/a52kgnw0j">Red</a>,
         <a class="name-link" href="/shop/tops-sweaters/jxupqcv7o/vbj8g1f7u">Red</a>,
         <a class="name-link" href="/shop/tops-sweaters/gxfe5iqzb/ulw54cqk3">Red</a>]

Some hrefs appear in both lists, but I do not know which ones before I build the lists. Is there an HTML library, or something built into Python, that could help me find them? For the lists above, the matching href is "/shop/tops-sweaters/wxyvjbwed/emon78ji2", and that should be the output.

EDIT: Here is the HTML structure. The h1 tag surrounds the a tag.

<h1><a class="name-link" href="/shop/tops-sweaters/wxyvjbwed/emon78ji2">Vertical Logo Baseball Jersey</a></h1>
UCProgrammer

1 Answer

If you're already using Beautiful Soup to find the a tags, why not pull the href values while you have the tag objects? For example:

hrefs = [a['href'] for a in soup.find_all('a', href=True)]  # avoid naming it "list", which shadows the built-in

If you make each list a list of href strings instead of whole tags, you can compare them more easily. A set intersection gives the hrefs that appear in both:

matching = set(list1) & set(list2)
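
One thing to keep in mind is that a set intersection does not preserve order. If order matters, something like the following sketch might help. It assumes the href strings have already been pulled out of the tags; the helper name common_hrefs and the variables name_hrefs and color_hrefs are made up for illustration, and the values are copied from the lists in the question.

```python
def common_hrefs(first, second):
    """Return hrefs found in both lists, in the order they appear in `first`."""
    second_set = set(second)  # set lookup keeps the membership test cheap
    seen = set()              # guards against duplicates within `first`
    result = []
    for href in first:
        if href in second_set and href not in seen:
            seen.add(href)
            result.append(href)
    return result


# href values copied from the two lists in the question
name_hrefs = [
    "/shop/tops-sweaters/wxyvjbwed/emon78ji2",
    "/shop/tops-sweaters/wxyvjbwed/q2j1gm57b",
    "/shop/tops-sweaters/wxyvjbwed/ulovwdkr3",
]
color_hrefs = [
    "/shop/tops-sweaters/wxyvjbwed/emon78ji2",
    "/shop/tops-sweaters/noh7spfz2/kg3lseuzf",
    "/shop/tops-sweaters/p98rptfuw/a52kgnw0j",
    "/shop/tops-sweaters/jxupqcv7o/vbj8g1f7u",
    "/shop/tops-sweaters/gxfe5iqzb/ulw54cqk3",
]

matching = common_hrefs(name_hrefs, color_hrefs)
print(matching)  # ['/shop/tops-sweaters/wxyvjbwed/emon78ji2']
```

If you only need to know which hrefs are shared and not their order, the one-line set intersection is enough.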
it's-yer-boy-chet