Compare two lists and see if part of an element in one list exists in an element of the other list

Question

I have a question. I'm parsing a website with Beautiful Soup and adding some HTML tags and their contents to two different lists, depending on the conditions they satisfy. I end up with two lists,

name = [<a class="name-link" href="/shop/tops-sweaters/wxyvjbwed/emon78ji2">Vertical Logo Baseball Jersey</a>, <a class="name-link" href="/shop/tops-sweaters/wxyvjbwed/q2j1gm57b">Vertical Logo Baseball Jersey</a>, <a class="name-link" href="/shop/tops-sweaters/wxyvjbwed/ulovwdkr3">Vertical Logo Baseball Jersey</a>]

and

color = [<a class="name-link" href="/shop/tops-sweaters/wxyvjbwed/emon78ji2">Red</a>, <a class="name-link" href="/shop/tops-sweaters/noh7spfz2/kg3lseuzf">Red</a>, <a class="name-link" href="/shop/tops-sweaters/p98rptfuw/a52kgnw0j">Red</a>, <a class="name-link" href="/shop/tops-sweaters/jxupqcv7o/vbj8g1f7u">Red</a>, <a class="name-link" href="/shop/tops-sweaters/gxfe5iqzb/ulw54cqk3">Red</a>]

The two lists share at least one href, but I do not know its value before I build the lists. Is there an HTML library, or something built into Python, that could help me find it? For the lists above, the matching href is "/shop/tops-sweaters/wxyvjbwed/emon78ji2", and that should be the output.

EDIT: Here is the HTML structure. The h1 tag surrounds the a tag.

<h1><a class="name-link" href="/shop/tops-sweaters/wxyvjbwed/emon78ji2">Vertical Logo Baseball Jersey</a></h1>

"/shop/tops-sweaters/wxyvjbwed/emon78ji2" is the expected output. or href="/shop/tops-sweaters/wxyvjbwed/emon78ji2". That is the similar href between the two lists — UCProgrammer, Aug 29 '18 at 00:29
I would still use `beautiful soup`, check out this: https://stackoverflow.com/questions/5815747/beautifulsoup-getting-href — PixelEinstein, Aug 29 '18 at 00:33

score 1 · Answer 1 · answered Aug 29 '18 at 00:34

If you're already using Beautiful Soup to find the a tags, why not just pull the href values while you have the objects? For example:

hrefs = [a['href'] for a in soup.find_all('a', href=True)]

If you make each list a list of hrefs instead of entire tags, you can compare them more easily:

matching = set(list1) & set(list2)
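
As a minimal sketch of that comparison, here are the href strings copied from the two lists in the question, assuming they have already been pulled out of the tags with tag['href']. The names name_hrefs, color_hrefs, and ordered are only illustrative.

```python
# href values as they would come out of tag['href'] for the tags in the question
name_hrefs = [
    "/shop/tops-sweaters/wxyvjbwed/emon78ji2",
    "/shop/tops-sweaters/wxyvjbwed/q2j1gm57b",
    "/shop/tops-sweaters/wxyvjbwed/ulovwdkr3",
]
color_hrefs = [
    "/shop/tops-sweaters/wxyvjbwed/emon78ji2",
    "/shop/tops-sweaters/noh7spfz2/kg3lseuzf",
    "/shop/tops-sweaters/p98rptfuw/a52kgnw0j",
    "/shop/tops-sweaters/jxupqcv7o/vbj8g1f7u",
    "/shop/tops-sweaters/gxfe5iqzb/ulw54cqk3",
]

# Set intersection keeps only the hrefs present in both lists (order is not preserved)
matching = set(name_hrefs) & set(color_hrefs)
print(matching)  # {'/shop/tops-sweaters/wxyvjbwed/emon78ji2'}

# To keep the order of the first list, filter it against a set built from the second
color_set = set(color_hrefs)
ordered = [href for href in name_hrefs if href in color_set]
print(ordered)  # ['/shop/tops-sweaters/wxyvjbwed/emon78ji2']
```

The set version is the shortest; the list comprehension is worth using only if the order of the matches matters.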

it's-yer-boy-chet

Hmm, that may work. But I had already performed a find_all() on my soup object for an h1 tag. The a tag and its attributes are nested inside of there. Please take a look at my edit for notation and correct me if I am wrong, but you cannot call find() on a soup object that has already had find_all() called on it. – UCProgrammer Aug 29 '18 at 00:44
Something like the answer here helps, https://stackoverflow.com/questions/46510966/beautiful-soup-nested-tag-search but I can't originally call find_all() on soup or I get an AttributeError: 'ResultSet' object has no attribute 'find_all' – UCProgrammer Aug 29 '18 at 00:50
nevermind. I got it. I'm an idiot and it's late. thanks – UCProgrammer Aug 29 '18 at 00:53
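
On the AttributeError mentioned in the comments: find_all() returns a ResultSet, which is a list of Tag objects, so find() or find_all() has to be called on each element rather than on the ResultSet itself. Below is a minimal sketch, assuming the h1 structure from the question's edit. The second h1 and the variable names are made up for illustration, and this is not necessarily how the asker resolved it.

```python
from bs4 import BeautifulSoup

# Sample markup: the first h1 is from the question, the second is a made-up
# heading without a link, included to show the None check
html = """
<h1><a class="name-link" href="/shop/tops-sweaters/wxyvjbwed/emon78ji2">Vertical Logo Baseball Jersey</a></h1>
<h1>No link here</h1>
"""
soup = BeautifulSoup(html, "html.parser")

# find_all() returns a ResultSet, i.e. a list of Tag objects
headings = soup.find_all("h1")

# headings.find_all("a") would raise AttributeError, so search inside each Tag
hrefs = []
for h1 in headings:
    link = h1.find("a", href=True)  # None if this h1 has no <a href=...>
    if link is not None:
        hrefs.append(link["href"])

print(hrefs)  # ['/shop/tops-sweaters/wxyvjbwed/emon78ji2']
```

The resulting hrefs list is a plain list of strings, so it can be compared with another such list using the set intersection from the answer.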