I'm trying to identify DOM elements by class name with pattern.web, but the selectors described in the docs are not returning what I expect (this is code I've used before, so it did work at some point).
from pattern.web import DOM
html = """<html><head><title>pattern.web | CLiPS</title></head>
<body>
<div class="class1 class2 class3">
<form action="/pages/pattern-web" accept-charset="UTF-8" method="post" id="search-block-form">
<div>
<label for="edit-search-block-form-1">Search this site: </label>
</div>
</form>
</div>
</body></html>"""
dom = DOM(html)
print "Search Results by Method:"
print 'tag[attr="value"] Notation Results:'
print dom('div[class="class1 class2 class3"]')
print
print 'tag.class Notation Results:'
print dom('div.class1')
print
print 'By class, no tag results:'
print dom.by_class('class1')
print
print 'Looping through all divs and printing matching results:'
for i in dom('div'):
    if 'class' in i.attrs and i.attrs['class'] == 'class1 class2 class3':
        print i.attrs
(Note that the Element and DOM functions are interchangeable and give the same results.) The result is this:
Search Results by Method:
tag[attr="value"] Notation Results:
[]
tag.class Notation Results:
[]
By class, no tag results:
[Element(tag='div')]
Looping through all divs and printing matching results:
{u'class': u'class1 class2 class3'}
As you can see, the tag.class notation and the tag[attr="value"] notation both give empty results, but by_class returns one result. Clearly, elements with those attributes exist. How do I search for all the divs that have all 3 classes?
In the past, I've been able to use dom('div.class1.class2.class3') to identify a div with all 3 classes. Not only does this no longer work, it now also raises an error (it appears that the second period triggers it): TypeError: descriptor 'lower' requires a 'str' object but received a 'unicode'
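To make clear what I mean by "has all 3 classes", here is a minimal sketch of the check in plain Python, with no selectors involved. It splits the class attribute on whitespace and tests that every required class is present. The name has_all_classes is a hypothetical helper of my own, not part of pattern.web, and the sketch assumes each element's attributes are available as a dict of strings, like the i.attrs printed in the loop.

```python
def has_all_classes(attrs, required):
    """Return True if the 'class' value in attrs contains every name in required."""
    # attrs is assumed to be a dict mapping attribute names to string values.
    # A missing 'class' attribute is treated as an empty class list.
    present = set(attrs.get('class', '').split())
    return set(required).issubset(present)


# Stand-in attribute dicts for illustration; in the real script these
# would come from each element's attrs.
examples = [
    {'class': 'class1 class2 class3'},
    {'class': 'class3 class1'},
    {'id': 'search-block-form'},
]
wanted = ['class1', 'class2', 'class3']
matches = [a for a in examples if has_all_classes(a, wanted)]
```

In the loop over dom('div'), the condition would become has_all_classes(i.attrs, ['class1', 'class2', 'class3']). Unlike the exact string comparison, this does not depend on the order of the classes or on the spacing between them, but it is only a workaround and does not explain why the selectors fail.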