I am confused by the class attribute on list items inside an unordered list.

[Screenshot: developer tools view]

I am writing a Python program to crawl a website; it targets the li elements inside a ul list. The ul contains 45 li elements, 17 of which have no "class" attribute assigned to them. Here is a portion of the ul.

[Screenshot: ul view]

My custom target selector is "ul.vacanciesList li", and it only returns the 17 items that don't have the "class" keyword.

My question is: what is that "class" keyword that appears in the markup for the li elements, and how do I target the li elements so that I get all 45 of them, not only the ones without a class?

Customized code:

'title' => ['selector' => 'h3'],
'containerSelector' => 'ul.vacanciesList li',
'detailSelector' => '#bigbox',
'location' => ['selector' => 'div.place'],

Thank you.

Anonta
  • "I mention I am writing a python program to crawl from a website, which targets the li elements inside an ul list", please post your code – samAlvin Oct 04 '17 at 08:52
  • Did it, but the code is customized, so I doubt seeing the selector will be much help. Basically, I am just trying to understand what the "class" keyword represents, and how to target it properly in the selector. Thank you. – Andrei-Cristian Ene Oct 04 '17 at 09:20

1 Answer

An empty attribute (an attribute without a value) is valid. <tag class=""> or <tag class> simply means that the class attribute is present but empty, so the element is not assigned to any named class. Read this answer for more details.
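
To make the difference between a bare, an empty, and a missing class attribute concrete, here is a minimal sketch that uses only the standard library's html.parser. The markup, the "featured" class name, and the LiClassCollector helper are made up for illustration and are not taken from the page in the question.

```python
from html.parser import HTMLParser

# Hypothetical markup: a bare class attribute, an empty one,
# no class attribute at all, and a real class name.
SAMPLE = """
<ul class="article vacanciesList">
  <li class><h3>Job A</h3></li>
  <li class=""><h3>Job B</h3></li>
  <li><h3>Job C</h3></li>
  <li class="featured"><h3>Job D</h3></li>
</ul>
"""


class LiClassCollector(HTMLParser):
    """Record how the class attribute of each li start tag is reported."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            attr_map = dict(attrs)
            # "missing" marks an li that has no class attribute at all
            self.seen.append(attr_map.get("class", "missing"))


collector = LiClassCollector()
collector.feed(SAMPLE)
print(collector.seen)
```

With this parser, a bare attribute is reported with the value None and an empty one with an empty string, so neither carries a class name. Other parsers may represent the two cases differently.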

To find the list items:

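import bs4
soup = bs4.BeautifulSoup(page, 'lxml')  # page: the HTML source of the crawled page
# match li elements whose class attribute is empty
litems = soup.find_all('li', {'class': ''})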

Alternatively, find the ul tag, which does have a class attribute value assigned to it, and get all the list items from there.

soup = bs4.BeautifulSoup(page, 'lxml')

# get the unordered list of interest
unordered_list = soup.find('ul', {'class': 'article vacanciesList'})
# extract all the list items from it
list_items = unordered_list.find_all('li')

print(list_items)
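
If you would rather keep the CSS selector from the question, BeautifulSoup's select() accepts it directly. The following is a rough sketch on made-up markup (the job titles and the "featured" class are assumptions, not the real page). It uses the built-in 'html.parser' backend so that it runs without lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical markup mixing li elements with and without a class attribute.
SAMPLE = """
<ul class="article vacanciesList">
  <li class><h3>Job A</h3></li>
  <li class=""><h3>Job B</h3></li>
  <li><h3>Job C</h3></li>
  <li class="featured"><h3>Job D</h3></li>
</ul>
"""

soup = BeautifulSoup(SAMPLE, "html.parser")

# Descendant selector: every li inside a ul whose class list contains
# "vacanciesList", whatever the li's own class attribute looks like.
items = soup.select("ul.vacanciesList li")
titles = [li.h3.get_text(strip=True) for li in items]

print(len(items))
print(titles)
```

In this sketch the selector matches all four list items, which suggests that a descendant selector by itself does not filter on the li's class attribute. If a crawler returns fewer items, the cause may lie elsewhere, for example in how the tool parses the page.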
Anonta