I am writing a Python program to crawl data. Some items have the class "_3pw9 _2pi4 _2ge8", and others have the class "_3pw9 _2pi4 _2ge8 _3ms8". I'd like to crawl every item whose class contains "_3pw9 _2pi4 _2ge8", including the items whose class is "_3pw9 _2pi4 _2ge8 _3ms8". I wrote:

    soup_user_gender_page = BeautifulSoup(html_user_gender_page, "html.parser")
    soup_user_about_main_frame = soup_user_gender_page.find(
        "div", id="pagelet_timeline_medley_about")
    if soup_user_about_main_frame:
        soup_user_basic_main_frame = soup_user_about_main_frame.find(
            "div", id="pagelet_basic")
        if soup_user_basic_main_frame:
            soup_user_about_li_block = soup_user_basic_main_frame.find_all(
                "li", class_="_3pw9 _2pi4 _2ge8")

However, only the items whose class is exactly "_3pw9 _2pi4 _2ge8" are crawled; the items whose class is "_3pw9 _2pi4 _2ge8 _3ms8" are not.

Could you please tell me the reason and how to fix the program?

bin
  • Possible duplicate of [Beautiful Soup if Class "Contains" or Regex?](https://stackoverflow.com/questions/34660417/beautiful-soup-if-class-contains-or-regex) – K. Kirsz Jul 14 '17 at 11:24

1 Answer

As I understand it, you need to crawl all the items whose class names include "_3pw9 _2pi4 _2ge8".

If that is the case, consider changing your last line to

    soup_user_basic_main_frame.select('li[class*="_3pw9 _2pi4 _2ge8"]')

Notice the asterisk after class: *= is the CSS attribute-selector operator for "contains". It belongs inside a selector string passed to select(); written as class*= in a find_all() keyword argument, it is a Python syntax error.
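
As for the reason the original call misses some items: when class_ is given a string that contains spaces, Beautiful Soup compares it against the exact value of the class attribute, so "_3pw9 _2pi4 _2ge8 _3ms8" does not match. Below is a minimal sketch of that behaviour and of a chained class selector that picks up both kinds of item. The sample markup and the helper name find_basic_items are made up for illustration and are not taken from the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page.
SAMPLE_HTML = """
<div id="pagelet_basic">
  <ul>
    <li class="_3pw9 _2pi4 _2ge8">first</li>
    <li class="_3pw9 _2pi4 _2ge8 _3ms8">second</li>
    <li class="_3pw9">other</li>
  </ul>
</div>
"""


def find_basic_items(soup):
    """Return every <li> that carries all three classes, whatever else it has."""
    # A chained class selector requires each class to be present,
    # regardless of order or of any extra classes on the element.
    return soup.select("li._3pw9._2pi4._2ge8")


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Exact-string match: only the item whose class attribute is exactly this string.
exact = soup.find_all("li", class_="_3pw9 _2pi4 _2ge8")
print([li.get_text() for li in exact])

# Chained class selector: the three-class item and the four-class item.
print([li.get_text() for li in find_basic_items(soup)])
```

The chained selector may be preferable to a substring match when the order of the class names in the page is not guaranteed.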

rlinden
  • I am using an Apple laptop. I did as you said and typed Shift + "*" after class, but PyCharm reports an error. What is the reason, and how can I resolve it? – bin Jul 14 '17 at 12:02
  • Not sure. Try the following: soup.findAll(True, {"class": re.compile("^_3pw9 _2pi4 _2ge8")}) – rlinden Jul 14 '17 at 12:26
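
The regular-expression suggestion in the last comment can be sketched roughly as follows. It relies on Beautiful Soup also trying the pattern against the whole class attribute string when no single class name matches, and it only works if the classes appear in this order. The sample markup, the REQUIRED set, and the has_required_classes helper are assumptions added for illustration; the helper shows an order-independent alternative.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page.
SAMPLE_HTML = """
<ul>
  <li class="_3pw9 _2pi4 _2ge8">first</li>
  <li class="_3pw9 _2pi4 _2ge8 _3ms8">second</li>
  <li class="_3pw9">other</li>
</ul>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Regex anchored at the start of the class attribute, as in the comment.
pattern = re.compile(r"^_3pw9 _2pi4 _2ge8")
by_regex = soup.find_all("li", class_=pattern)

# Order-independent alternative: a filter function that checks set membership.
REQUIRED = {"_3pw9", "_2pi4", "_2ge8"}


def has_required_classes(tag):
    """True for <li> tags whose class list includes every required class."""
    return tag.name == "li" and REQUIRED.issubset(tag.get("class", []))


by_function = soup.find_all(has_required_classes)

print([li.get_text() for li in by_regex])
print([li.get_text() for li in by_function])
```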