
Thanks to the great support of @αԋɱҽԃ αмєяιcαη, I now have the following code:

import requests
from bs4 import BeautifulSoup
import pandas as pd

masterlist = []

def main(url):
    with requests.Session() as req:
        for item in range(1, 2):  # page numbers to scrape; range(1, 2) is page 1 only
            r = req.get(url.format(item))
            print(r.url)
            soup = BeautifulSoup(r.content, 'html.parser')
            # one (title, price, rating, image URL) tuple per book on the page
            goal = [(x.h3.a['title'], x.select_one("p.price_color").text, x.select_one("p.star-rating")['class'][-1], 'http://books.toscrape.com' + x.a.img['src'].replace('..',''))
                    for x in soup.select("li.col-xs-6")]
            #print(goal)
            masterlist.extend(goal)  # extend, not append, so each book becomes its own row

main("http://books.toscrape.com/catalogue/page-{}.html")
df = pd.DataFrame(masterlist, columns=['Title', 'Price', 'Rating', 'Image'])
print(df)

The result is perfect. Now I need to learn how to export the results to an Excel file. Forgive me, I am trying to learn step by step. I think I have to use the pandas package. Will it be easy to use pandas for this?

YasserKhalil
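A minimal sketch of the export step, assuming `masterlist` is a flat list of 4-tuples (filled with `masterlist.extend(goal)`; with `append`, each page's whole list would land in a single row). The helper names `rows_to_dataframe` and `export_to_excel`, the column names, and the `books.xlsx` filename are hypothetical. `DataFrame.to_excel` is the pandas method that writes the file; for `.xlsx` it needs a writer engine such as openpyxl installed.

```python
import pandas as pd

# Hypothetical column names labelling the question's 4-tuples:
# (title, price, rating, image URL).
COLUMNS = ["Title", "Price", "Rating", "Image"]


def rows_to_dataframe(rows):
    """Build a DataFrame from a flat list of 4-tuples (one tuple per book)."""
    return pd.DataFrame(rows, columns=COLUMNS)


def export_to_excel(rows, path="books.xlsx"):
    """Write the rows to an .xlsx file and return the DataFrame.

    DataFrame.to_excel needs an Excel writer engine installed
    (for .xlsx, typically openpyxl: pip install openpyxl).
    index=False leaves out pandas' 0..n-1 row index column.
    """
    df = rows_to_dataframe(rows)
    df.to_excel(path, index=False)
    return df
```

Usage, after `main(...)` has filled `masterlist`: `export_to_excel(masterlist)`. Keeping the DataFrame construction separate from the write makes it easy to inspect `df` first, or to swap in `df.to_csv(...)` if no Excel engine is available.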
  • From the [docs](https://www.crummy.com/software/BeautifulSoup/bs4/doc/#multi-valued-attributes): "The most common multi-valued attribute is class (that is, a tag can have more than one CSS class). Others include rel, rev, accept-charset, headers, and accesskey. Beautiful Soup presents the value(s) of a multi-valued attribute as a list" –  Nov 26 '20 at 20:58
  • Thanks a lot, but I didn't get what you mean. Can you give me a solution to the problem, as I am a newbie to Python? – YasserKhalil Nov 26 '20 at 21:00
  • `(x.h3.a.text, x.select_one("p.star-rating")['class'][-1], x.select_one("p.price_color").text)` – αԋɱҽԃ αмєяιcαη Nov 26 '20 at 21:02
  • Amazing. Thanks a lot for your great support. – YasserKhalil Nov 26 '20 at 21:04
  • @YasserKhalil You're welcome, glad to help. Please avoid posting the same question multiple times, or you may get downvotes or close votes. – αԋɱҽԃ αмєяιcαη Nov 26 '20 at 21:05
  • OK, bro. I will try to be more patient. Last question: I tried `, x.select_one("div.image_container.a")['href'])` to get the image link, but it throws an error. Why do I keep failing at this kind of thing? – YasserKhalil Nov 26 '20 at 21:10
  • @YasserKhalil Use `x.a.img['src']`. You have to read the `bs4` documentation and understand how CSS selectors work. – αԋɱҽԃ αмєяιcαη Nov 26 '20 at 21:12
  • Amazing. Thanks a lot for the great support. I have edited the question to ask about another issue. Sorry if I was bothering you. – YasserKhalil Nov 26 '20 at 21:17
  • @YasserKhalil You need to read the "How to Ask" help page; what you are doing is against the community rules. – αԋɱҽԃ αмєяιcαη Nov 26 '20 at 22:30
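The selector error discussed in the comments can be reproduced offline. In CSS, `div.image_container.a` (no space) means a `<div>` carrying both the classes `image_container` and `a`; no such element exists, so `select_one` returns `None` and `['href']` on it raises a `TypeError`. A space (`div.image_container a`) selects an `<a>` inside that div, but its `href` is the book's detail page; the image URL is the `<img>` tag's `src`. The HTML below is a hypothetical, trimmed-down snippet shaped like one book entry, and the image path is a placeholder.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical markup shaped like one book entry; the image path is a placeholder.
HTML = """
<li class="col-xs-6">
  <div class="image_container">
    <a href="a-light-in-the-attic_1000/index.html">
      <img src="../media/cache/xx/cover.jpg" alt="A Light in the Attic">
    </a>
  </div>
</li>
"""
PAGE_URL = "http://books.toscrape.com/catalogue/page-1.html"

x = BeautifulSoup(HTML, "html.parser").select_one("li.col-xs-6")

# Compound selector: a <div> with BOTH classes image_container and a -> no match, None.
broken = x.select_one("div.image_container.a")

# Descendant selector (note the space): the <a> inside the div.
detail_href = x.select_one("div.image_container a")["href"]  # detail page, not the image

# The image URL is on the <img> tag; urljoin resolves "../" against the page URL.
img_src = x.select_one("div.image_container img")["src"]
image_url = urljoin(PAGE_URL, img_src)
```

`x.a.img['src']` from the comment reaches the same `<img>`: `x.a` is the first `<a>` inside the `<li>`, and `.img` is the image inside it. `urljoin` is a standard-library alternative to the question's `.replace('..', '')` string trick.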

1 Answer

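from bs4 import BeautifulSoup
import requests


def main(url):
    with requests.Session() as req:
        for item in range(1, 2):
            r = req.get(url.format(item))
            print(r.url)
            soup = BeautifulSoup(r.content, 'html.parser')
            # attrs.items() yields (name, value) pairs; for <p class="star-rating Three">
            # that is [('class', ['star-rating', 'Three'])]
            goal = [(x.h3.a.text, x.select_one("p.price_color").text, x.select_one("p.star-rating").attrs.items())
                    for x in soup.select("li.col-xs-6")]
            try:
                # first book -> its attrs -> first (name, value) pair -> class list -> second class name
                print(list(goal[0][2])[0][1][1])
            except IndexError:
                # goal is empty if the page returned no books
                pass


main("http://books.toscrape.com/catalogue/page-{}.html")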
eyal
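The answer's `list(...attrs.items())[0][1][1]` and the comment's `['class'][-1]` both rely on the point quoted from the docs: `class` is multi-valued, so Beautiful Soup returns it as a list of class names. A small offline check, using a hypothetical one-tag snippet shaped like the rating markup:

```python
from bs4 import BeautifulSoup

# Hypothetical snippet shaped like the rating markup on the page.
tag = BeautifulSoup('<p class="star-rating Three"></p>', "html.parser").p

# class is multi-valued, so bs4 returns a list of class names.
classes = tag["class"]

# The answer's route: attrs.items() -> [('class', ['star-rating', 'Three'])],
# then pair 0 -> value -> second class name.
via_items = list(tag.attrs.items())[0][1][1]

# The comment's route: the last class name is the rating word.
via_index = tag["class"][-1]
```

Both routes give `'Three'` here; `['class'][-1]` is shorter and does not depend on `class` being the tag's first attribute.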