1

I'm working on a web-scraping task and I can already collect the data in a very rudimentary way.

Basically, I need a function that collects a list of songs and artists from AllMusic.com and then puts the data in a DataFrame (df). In this example, I use this link: https://www.allmusic.com/mood/tender-xa0000001119/songs

So far I have accomplished most of the objective, but I had to write two separate functions (get_song() and get_performer()).

I would like, if possible, a way to combine these two functions into one.

The code I'm using is below:

import requests
import pandas as pd
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux i586; rv:31.0) Gecko/20100101 Firefox/31.0'}
link    = "https://www.allmusic.com/mood/tender-xa0000001119/songs"


# Function to collect songs (title)
songs = []

def get_song():
    url = link
    source_code = requests.get(url, headers=headers)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text)
    for td in soup.findAll('td', {'class': 'title'}):
        for a in td.findAll('a')[0]:
            song = a.string
            songs.append(song)

# Function to collect performers
performers = []

def get_performer():
    url = link
    source_code = requests.get(url, headers=headers)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text)
    for td in soup.findAll('td', {'class': 'performer'}):
        for a in td.findAll('a'):
            performer = a.string
            performers.append(performer)

get_song(), get_performer() # Here, I call the two functions, but the goal, if possible, is to use one function.

df = pd.DataFrame(list(zip(songs,performers)), columns=['song', 'performer']) # df creation
  • You can wrap your two functions in another function that calls them, e.g. `get_data`. In this particular case, the benefit is limited though... – mozway Sep 05 '21 at 18:51
  • Does this answer your question? [Call a function with argument list in python](https://stackoverflow.com/questions/817087/call-a-function-with-argument-list-in-python) – Sapt-Programmer Sep 05 '21 at 19:03

4 Answers

2

You can just move the soup.findAll loop for performers into the first function, so the page is requested and parsed only once.

    import requests
    from bs4 import BeautifulSoup
    import pandas as pd
    
    headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux i586; rv:31.0) Gecko/20100101 Firefox/31.0'}
    link    = "https://www.allmusic.com/mood/tender-xa0000001119/songs"
    
    
    # Lists filled in by get_song_and_performer()
    songs = []
    performers = []
    
    def get_song_and_performer():
        url = link
        source_code = requests.get(url, headers=headers)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text)
        for td in soup.findAll('td', {'class': 'title'}):
            for a in td.findAll('a')[0]:
                song = a.string
                songs.append(song)
        for td in soup.findAll('td', {'class': 'performer'}):
            for a in td.findAll('a'):
                performer = a.string
                performers.append(performer)


    get_song_and_performer() # A single call now fills both lists.

    df = pd.DataFrame(list(zip(songs,performers)), columns=['song', 'performer']) # df creation
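
One thing to watch with two independent lists: a row with several performers (such as "River" in the outputs further down) may add more than one entry to `performers`, so `zip` can drift out of alignment. Here is a minimal sketch of one way around that, assuming each `td.title` cell is followed in the same row by a `td.performer` cell. The helper name `parse_rows` and the sample HTML are made up for illustration, not taken from the real page.

```python
from bs4 import BeautifulSoup

def parse_rows(html):
    """Return a list of (song, performer) tuples, one per table row."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for title_td in soup.find_all("td", class_="title"):
        # Look for the performer cell in the same row as this title cell.
        performer_td = title_td.find_next_sibling("td", class_="performer")
        if performer_td is None:
            continue  # skip rows without a performer cell
        link = title_td.find("a")
        cell = link if link is not None else title_td
        song = cell.get_text(strip=True)
        # Join all performer links so each row yields exactly one string.
        names = [a.get_text(strip=True) for a in performer_td.find_all("a")]
        rows.append((song, " / ".join(names)))
    return rows

# Hypothetical markup that mimics the assumed table layout.
SAMPLE = """
<table>
<tr><td class="title"><a href="#">River</a></td>
    <td class="performer"><a href="#">Herbie Hancock</a> / <a href="#">Corinne Bailey Rae</a></td></tr>
<tr><td class="title"><a href="#">Still</a></td>
    <td class="performer"><a href="#">Tim McGraw</a></td></tr>
</table>
"""

print(parse_rows(SAMPLE))
# [('River', 'Herbie Hancock / Corinne Bailey Rae'), ('Still', 'Tim McGraw')]
```

The resulting list of tuples can go straight into `pd.DataFrame(rows, columns=['song', 'performer'])`, with no `zip` needed.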
0

You could create a separate wrapper function that calls both of them; this is the most organized approach if you want to keep the two functions separate.

import requests
import pandas as pd
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux i586; rv:31.0) Gecko/20100101 Firefox/31.0'}
link    = "https://www.allmusic.com/mood/tender-xa0000001119/songs"


# Function to collect songs (title)
songs = []

def get_song():
    url = link
    source_code = requests.get(url, headers=headers)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text)
    for td in soup.findAll('td', {'class': 'title'}):
        for a in td.findAll('a')[0]:
            song = a.string
            songs.append(song)

# Function to collect performers
performers = []

def get_performer():
    url = link
    source_code = requests.get(url, headers=headers)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text)
    for td in soup.findAll('td', {'class': 'performer'}):
        for a in td.findAll('a'):
            performer = a.string
            performers.append(performer)

# Function for getting song and performer
def get_song_info():
    get_song()
    get_performer()

get_song_info() # Call just one function!

df = pd.DataFrame(list(zip(songs,performers)), columns=['song', 'performer']) # df creation
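
Note that the wrapper above still downloads the page twice, once per helper. A minimal sketch of an alternative, assuming the same `td.title` / `td.performer` markup as in the question, is to parse the HTML once and pass the resulting soup to both helpers, which return lists instead of appending to module-level ones. The names `extract_songs` and `extract_performers` and the sample HTML are hypothetical.

```python
from bs4 import BeautifulSoup

def extract_songs(soup):
    """One title per td.title cell (first link text, else the cell text)."""
    songs = []
    for td in soup.find_all("td", class_="title"):
        link = td.find("a")
        cell = link if link is not None else td
        songs.append(cell.get_text(strip=True))
    return songs

def extract_performers(soup):
    """One string per td.performer cell, even if it holds several links."""
    return [td.get_text(" ", strip=True)
            for td in soup.find_all("td", class_="performer")]

def get_song_info(html):
    """Parse the page once and hand the same soup to both helpers."""
    soup = BeautifulSoup(html, "html.parser")
    return extract_songs(soup), extract_performers(soup)

# Hypothetical markup standing in for the downloaded page.
SAMPLE = """
<table>
<tr><td class="title"><a href="#">Ghetto</a></td>
    <td class="performer"><a href="#">Kelly Rowland</a> / <a href="#">Snoop Dogg</a></td></tr>
<tr><td class="title"><a href="#">Find Your Love</a></td>
    <td class="performer"><a href="#">Drake</a></td></tr>
</table>
"""

songs, performers = get_song_info(SAMPLE)
print(list(zip(songs, performers)))
```

With the real page you would pass `requests.get(link, headers=headers).text` as `html`, so only one request is made.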

Sylvester Kruin
0

To get the titles and performers, you can use the following example:

import requests
import pandas as pd
from bs4 import BeautifulSoup

url = "https://www.allmusic.com/mood/tender-xa0000001119/songs"
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0"
}

soup = BeautifulSoup(requests.get(url, headers=headers).content, "html.parser")

all_data = []
for td in soup.select("td.title"):
    title = td.get_text(strip=True)
    performer = td.find_next("td").get_text(strip=True)
    all_data.append((title, performer))

df = pd.DataFrame(all_data, columns=["title", "performer"])
print(df)
df.to_csv("data.csv", index=False)

Prints:

                              title                          performer
0                    Knock You Down                        Keri Hilson
1   Down Among the Wine and Spirits                     Elvis Costello
2                  I Felt The Chill                     Elvis Costello
3            She Handed Me A Mirror                     Elvis Costello
4         I Dreamed Of My Old Lover                     Elvis Costello
5                   She Was No Good                     Elvis Costello
6                  The Crooked Line                     Elvis Costello
7                 Changing Partners                     Elvis Costello
8           Small Town Southern Man                       Alan Jackson
9                    Find Your Love                              Drake
10            Today Was a Fairytale                       Taylor Swift
11                     Need You Now                             Lady A
12                   American Honey                             Lady A
13                      Peace Dream                        Ringo Starr
14                  If I Died Today                         Tim McGraw
15                            Still                         Tim McGraw
16                      I Need Love                             Ledisi
17                          Uhh Ahh                        Boyz II Men
18                  Shattered Heart                             Brandy
19            Right Here (Departed)                             Brandy
20           Warm It Up (With Love)                             Brandy
21                  If I Were a Boy                            Beyoncé
22                Why Does She Stay                              Ne-Yo
23              Daddy Needs a Drink                  Drive-By Truckers
24                  Think About You                        Ringo Starr
25                      Liverpool 8                        Ringo Starr
26                        Nefertiti                     Herbie Hancock
27                            River  Herbie Hancock/Corinne Bailey Rae
28                   Both Sides Now                     Herbie Hancock
29                  Court and Spark         Herbie Hancock/Norah Jones
30  I Taught Myself How to Grow Old                         Ryan Adams
31                           Ghetto           Kelly Rowland/Snoop Dogg
32                      Little Girl                   Enrique Iglesias
33          The Magdalene Laundries                     Emmylou Harris
34                   Because of You                              Ne-Yo
35               We Belong Together                       Mariah Carey
36          Thank You for Loving Me                           Bon Jovi
37        He's Younger Than You Are                      Sonny Rollins

and saves the data to data.csv.

Andrej Kesely
0

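For your URL, you can use pd.read_html, which parses the tables on the page into DataFrames (link and headers are the ones defined in the question):

import requests
import pandas as pd

source_code = requests.get(link, headers=headers)
df = pd.read_html(source_code.text)[0]  # <- Only one table in the page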

Output:

>>> df
                     Title/Composer                            Performer   Stream
0                    Knock You Down                          Keri Hilson  Spotify
1   Down Among the Wine and Spirits                       Elvis Costello      NaN
2                  I Felt The Chill                       Elvis Costello      NaN
3            She Handed Me A Mirror                       Elvis Costello      NaN
4         I Dreamed Of My Old Lover                       Elvis Costello      NaN
5                   She Was No Good                       Elvis Costello      NaN
6                  The Crooked Line                       Elvis Costello      NaN
7                 Changing Partners                       Elvis Costello      NaN
8           Small Town Southern Man                         Alan Jackson  Spotify
9                    Find Your Love                                Drake  Spotify
10            Today Was a Fairytale                         Taylor Swift  Spotify
11                     Need You Now                               Lady A  Spotify
12                   American Honey                               Lady A      NaN
13                      Peace Dream                          Ringo Starr  Spotify
14                  If I Died Today                           Tim McGraw  Spotify
15                            Still                           Tim McGraw  Spotify
16                      I Need Love                               Ledisi  Spotify
17                          Uhh Ahh                          Boyz II Men  Spotify
18                  Shattered Heart                               Brandy  Spotify
19            Right Here (Departed)                               Brandy  Spotify
20           Warm It Up (With Love)                               Brandy  Spotify
21                  If I Were a Boy                              Beyoncé      NaN
22                Why Does She Stay                                Ne-Yo  Spotify
23              Daddy Needs a Drink                    Drive-By Truckers  Spotify
24                  Think About You                          Ringo Starr      NaN
25                      Liverpool 8                          Ringo Starr      NaN
26                        Nefertiti                       Herbie Hancock  Spotify
27                            River  Herbie Hancock / Corinne Bailey Rae  Spotify
28                   Both Sides Now                       Herbie Hancock  Spotify
29                  Court and Spark         Herbie Hancock / Norah Jones  Spotify
30  I Taught Myself How to Grow Old                           Ryan Adams  Spotify
31                           Ghetto           Kelly Rowland / Snoop Dogg      NaN
32                      Little Girl                     Enrique Iglesias  Spotify
33          The Magdalene Laundries                       Emmylou Harris  Spotify
34                   Because of You                                Ne-Yo  Spotify
35               We Belong Together                         Mariah Carey  Spotify
36          Thank You for Loving Me                             Bon Jovi  Spotify
37        He's Younger Than You Are                        Sonny Rollins  Spotify
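
To end up with the same two columns as in the question, the table can be trimmed and renamed afterwards. The sketch below assumes the column labels shown in the output above (`Title/Composer`, `Performer`); the function names and the sample frame are hypothetical. It also wraps the HTML in `io.StringIO`, since recent pandas releases warn when a raw HTML string is passed to `pd.read_html`.

```python
import io
import pandas as pd

def select_song_columns(table):
    """Keep the two columns of interest and rename them as in the question."""
    return (table[["Title/Composer", "Performer"]]
            .rename(columns={"Title/Composer": "song", "Performer": "performer"}))

def songs_from_html(html):
    """Parse the first table in html (requires lxml, or bs4 with html5lib)."""
    table = pd.read_html(io.StringIO(html))[0]
    return select_song_columns(table)

# Hand-built stand-in for the frame that pd.read_html returns.
sample = pd.DataFrame({
    "Title/Composer": ["Knock You Down", "I Felt The Chill"],
    "Performer": ["Keri Hilson", "Elvis Costello"],
    "Stream": ["Spotify", None],
})

print(select_song_columns(sample))
```

With the real page, `songs_from_html(source_code.text)` would replace the second line of the snippet above.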
Corralien