
I am trying to analyze all user-curated Spotify playlists and the tracks inside them, especially in the hip-hop genre. The result I want is a list of user-curated playlist IDs (preferably 50,000 of them).

I have tried using the Search API and the Get Category's Playlists API. The problems are:

  1. The Search API has a 1,000-result limit.
  2. The Get Category's Playlists API only returns Spotify-curated playlists for each genre.
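To separate the two kinds of playlist in whatever results you do get, one option is to partition them by owner. This is a minimal sketch that assumes Spotify's editorial playlists have `owner.id == "spotify"` (which matches what the Web API returns as far as I can tell, but check it against your own responses); `split_by_curator` is a hypothetical helper, not part of spotipy.

```python
from __future__ import annotations

from typing import Iterable, Optional


def split_by_curator(playlists: Iterable[Optional[dict]]) -> tuple[list[dict], list[dict]]:
    """Split Spotify playlist objects into (user_curated, spotify_curated).

    Assumption: playlists owned by Spotify itself have owner.id == "spotify".
    Playlists with a missing owner are counted as user-curated.
    """
    user_curated: list[dict] = []
    spotify_curated: list[dict] = []
    for item in playlists:
        if not item:
            continue  # the API can return null entries in item lists
        owner_id = (item.get("owner") or {}).get("id")
        if owner_id == "spotify":
            spotify_curated.append(item)
        else:
            user_curated.append(item)
    return user_curated, spotify_curated
```

You would pass it the `items` list of a playlists page, e.g. `results['playlists']['items']` from `sp.search(...)` or `sp.category_playlists(...)`.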

I also tried to work around the Search API limit by sending many different queries (e.g. searching for 'a', 'b', 'c', 'd', ...). However, I still have no idea which queries would best represent Spotify playlists as a whole (searching 'a', 'b', ... seems too random). I would appreciate any help or ideas!
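The query-sharding idea can be sketched like this, assuming the Search API returns at most 50 playlists per request and stops at a combined offset of 1,000 (the limit described above). `search_fn` is a hypothetical callable you would wire to spotipy, for example `lambda q, limit, offset: sp.search(q=q, type='playlist', limit=limit, offset=offset)`. Deduplicating by ID matters because overlapping queries return many of the same playlists. This does not answer which queries are representative; the query list stays your choice.

```python
from __future__ import annotations

from typing import Callable, Iterable

PAGE_SIZE = 50      # assumed maximum `limit` for the Search API
MAX_RESULTS = 1000  # assumed cap on offset + limit per query


def collect_playlist_ids(
    queries: Iterable[str],
    search_fn: Callable[[str, int, int], dict],
    target: int = 50_000,
    page_size: int = PAGE_SIZE,
    max_results: int = MAX_RESULTS,
) -> list[str]:
    """Run each query, page up to the per-query cap, and return unique playlist IDs."""
    seen: dict[str, None] = {}  # a dict keeps insertion order, so it works as an ordered set
    for q in queries:
        offset = 0
        # Never request past the assumed per-query cap
        while offset + page_size <= max_results:
            page = search_fn(q, page_size, offset)["playlists"]
            for item in page["items"]:
                if item and item.get("id"):
                    seen.setdefault(item["id"], None)
                    if len(seen) >= target:
                        return list(seen)
            if page.get("next") is None:
                break  # this query has no more results
            offset += page_size
    return list(seen)
```

Each query can contribute at most 1,000 playlists, so reaching 50,000 unique IDs needs at least 50 queries, and in practice many more because of overlap.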

This is what I have tried with the Get Category's Playlists API, using the Spotipy library in Google Colab:

import pandas as pd
import spotipy

# Replace Auth details with your Client ID, Secret
spotify_details = {
    'client_id' : 'Client ID',
    'client_secret':'Client Secret',
    'redirect_uri':'Redirect_uri'}

scope = "user-library-read user-follow-read user-top-read playlist-read-private playlist-read-collaborative playlist-modify-public playlist-modify-private" 

sp = spotipy.Spotify(
        auth_manager=spotipy.SpotifyOAuth(
          client_id=spotify_details['client_id'],
          client_secret=spotify_details['client_secret'],
          redirect_uri=spotify_details['redirect_uri'],    
          scope=scope,open_browser=False))


results = sp.category_playlists(category_id="hiphop", limit = 5, country="US", offset=0)
total = results["playlists"]["total"]
df=pd.DataFrame([],columns = ['id', 'name', 'external_urls.spotify'])
for offset in range(0,total,50):
  results = sp.category_playlists(category_id="hiphop", limit = 50, country="US", offset=offset)
  playlists = pd.json_normalize(results['playlists']['items'])
  #print(playlists.keys)
  df=pd.concat([df,playlists])
df

I can only get around 104 playlists when I run:

print(len(df)) 
>>104

P.S. This number varies between roughly 80 and 100+ depending on the location of your account.
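Since `category_playlists` takes a `country` argument and the result set differs by market, one hedged way to widen coverage is to query several markets and take the union of playlist IDs. `fetch_fn` is a hypothetical wrapper, e.g. `lambda cat, c, limit, off: sp.category_playlists(category_id=cat, country=c, limit=limit, offset=off)`, and the country list you pass is an arbitrary choice. Note this still only yields playlists that Spotify files under the category, so it does not by itself solve the user-curated problem.

```python
from __future__ import annotations

from typing import Callable, Iterable


def playlists_across_markets(
    category_id: str,
    countries: Iterable[str],
    fetch_fn: Callable[[str, str, int, int], dict],
    page_size: int = 50,
) -> dict[str, str]:
    """Return {playlist_id: name}, merged over all requested markets."""
    found: dict[str, str] = {}
    for country in countries:
        offset = 0
        while True:
            page = fetch_fn(category_id, country, page_size, offset)["playlists"]
            for item in page["items"]:
                if item and item.get("id"):
                    # Keep the first name seen; later markets only add new IDs
                    found.setdefault(item["id"], item.get("name", ""))
            if page.get("next") is None:
                break
            offset += page_size
    return found
```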

Tanachai A

2 Answers


The main idea is the same as @Nima Akbarzadeh's: page through the results with offset.

I am using axios to call the Spotify Web API from Node.js.

First get the playlists, then loop over each playlist to get its tracks.

This code gets the songs from all playlists in Spotify's hiphop category.

const axios = require('axios')

const API_KEY='<your client ID>'
const API_KEY_SECRET='<your client Secret>'

// Client Credentials flow: exchange the client ID/secret for an app access token
const getToken = async () => {
    try {
        const resp = await axios.post(
            'https://accounts.spotify.com/api/token',
            '',
            {
                params: {
                    'grant_type': 'client_credentials'
                },
                auth: {
                    username: API_KEY,
                    password: API_KEY_SECRET
                }
            }
        );
        return resp.data.access_token;
    } catch (err) {
        console.error(err)
        throw err
    }
};
const getCategories = async (category_id, token) => {
    try {
        let offset = 0
        let next = null
        const playlists = [];
        // Page through the category's playlists, 20 at a time, until `next` is null
        do {
            const resp = await axios.get(
                `https://api.spotify.com/v1/browse/categories/${category_id}/playlists?country=US&offset=${offset}&limit=20`,
                {
                    headers: {
                        'Accept': 'application/json',
                        'Authorization': `Bearer ${token}`,
                    }
                }
            );

            for (const item of resp.data.playlists.items) {
                // The API can return null entries; skip them
                if (item?.name != null) {
                    playlists.push({
                        name: item.name,
                        external_urls: item.external_urls.spotify,
                        type: item.type,
                        id: item.id
                    })
                }
            }

            offset = offset + 20
            next = resp.data.playlists.next
        } while (next != null)
        return playlists
    } catch (err) {
        console.error(err)
        throw err
    }
}

const getTracks = async (playlists, token) => {
    try {
        const tracks = [];
        for (const playlist of playlists) {
            // Note: this endpoint includes only the first page (up to 100) of the playlist's tracks
            const resp = await axios.get(
                `https://api.spotify.com/v1/playlists/${playlist.id}`,
                {
                    headers: {
                        'Accept': 'application/json',
                        'Authorization': `Bearer ${token}`,
                    }
                }
            );
            for (const item of resp.data.tracks.items) {
                if (item.track?.name != null) {
                    tracks.push({
                        name: item.track.name,
                        external_urls: item.track.external_urls.spotify
                    })
                }
            }
        }
        return tracks
    } catch (err) {
        console.error(err)
        throw err
    }
};

(async () => {
    try {
        const token = await getToken();
        const playlists = await getCategories('hiphop', token);
        const tracks = await getTracks(playlists, token);
        for (const track of tracks) {
            console.log(track)
        }
    } catch (error) {
        console.log(error.message);
    }
})();

I got 6435 songs

$ node get-data.js
[
  {
    name: 'RapCaviar',
    external_urls: 'https://open.spotify.com/playlist/37i9dQZF1DX0XUsuxWHRQd',
    type: 'playlist',
    id: '37i9dQZF1DX0XUsuxWHRQd'
  },
  {
    name: "Feelin' Myself",
    external_urls: 'https://open.spotify.com/playlist/37i9dQZF1DX6GwdWRQMQpq',
    type: 'playlist',
    id: '37i9dQZF1DX6GwdWRQMQpq'
  },
  {
    name: 'Most Necessary',
    external_urls: 'https://open.spotify.com/playlist/37i9dQZF1DX2RxBh64BHjQ',
    type: 'playlist',
    id: '37i9dQZF1DX2RxBh64BHjQ'
  },
  {
    name: 'Gold School',
    external_urls: 'https://open.spotify.com/playlist/37i9dQZF1DWVA1Gq4XHa6U',
    type: 'playlist',
    id: '37i9dQZF1DWVA1Gq4XHa6U'
  },
  {
    name: 'Locked In',
    external_urls: 'https://open.spotify.com/playlist/37i9dQZF1DWTl4y3vgJOXW',
    type: 'playlist',
    id: '37i9dQZF1DWTl4y3vgJOXW'
  },
  {
    name: 'Taste',
    external_urls: 'https://open.spotify.com/playlist/37i9dQZF1DWSUur0QPPsOn',
    type: 'playlist',
    id: '37i9dQZF1DWSUur0QPPsOn'
  },
  {
    name: 'Get Turnt',
    external_urls: 'https://open.spotify.com/playlist/37i9dQZF1DWY4xHQp97fN6',
    type: 'playlist',
    id: '37i9dQZF1DWY4xHQp97fN6'
  },
...
 {
    name: 'BILLS PAID (feat. Latto & City Girls)',
    external_urls: 'https://open.spotify.com/track/0JiLQRLOeWQdPC9rVpOqqo'
  },
  {
    name: 'Persuasive (with SZA)',
    external_urls: 'https://open.spotify.com/track/67v2UHujFruxWrDmjPYxD6'
  },
  {
    name: 'Shirt',
    external_urls: 'https://open.spotify.com/track/34ZAzO78a5DAVNrYIGWcPm'
  },
  {
    name: 'Back 2 the Streets',
    external_urls: 'https://open.spotify.com/track/3Z9aukqdW2HuzFF1x9lKUm'
  },
  {
    name: 'FTCU (feat. GloRilla & Gangsta Boo)',
    external_urls: 'https://open.spotify.com/track/4lxTmHPgoRWwM9QisWobJL'
  },
  {
    name: 'My Way',
    external_urls: 'https://open.spotify.com/track/5BcIBbBdkjSYnf5jNlLG7j'
  },
  {
    name: 'Donk',
    external_urls: 'https://open.spotify.com/track/58lmOL5ql1YIXrpRpoYi3i'
  },
  ... 6335 more items
]
To save the output to a file:

$ node get-data.js > result.json


Update: Python version

import spotipy
from spotipy.oauth2 import SpotifyOAuth
import json
import re

SCOPE = ['user-library-read',
    'user-follow-read',
    'user-top-read',
    'playlist-read-private',
    'playlist-read-collaborative',
    'playlist-modify-public',
    'playlist-modify-private']
USER_ID = '<your user id>'
REDIRECT_URI = '<your redirect uri>'
CLIENT_ID = '<your client id>'
CLIENT_SECRET = '<your client secret>'
auth_manager = SpotifyOAuth(
    scope=SCOPE,
    username=USER_ID,
    redirect_uri=REDIRECT_URI,
    client_id=CLIENT_ID,
    client_secret=CLIENT_SECRET)

def get_categories():
    try:
        sp = spotipy.Spotify(auth_manager=auth_manager)
        query_limit = 50
        categories=[]
        new_offset = 0
        while True:
            results=sp.category_playlists(category_id='hiphop', limit = query_limit, country='US', offset=new_offset)
            for item in results['playlists']['items']:
                if (item is not None and item['name'] is not None):
                    # ['https:', '', 'api.spotify.com', 'v1', 'playlists', '37i9dQZF1DX0XUsuxWHRQd', 'tracks']
                    tokens = re.split(r"[\/]", item['tracks']['href'])
                    categories.append({
                        'id' : item['id'],
                        'name': item['name'],
                        'url': item['external_urls']['spotify'],
                        'tracks': item['tracks']['href'],
                        'playlist_id': tokens[5],
                        'type': item['type']
                    })
            new_offset = new_offset + query_limit
            # Stop when there is no next page (named next_page to avoid shadowing the built-in next())
            next_page = results['playlists']['next']
            if next_page is None:
                break
        return categories
    except Exception as e:
        print('Failed to call get_categories: ' + str(e))

def get_songs(categories):
    try:
        sp = spotipy.Spotify(auth_manager=auth_manager)
        songs=[]
        for category in categories:
            if category is None:
                continue
            playlist_id = category['playlist_id']
            # Note: sp.playlist() includes only the first page (up to 100) of each playlist's tracks
            results=sp.playlist(playlist_id=playlist_id)
            for item in results['tracks']['items']:
                track = item['track'] if item is not None else None
                # Skip incomplete entries (e.g. removed or local tracks) instead of stopping at the first one
                if (track is not None and track['id'] is not None and track['name'] is not None and track.get('external_urls', {}).get('spotify') is not None):
                    songs.append({
                        'id' : track['id'],
                        'name': track['name'],
                        'url': track['external_urls']['spotify']
                    })
        return songs
    except Exception as e:
        print('Failed to call get_songs: ' + str(e))

categories = get_categories()
songs = get_songs(categories)
print(json.dumps(songs))
# print(len(songs)) -> 6021

Run it with:

$ python get-songs.py > all-songs.json

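`get_songs()` appends one entry per playlist row, so a track that appears in several playlists may be counted more than once in the total. For analysis, a hedged next step is to count how many of the fetched playlists each track appears in. `count_track_occurrences` is a hypothetical helper operating on the `songs` list returned by `get_songs()` (dicts with `id`, `name`, `url`).

```python
from collections import Counter


def count_track_occurrences(songs):
    """Return [(name, track_id, playlist_count), ...], most frequent first."""
    valid = [s for s in songs if s and s.get("id")]
    counts = Counter(s["id"] for s in valid)
    names = {s["id"]: s["name"] for s in valid}
    # most_common() sorts by count; ties keep first-seen order
    return [(names[tid], tid, n) for tid, n in counts.most_common()]
```

For example, `count_track_occurrences(songs)[:20]` gives the 20 tracks that appear in the most hip-hop playlists, and `len(count_track_occurrences(songs))` gives the number of unique tracks.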

Bench Vue
  • Great example. Thank you for completing my answer. – Nima Akbarzadeh Feb 10 '23 at 18:26
  • No problem! We learn from each other. – Bench Vue Feb 10 '23 at 18:27
  • Thank you so much for the clear explanation and code. I have tried something similar and can only get around 100 Spotify-curated playlists. Is there a way to also get user-curated playlist data? – Tanachai A Feb 14 '23 at 17:15
  • @TanachaiA, can you show reproducible code for `only get around 100 spotify-curated playlists.`? I need to run it myself, then I can figure out what you are talking about. – Bench Vue Feb 14 '23 at 17:37
  • @BenchVue I have added more details on the similar code that you shared on python. If you have any problem with the code, please let me know. Thank you so much! – Tanachai A Feb 15 '23 at 23:59
  • @TanachaiA, I added a python code and result, I got 6021 songs. – Bench Vue Feb 16 '23 at 11:58
  • @BenchVue May I ask how many playlists you got? I guess you got the 6021 songs by scraping a number of playlists. – Tanachai A Feb 16 '23 at 15:44
  • @BenchVue I can only get around 104 playlists. – Tanachai A Feb 16 '23 at 16:35
  • @TanachaiA, get_categories() returned 117 categories (playlists), and get_songs() returned 6021 songs. – Bench Vue Feb 16 '23 at 18:11
  • Thank you for trying the same code. Do you know how to get more playlists than that? They don't need to be in the Hip Hop category, but they need to be user-curated playlists, not Spotify-curated playlists created by Spotify. – Tanachai A Feb 17 '23 at 00:02
  • Which category would make that possible? `hiphop` is not a good example. – Bench Vue Feb 17 '23 at 00:03
  • I checked all Spotify genres (53 genres). The highest categories (playlists) total is `K-Pop` with 212; second is `Latin` with 199. None of them has over 1000 playlists. – Bench Vue Feb 17 '23 at 01:24
  • Can you tell me the total number of playlists that you scraped from all genres? – Tanachai A Feb 17 '23 at 06:09
  • This [API](https://developer.spotify.com/console/get-browse-categories/) can get all genres and returns each category's name and id; you can then call the get [single category's playlists](https://developer.spotify.com/documentation/web-api/reference/#/operations/get-a-categories-playlists) (genre) API for each genre and count the total for each genre. – Bench Vue Feb 17 '23 at 07:16
  • @TanachaiA, is it time to vote for me? – Bench Vue Feb 18 '23 at 22:42
  • Thank you for your ideas and contribution. I saved and accepted the answer @bench – Tanachai A Feb 21 '23 at 17:02
  • @TanachaiA, Thanks for your cooperation, I also learn a lot! – Bench Vue Feb 21 '23 at 17:41
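The approach from the comments (list every browse category, then read the `total` of each category's playlists) can be sketched as below. `list_categories_fn` and `category_total_fn` are hypothetical wrappers; with spotipy they would be roughly `lambda limit, offset: sp.categories(country='US', limit=limit, offset=offset)` and `lambda cid: sp.category_playlists(category_id=cid, country='US', limit=1)['playlists']['total']`. The totals quoted in the comments came from one run and will change over time and by country.

```python
from __future__ import annotations

from typing import Callable


def category_totals(
    list_categories_fn: Callable[[int, int], dict],
    category_total_fn: Callable[[str], int],
    page_size: int = 50,
) -> dict[str, int]:
    """Return {category_id: playlist_total}, largest first."""
    totals: dict[str, int] = {}
    offset = 0
    while True:
        page = list_categories_fn(page_size, offset)["categories"]
        for cat in page["items"]:
            # One extra request per category just to read its playlist total
            totals[cat["id"]] = category_total_fn(cat["id"])
        if page.get("next") is None:
            break
        offset += page_size
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))
```

`sum(category_totals(...).values())` then gives an upper bound on how many category playlists are reachable this way, before removing playlists that appear in more than one category.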

Currently, Spotify will not let you retrieve more than 1,000 results, and even its own application shows a maximum of 1,000 (based on this answer).

Also, if the endpoint has an offset option, you can set it to 1000 to skip the first 1,000 items and get the next chunk.
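For endpoints where the offset is not capped at 1,000 (per the comments below, playlist tracks rather than search), offset paging looks roughly like this. `fetch_page` is a hypothetical callable; with spotipy it could be `lambda off, lim: sp.playlist_items(playlist_id, limit=lim, offset=off)`, which returns a paging object with `items` and `next`. The default page size of 100 is an assumption about that endpoint's maximum `limit`.

```python
def fetch_all_items(fetch_page, page_size=100):
    """Follow offset-based pages until the API reports no `next` page."""
    items = []
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        # Drop null entries, which the API can include in item lists
        items.extend(i for i in page["items"] if i is not None)
        if page.get("next") is None:
            return items
        offset += page_size
```

Because the loop stops on `next`, it keeps going past offset 1,000 as long as the endpoint keeps returning pages, which is exactly what fails for search queries.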

  • @Ximzend I worked with it in the past and it was fine. You can also check it here: https://stackoverflow.com/questions/64810063/spotify-api-result-limited-to-2000 – Nima Akbarzadeh Feb 10 '23 at 15:04
  • They changed it from 2000 to 1000 somewhere in between [February 06 2021](https://web.archive.org/web/20210206145717/https://developer.spotify.com/documentation/web-api/reference/) and [February 09 2021](https://web.archive.org/web/20210209170649/https://developer.spotify.com/documentation/web-api/reference/). – Ximzend Feb 10 '23 at 18:29
  • @Ximzend Check out the new answer.
  • I thought you were talking about the search offset, but the offset for playlist tracks can indeed go beyond 1000 tracks. – Ximzend Feb 12 '23 at 08:38
  • Thank you for the suggestions. I think the offset of the Search API cannot go beyond 1000, which means I cannot get over 1000 playlists per search query. But there is no such limit when getting tracks from a playlist. Is this correct? – Tanachai A Feb 16 '23 at 17:16