
I downloaded my personal Spotify data from the Spotify website. I converted these data from JSON to a regular R dataframe for further analysis. This personal dataframe has 4 columns:

Endtime   artistName   trackName   Msplayed

However, Spotify stores many additional variables for each artist's songs, which can only be retrieved with the function get_artist_audio_features from the spotifyr package. I want to join these variables to my personal dataframe. The function retrieves data for only one artist at a time, and it would be very time-consuming to write a line of code for each of the 3000+ artists in my dataframe.

I used a for loop to try and collect the metadata for the artists:

empty_list <- vector(mode = "list")

for (i in df$artistName) {
  empty_list[[i]] <- get_artist_audio_features(i)
}

My dataframe also contains podcasts, for which none of this metadata is available. When I try using the function on a podcast, I get the error message:

Error in get_artist_audio_features(i) : 
  No artist found with artist_id=''.
In addition: Warning messages:
1: Unknown or uninitialised column: `id`. 
2: Unknown or uninitialised column: `name`. 

When I use the for loop, it stops as soon as it hits the first error (a podcast) in the dataframe. When I feed it a vector of only artists and no podcasts, it works perfectly.

I checked Stack Overflow for possible answers (most notably: Skipping error in for-loop), but I can't get the loop to work.

My question: how can I use the function spotifyr::get_artist_audio_features in a for loop and skip the errors, storing the results in a list? Unfortunately, it is very difficult to post a reproducible example, since you need to activate a developer account on Spotify to use the spotifyr package.

Stevestingray

1 Answer


It looks like your issue is in artist_id = '', so try the below code to see if it helps get you started (since I don't have reproducible data, not sure if it will help). In this case it should just skip the podcasts, but I'm sure some more codesmithing will allow you to put relevant data in the given list position.

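for (i in df$artistName) {
  # Untested: artist_id is not defined in this loop. It stands for the ID
  # reported in the error message and would have to be looked up first.
  if (artist_id == "") {
    empty_list[[i]] <- NA
  } else {
    empty_list[[i]] <- get_artist_audio_features(i)
  }
}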

You could also use a while loop conditioning on an incremental i to restart the loop, but I can't do that without the data.
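Another way to skip failing iterations, without checking for a specific ID, is to wrap each call in `tryCatch()`. Below is a minimal sketch, not tested against the real API. `fake_features` is a hypothetical stand-in for `spotifyr::get_artist_audio_features` so that the example runs without a developer account, and the data and the `failed` vector are made up for illustration.

```r
# Hypothetical stand-in for spotifyr::get_artist_audio_features(), so the
# sketch runs without API credentials. It fails for anything not in `known`.
fake_features <- function(artist) {
  known <- c("Artist A", "Artist B")
  if (!artist %in% known) {
    stop("No artist found with artist_id=''.")
  }
  data.frame(artist_name = artist, danceability = 0.5)
}

# Made-up listening history with one podcast and one repeated artist.
df <- data.frame(
  artistName = c("Artist A", "Some Podcast", "Artist B", "Artist A"),
  stringsAsFactors = FALSE
)

results <- list()
failed <- character(0)

# unique() avoids requesting the same artist more than once.
for (artist in unique(df$artistName)) {
  res <- tryCatch(
    fake_features(artist),
    error = function(e) {
      # Report the problem and return NULL instead of stopping the loop.
      message("Skipping '", artist, "': ", conditionMessage(e))
      NULL
    }
  )
  if (is.null(res)) {
    failed <- c(failed, artist)
    next  # move on to the next artist
  }
  results[[artist]] <- res
}

names(results)  # "Artist A" "Artist B"
failed          # "Some Podcast"
```

With the real package, replacing `fake_features(artist)` with `get_artist_audio_features(artist)` should give the same behaviour, and `failed` then lists the names (presumably the podcasts) that could not be retrieved.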

jpsmith
  • Thanks for the comment. Unfortunately, this does not work yet, but it is a place to start, thanks! Inside the loop, the same error is returned as in the OP. When I request a podcast outside of the loop, the name of the podcast (for which no metadata is present) appears between the ' '. So I really need to tell the loop that when it encounters an error, it needs to skip to the next iteration. I will look into the while loop. – Stevestingray Mar 01 '22 at 08:44
  • Yep - a `while` loop may make more sense here; also look into `next` in the loop (e.g. `for(i in 1:10){if(i == 2){next}; print(i)}` will print `1,3,4,5,6,7,8,9,10`) – jpsmith Mar 01 '22 at 14:19
  • Thanks again. Another problem arose: I got your loop to work, but the Spotify API denies several requests made in a very short time... So the loop works, but the API shuts it down as I exceed the maximum number of requests per time unit. – Stevestingray Mar 01 '22 at 14:24
  • A *very* long and crude way to get around this, if there are no other options, is to use `Sys.sleep(x)` in the loop, which will pause the loop for `x` seconds (e.g. `Sys.sleep(10)` will pause the loop for 10 seconds). The loop will obviously take a long time to run, but this may prevent the requests from being denied. – jpsmith Mar 01 '22 at 14:30
  • This works, but unfortunately it also takes (as you say) a veeeery long time. I think there is no fast workaround for this, however, so I will use it and let the computer run for a while. Thanks – Stevestingray Mar 03 '22 at 10:31
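
The `Sys.sleep()` idea from the comments can be combined with a retry, so that a denied request is attempted again after a growing pause instead of ending the loop. The following is only a sketch under assumptions: `flaky_features` is a hypothetical stand-in for the real API call, `with_retry` is a helper invented for this example, and the waits are kept very short so the example runs quickly (real waits would be in the range of seconds).

```r
# Hypothetical stand-in for the real API call: fails on the first attempt for
# each artist (as a denied request might) and succeeds on the second.
attempts <- new.env()
flaky_features <- function(artist) {
  n <- get0(artist, envir = attempts, inherits = FALSE, ifnotfound = 0) + 1
  assign(artist, n, envir = attempts)
  if (n < 2) stop("request denied")
  data.frame(artist_name = artist, tries = n)
}

# Call f(x); on error wait `wait * attempt` seconds and try again,
# giving up (returning NULL) after `max_tries` attempts.
with_retry <- function(f, x, max_tries = 3, wait = 0.1) {
  for (attempt in seq_len(max_tries)) {
    res <- tryCatch(f(x), error = function(e) NULL)
    if (!is.null(res)) return(res)
    if (attempt < max_tries) Sys.sleep(wait * attempt)
  }
  NULL
}

artists <- c("Artist A", "Artist B")
results <- list()
for (artist in artists) {
  # Assigning NULL to a list element adds nothing, so failures are skipped.
  results[[artist]] <- with_retry(flaky_features, artist)
  Sys.sleep(0.1)  # short fixed pause between artists
}
```

Note that this sketch retries every error, including permanent ones such as a podcast with no metadata. Telling the two apart would require inspecting `conditionMessage(e)`, and the wording of the rate-limit error is not shown in this thread.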