I used the Spotify Web API to pull the top songs from my personal account. In this post, I go over how to get a user’s fifty top tracks from their Spotify account using Spotipy, clean the data, and produce visualizations in Python.
Top 50 Spotify Songs
Top 50 songs from my personal Spotify account, extracted using the Spotify API.
No. | Song | Artist | Album | Popularity |
---|---|---|---|---|
1 | Borderline | Tame Impala | Borderline | 77 |
2 | Groceries | Mallrat | In the Sky | 64 |
3 | Fading | Toro y Moi | Outer Peace | 48 |
4 | Fanfare | Magic City Hippies | Hippie Castle EP | 57 |
5 | Limestone | Magic City Hippies | Hippie Castle EP | 59 |
6 | High Steppin' | The Avett Brothers | Closer Than Together | 51 |
7 | I Think Your Nose Is Bleeding | The Front Bottoms | Ann | 43 |
8 | Die Die Die | The Avett Brothers | Emotionalism (Bonus Track Version) | 44 |
9 | Spice | Magic City Hippies | Modern Animal | 42 |
10 | Bleeding White | The Avett Brothers | Closer Than Together | 53 |
11 | Prom Queen | Beach Bunny | Prom Queen | 73 |
12 | Sports | Beach Bunny | Sports | 65 |
13 | February | Beach Bunny | Crybaby | 51 |
14 | Pale Beneath The Tan (Squeeze) | The Front Bottoms | Ann | 43 |
15 | 12 Feet Deep | The Front Bottoms | Rose | 49 |
16 | Au Revoir (Adios) | The Front Bottoms | Talon Of The Hawk | 50 |
17 | Freelance | Toro y Moi | Outer Peace | 57 |
18 | Spaceman | The Killers | Day & Age (Bonus Tracks) | 62 |
19 | Destroyed By Hippie Powers | Car Seat Headrest | Teens of Denial | 51 |
20 | Why Won't They Talk To Me? | Tame Impala | Lonerism | 59 |
21 | Fallingwater | Maggie Rogers | Heard It In A Past Life | 71 |
22 | Funny You Should Ask | The Front Bottoms | Talon Of The Hawk | 48 |
23 | You Used To Say (Holy Fuck) | The Front Bottoms | Going Grey | 47 |
24 | Today Is Not Real | The Front Bottoms | Ann | 41 |
25 | Father | The Front Bottoms | The Front Bottoms | 43 |
26 | Broken Boy | Cage The Elephant | Social Cues | 60 |
27 | Wait a Minute! | WILLOW | ARDIPITHECUS | 80 |
28 | Laugh Till I Cry | The Front Bottoms | Back On Top | 47 |
29 | Nobody's Home | Mallrat | Nobody's Home | 56 |
30 | Apocalypse Dreams | Tame Impala | Lonerism | 60 |
31 | Fill in the Blank | Car Seat Headrest | Teens of Denial | 56 |
32 | Spiderhead | Cage The Elephant | Melophobia | 57 |
33 | Tie Dye Dragon | The Front Bottoms | Ann | 47 |
34 | Summer Shandy | The Front Bottoms | Back On Top | 43 |
35 | At the Beach | The Avett Brothers | Mignonette | 51 |
36 | Motorcycle | The Front Bottoms | Back On Top | 41 |
37 | The New Love Song | The Avett Brothers | Mignonette | 42 |
38 | Paranoia in B Major | The Avett Brothers | Emotionalism (Bonus Track Version) | 49 |
39 | Aberdeen | Cage The Elephant | Thank You Happy Birthday | 54 |
40 | Losing Touch | The Killers | Day & Age (Bonus Tracks) | 51 |
41 | Four of a Kind | Magic City Hippies | Hippie Castle EP | 46 |
42 | Cosmic Hero (Live at the Tramshed, Cardiff, Wa... | Car Seat Headrest | Commit Yourself Completely | 34 |
43 | Locked Up | The Avett Brothers | Closer Than Together | 49 |
44 | Bull Ride | Magic City Hippies | Hippie Castle EP | 49 |
45 | The Weight of Lies | The Avett Brothers | Emotionalism (Bonus Track Version) | 51 |
46 | Heat Wave | Snail Mail | Lush | 60 |
47 | Awkward Conversations | The Front Bottoms | Rose | 42 |
48 | Baby Drive It Down | Toro y Moi | Outer Peace | 47 |
49 | Your Love | Middle Kids | Middle Kids EP | 29 |
50 | Ordinary Pleasure | Toro y Moi | Outer Peace | 58 |
Using Spotipy and the Spotify Web API
First, I created an account with Spotify for Developers and created a client ID from the dashboard. This provides both a client ID and client secret for your application to be used when making requests to the API.
Next, from the application page, in ‘Edit Settings’, in Redirect URIs, I add http://localhost:8888/callback . This will come in handy later when logging into a specific Spotify account to pull data.
Then, I write the code to make the request to the API. This pulls the data and writes it to a JSON file.
I import the following libraries:
- Python’s os library to pass the client ID, client secret, and redirect URI to the code through the operating system’s environment variables. Setting them from Python only affects the current process, so the credentials are set temporarily.
- Python’s json library to encode the data.
- Spotipy to provide an authorization flow for logging in to a Spotify account and obtain current top tracks for export.
import os
import json
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
import spotipy.util as util
Next, I set the client ID and secret to the values assigned to my application in the Spotify for Developers dashboard. Then, I set the environment variables to include the client ID, client secret, and redirect URI.
cid ="XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
secret = "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
os.environ['SPOTIPY_CLIENT_ID']= cid
os.environ['SPOTIPY_CLIENT_SECRET']= secret
os.environ['SPOTIPY_REDIRECT_URI']='http://localhost:8888/callback'
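If the three variables are set in the shell before the script runs, the code only has to read them. The following is a minimal sketch of that idea, not part of my original script; the helper name load_spotify_credentials is my own and does not come from Spotipy.

```python
import os

# The three variable names used in the code above.
REQUIRED_VARS = (
    "SPOTIPY_CLIENT_ID",
    "SPOTIPY_CLIENT_SECRET",
    "SPOTIPY_REDIRECT_URI",
)


def load_spotify_credentials(environ=os.environ):
    """Return the three settings as a dict, or raise if any is missing or empty.

    Hypothetical helper for illustration; `environ` can be any mapping,
    which makes the function easy to try out without touching the real
    environment.
    """
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARS}
```

Calling load_spotify_credentials() at the top of the script would then fail early with a readable message when a variable has not been set, and no secret needs to appear in the file.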
Then, I work through the authorization flow from the Spotipy documentation. The first time this code is run, the user will have to provide their Spotify username and password when prompted in the web browser.
username = ""
client_credentials_manager = SpotifyClientCredentials(client_id=cid, client_secret=secret)
sp = spotipy.Spotify(client_credentials_manager=client_credentials_manager)
scope = 'user-top-read'
token = util.prompt_for_user_token(username, scope)
if token:
    sp = spotipy.Spotify(auth=token)
else:
    print("Can't get token for", username)
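Recent Spotipy releases also offer an auth-manager style for the same login flow. The sketch below shows roughly how that might look, assuming a Spotipy 2.x release that provides SpotifyOAuth and assuming the three SPOTIPY_* environment variables are already set; make_client is a hypothetical name of my own, and this is not the code I ran for this post.

```python
SCOPE = "user-top-read"  # same scope as in the code above


def make_client(scope=SCOPE):
    """Build a Spotipy client that handles the browser login itself.

    Hypothetical helper for illustration. The imports live inside the
    function so this file can be loaded even where spotipy is not installed.
    """
    import spotipy
    from spotipy.oauth2 import SpotifyOAuth

    # When no credentials are passed explicitly, SpotifyOAuth reads
    # SPOTIPY_CLIENT_ID, SPOTIPY_CLIENT_SECRET, and SPOTIPY_REDIRECT_URI
    # from the environment.
    auth_manager = SpotifyOAuth(scope=scope)
    return spotipy.Spotify(auth_manager=auth_manager)
```

With this approach, sp = make_client() would stand in for the token handling, and the rest of the script could keep calling sp the same way.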
In the results section, I specify the information to pull. The arguments I provide indicate 50 songs as the limit, the index of the first item to return, and the time range. The time range options, as specified in Spotify’s documentation, are:
- short_term : approximately last 4 weeks of listening
- medium_term : approximately last 6 months of listening
- long_term : last several years of listening
For my query, I decided to use the medium term argument because I thought that would give the best picture of my listening habits for the past half year. Lastly, I create a list to append the results to and then write them to a JSON file.
if token:
    sp = spotipy.Spotify(auth=token)
    results = sp.current_user_top_tracks(limit=50, offset=0, time_range='medium_term')
    # Wrap the response in a list so the cleaning step can read it as data[0]["items"].
    top_tracks = [results]
    with open('top50_data.json', 'w', encoding='utf-8') as f:
        json.dump(top_tracks, f, ensure_ascii=False, indent=4)
else:
    print("Can't get token for", username)
After saving this code as a Python file, I run it from the command line. The output is top50_data.json, which needs to be cleaned before it can be used to create visualizations.
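To make the shape of that file concrete, here is a small offline sketch that writes and re-reads a file with the same layout: a list whose single element is the API response, with the tracks under "items". The results dictionary is a one-track stand-in built from the first row of my table, not a real API response, and the temporary path is only for the demonstration.

```python
import json
import os
import tempfile

# Stand-in for the API response; a real one holds up to 50 tracks under "items".
results = {"items": [{"name": "Borderline", "popularity": 77}]}

# Write to a temporary folder so the demo does not overwrite a real export.
path = os.path.join(tempfile.mkdtemp(), "top50_data.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump([results], f, ensure_ascii=False, indent=4)

# Read it back the same way the cleaning script does.
with open(path, encoding="utf-8") as f:
    data = json.load(f)

first_track = data[0]["items"][0]
print(first_track["name"], first_track["popularity"])
```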
Cleaning JSON Data for Visualizations
The top song data JSON file output is nested according to different categories, as seen in the sample below.
    "artists": [
        {
            "external_urls": {
                "spotify": "https://open.spotify.com/artist/5PbpKlxQE0Ktl5lcNABoFf"
            },
            "href": "https://api.spotify.com/v1/artists/5PbpKlxQE0Ktl5lcNABoFf",
            "id": "5PbpKlxQE0Ktl5lcNABoFf",
            "name": "Car Seat Headrest",
            "type": "artist",
            "uri": "spotify:artist:5PbpKlxQE0Ktl5lcNABoFf"
        }
    ],
    "disc_number": 1,
    "duration_ms": 303573,
    "explicit": true,
    "href": "https://api.spotify.com/v1/tracks/5xy3350chgFfFcdTET4xz3",
    "id": "5xy3350chgFfFcdTET4xz3",
    "is_local": false,
    "name": "Destroyed By Hippie Powers",
    "popularity": 51,
    "preview_url": "https://p.scdn.co/mp3-preview/cd1a18f3f7c8ada17bb54c55524ef42e80719d1f?cid=39e9cdce36dc45e589ce5b564c0594a2",
    "track_number": 3,
    "type": "track",
    "uri": "spotify:track:5xy3350chgFfFcdTET4xz3"
},
Before cleaning the JSON data and creating visualizations in a new file, I import json, pandas, matplotlib, and seaborn. Next, I load the JSON file with the top 50 song data.
import json
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb
with open('top50_data.json') as f:
    data = json.load(f)
I create a full list of all the data to start. Next, I create lists where I will append the specific JSON data. Using a loop, I access each of the items of interest for analysis and append them to the lists.
list_of_results = data[0]["items"]
list_of_artist_names = []
list_of_artist_uri = []
list_of_song_names = []
list_of_song_uri = []
list_of_durations_ms = []
list_of_explicit = []
list_of_albums = []
list_of_popularity = []
for result in list_of_results:
    # Only the first listed artist is kept for each track.
    this_artists_name = result["artists"][0]["name"]
    list_of_artist_names.append(this_artists_name)
    this_artists_uri = result["artists"][0]["uri"]
    list_of_artist_uri.append(this_artists_uri)
    song_name = result["name"]
    list_of_song_names.append(song_name)
    song_uri = result["uri"]
    list_of_song_uri.append(song_uri)
    duration_ms = result["duration_ms"]
    list_of_durations_ms.append(duration_ms)
    song_explicit = result["explicit"]
    list_of_explicit.append(song_explicit)
    this_album = result["album"]["name"]
    list_of_albums.append(this_album)
    song_popularity = result["popularity"]
    list_of_popularity.append(song_popularity)
Then, I create a pandas DataFrame, name each column and populate it with the above lists, and export it as a CSV for a backup copy.
all_songs = pd.DataFrame(
{'artist': list_of_artist_names,
'artist_uri': list_of_artist_uri,
'song': list_of_song_names,
'song_uri': list_of_song_uri,
'duration_ms': list_of_durations_ms,
'explicit': list_of_explicit,
'album': list_of_albums,
'popularity': list_of_popularity
})
# to_csv returns None when given a path, so there is nothing to assign.
all_songs.to_csv('top50_songs.csv')
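The same extraction can also be written as one small function that turns a nested track object into a flat dictionary. This is an alternative sketch rather than the code I used; flatten_track is a name I made up, and the sample values are taken from the JSON excerpt and table above.

```python
def flatten_track(track):
    """Pick the fields used in this post out of one nested track object."""
    first_artist = track["artists"][0]  # only the first listed artist is kept
    return {
        "artist": first_artist["name"],
        "artist_uri": first_artist["uri"],
        "song": track["name"],
        "song_uri": track["uri"],
        "duration_ms": track["duration_ms"],
        "explicit": track["explicit"],
        "album": track["album"]["name"],
        "popularity": track["popularity"],
    }


# A single track, trimmed to the fields the function reads.
sample_track = {
    "album": {"name": "Teens of Denial"},
    "artists": [
        {
            "name": "Car Seat Headrest",
            "uri": "spotify:artist:5PbpKlxQE0Ktl5lcNABoFf",
        }
    ],
    "duration_ms": 303573,
    "explicit": True,
    "name": "Destroyed By Hippie Powers",
    "popularity": 51,
    "uri": "spotify:track:5xy3350chgFfFcdTET4xz3",
}

row = flatten_track(sample_track)
print(row["artist"], "-", row["song"])
```

A list of these dictionaries, one per track, can be passed straight to pd.DataFrame, which would replace the eight separate lists with a single list of rows.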
Using the DataFrame, I create two visualizations. The first is a count plot using seaborn to show how many top songs came from each artist represented in the top 50 tracks.
top50 = all_songs  # the plotting code below refers to the DataFrame as top50
descending_order = top50['artist'].value_counts().sort_values(ascending=False).index
ax = sb.countplot(y = top50['artist'], order=descending_order)
sb.despine(fig=None, ax=None, top=True, right=True, left=False, trim=False)
sb.set(rc={'figure.figsize':(6,7.2)})
ax.set_ylabel('')
ax.set_xlabel('')
ax.set_title('Songs per Artist in Top 50', fontsize=16, fontweight='heavy')
sb.set(font_scale = 1.4)
ax.axes.get_xaxis().set_visible(False)
ax.set_frame_on(False)
y = top50['artist'].value_counts()
for i, v in enumerate(y):
    ax.text(v + 0.2, i + .16, str(v), color='black', fontweight='light', fontsize=14)
plt.savefig('top50_songs_per_artist.jpg', bbox_inches="tight")
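The numbers drawn on the count plot can be sanity-checked without any plotting library. Below is a small sketch using only the standard library; the list holds the artists from the first ten rows of the table at the top of this post, so the counts cover those ten songs only, not the full top 50.

```python
from collections import Counter

# Artists for rows 1-10 of the table above, in order.
artists_first_ten = [
    "Tame Impala",
    "Mallrat",
    "Toro y Moi",
    "Magic City Hippies",
    "Magic City Hippies",
    "The Avett Brothers",
    "The Front Bottoms",
    "The Avett Brothers",
    "Magic City Hippies",
    "The Avett Brothers",
]

songs_per_artist = Counter(artists_first_ten)

# most_common() lists artists from the highest count to the lowest,
# which is the same ordering idea the count plot uses.
for artist, count in songs_per_artist.most_common():
    print(f"{artist}: {count}")
```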
The second graph is a seaborn box plot to show the popularity of songs within individual artists represented.
popularity = top50['popularity']
artists = top50['artist']
plt.figure(figsize=(10,6))
ax = sb.boxplot(x=popularity, y=artists, data=top50)
plt.xlim(20,90)
plt.xlabel('Popularity (0-100)')
plt.ylabel('')
plt.title('Song Popularity by Artist', fontweight='bold', fontsize=18)
plt.savefig('top50_artist_popularity.jpg', bbox_inches="tight")
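The center line of each box is the median popularity for that artist, so the plot can be spot-checked by hand. The sketch below does this for two artists using the popularity values from the table above; it uses only the standard library and is a rough check, not a replacement for the plot. Note that box plot whiskers do not always reach the minimum and maximum, so only the median is directly comparable.

```python
from collections import defaultdict
from statistics import median

# (artist, popularity) pairs copied from the table above for two artists.
pairs = [
    ("Magic City Hippies", 57),  # Fanfare
    ("Magic City Hippies", 59),  # Limestone
    ("Magic City Hippies", 42),  # Spice
    ("Magic City Hippies", 46),  # Four of a Kind
    ("Magic City Hippies", 49),  # Bull Ride
    ("Toro y Moi", 48),  # Fading
    ("Toro y Moi", 57),  # Freelance
    ("Toro y Moi", 47),  # Baby Drive It Down
    ("Toro y Moi", 58),  # Ordinary Pleasure
]

# Group the popularity scores by artist.
popularity_by_artist = defaultdict(list)
for artist, popularity in pairs:
    popularity_by_artist[artist].append(popularity)

# Lowest, median, and highest popularity per artist.
summary = {
    artist: (min(scores), median(scores), max(scores))
    for artist, scores in popularity_by_artist.items()
}

for artist, (low, mid, high) in summary.items():
    print(f"{artist}: min={low}, median={mid}, max={high}")
```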
Further Considerations
For future interactions with the Spotify Web API, I would like to complete requests that pull top song data for each of the three term options and compare them. This would give a comprehensive view of listening habits and could lead to pulling further information from each artist.
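One way that comparison might start is sketched below: one request per time range, then a check for songs that appear in all three. I have not run this against my account; it assumes sp is an authorized Spotipy client like the one created earlier, and the function names are my own.

```python
TIME_RANGES = ("short_term", "medium_term", "long_term")


def top_tracks_by_range(sp, limit=50):
    """Return {time_range: [track, ...]} using one request per time range.

    `sp` is assumed to be an authorized client exposing
    current_user_top_tracks, as the Spotipy client used above does.
    """
    tracks = {}
    for time_range in TIME_RANGES:
        response = sp.current_user_top_tracks(
            limit=limit, offset=0, time_range=time_range
        )
        tracks[time_range] = response["items"]
    return tracks


def shared_track_names(tracks_by_range):
    """Return the set of song names present in every time range."""
    name_sets = [
        {track["name"] for track in items} for items in tracks_by_range.values()
    ]
    return set.intersection(*name_sets) if name_sets else set()
```

Calling shared_track_names(top_tracks_by_range(sp)) would give the songs that have stayed in the top tracks across all three windows. Matching on song name is a simplification; the track URI would be a stricter key.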
The only part where you messed up is setting the secrets into the OS from Python… the spirit is for you to set them yourself, via PowerShell or whatever, ultimately keeping secrets out of the code.
Because a lot of times secrets in code get distributed more widely than intended, e.g. through an accidental GitHub commit.
Any idea how to get this data in js, like this https://api.spotify.com/v1/me/playlists,
e.g https://api.spotify.com/v1/top/2021 like this
I would reference the Spotify API documentation, but specifically one of the API wrappers for JS like spotify-web-api-js might be helpful.
This was incredibly useful and helpful! Also your music taste is great lol!
Thanks!