I've recently gotten into keeping track of the music I listen to using Last.fm. While it offers an impressive number of official and third-party ways to track what you've listened to, one big gap I've discovered in my everyday listening is that there's no easy way to import all of the songs played in a DJ set into your Last.fm account, so I decided to build a tool for it.
How it works
There's an awesome website called 1001tracklists, where a community does its best to track down every single track played in a DJ set. It has become the de facto standard way of sharing a tracklist, and plenty of DJs simply link to it instead of writing down every track they played in a set themselves.
This approach is pretty versatile, since it doesn't matter whether the DJ uploaded their set to YouTube, Mixcloud or SoundCloud, or livestreamed it on Twitch. In all of those cases, odds are good that you'll find a tracklist for it on 1001tracklists.
So, the process of getting a set into Last.fm looks like this:
- Listen to a recording of a set by your favourite DJ.
- Find the list of tracks played in the set on 1001tracklists.
- Run the script with the 1001tracklists URL.
- Scrape artist names and track names from the site.
- Use Last.fm's API to mark the tracks as played (scrobbled) in your Last.fm account.
Of course, fitting this into fewer than 30 lines of Python isn't possible without a few libraries that make the code easier to write, so let's see what can help us out.
Useful Python libraries
- pylast allows us to interact with Last.fm's API in an intuitive way.
- requests allows us to easily make web requests.
- BeautifulSoup allows us to parse HTML code we've downloaded and extract necessary information from it.
- fake-useragent allows us to pretend that we're visiting the URL from a browser so that 1001tracklists doesn't block our requests.
The latter three are commonly used together for scraping websites with Python. While there are alternatives, getting familiar with these will give you the ability to scrape pretty much anything you want, as well as to understand other people's scrapers. If you'd like to learn them, I'd recommend Chapter 12 of Automate the Boring Stuff with Python, which you can read for free on the book's website.
You can install all of these prerequisites, plus lxml (the HTML parser the script asks BeautifulSoup to use), with:
pip install pylast requests fake-useragent beautifulsoup4 lxml
Actual script
The code looks pretty clean compared to other scrapers I've written and used. I'm also posting it as a snippet on my GitLab.
#!/usr/bin/python3
import pylast, requests, sys, time
from fake_useragent import UserAgent
from bs4 import BeautifulSoup
LASTFM_API_KEY = "<your Last.fm API key goes here>"
LASTFM_API_SECRET = "<your Last.fm API secret goes here>"
LASTFM_USERNAME = "<your Last.fm username goes here>"
LASTFM_PASSWORD = pylast.md5("<your Last.fm password goes here>")
url = sys.argv[1]
print("Scrobbling from " + url)
network = pylast.LastFMNetwork(api_key=LASTFM_API_KEY, api_secret=LASTFM_API_SECRET, username=LASTFM_USERNAME, password_hash=LASTFM_PASSWORD)
# Send a Firefox user agent so 1001tracklists doesn't block the request
r = requests.get(url, headers={'User-Agent': UserAgent().firefox})
r.raise_for_status()  # stop early on 403/404 instead of parsing an error page
soup = BeautifulSoup(r.text, features="lxml")
# Each identified track sits in a <span class="trackFormat"> with the text "Artist - Title"
for track in soup.find_all('span', class_="trackFormat"):
    full_text = track.text.split(" - ", 1)
    if len(full_text) < 2:
        continue  # no "Artist - Title" separator, nothing to scrobble
    artist = full_text[0].strip()
    track_name = full_text[1].strip()
    # Skip anything 1001tracklists couldn't identify
    if artist != "ID" and track_name != "ID" and "ID Remix" not in track_name:
        network.scrobble(artist, track_name, int(time.time()))
        print("Scrobbled: " + artist + " - " + track_name)
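The script stamps every track with int(time.time()), so a whole set lands on Last.fm within a second or two. If you'd prefer the scrobbles to line up in play order, one option is to spread the timestamps backwards from the current time. Here's a minimal sketch of that idea. The spread_timestamps name and the fixed 180-second average track length are my own assumptions, not something 1001tracklists or Last.fm provides:

```python
import time


def spread_timestamps(track_count, end_time=None, seconds_per_track=180):
    """Return one UNIX timestamp per track, oldest first.

    Assumes (hypothetically) that every track lasted roughly
    seconds_per_track seconds and that the last one ended at end_time,
    which defaults to the current time.
    """
    if end_time is None:
        end_time = int(time.time())
    # The first track started track_count * seconds_per_track seconds before the end
    start_time = end_time - track_count * seconds_per_track
    return [start_time + i * seconds_per_track for i in range(track_count)]


if __name__ == "__main__":
    print(spread_timestamps(3, end_time=1_000_000))  # [999460, 999640, 999820]
```

To use it, you would first collect the identified tracks into a list. Then you would call network.scrobble(artist, track_name, timestamp) with the matching entry from this list instead of int(time.time()).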
You'll need to generate a new API key on Last.fm and fill in the four variables at the top of the script with your account details and API credentials. After that, run the script like this:
python scrape.py https://www.1001tracklists.com/tracklist/td181gt/rl-grime-jawns-rossy-sable-valley-stream001-united-states-2020-05-30.html
These lines also deal with a couple of quirks in the output from 1001tracklists:
- Since the HTML output from 1001tracklists is somewhat messy, the artist's name starts with a blank space and the track's name ends with one, so we strip that whitespace away.
- Tracks that weren't identified on 1001tracklists are marked as "ID", so we skip:
  - Songs with an unknown artist (artist set to "ID").
  - Songs with an unknown title (title of the track identified as "ID").
  - Unidentified remixes of a song (title of the track contains "ID Remix").
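Both quirks can be moved out of the loop into a small helper, which makes them easy to check without hitting the network. This is a sketch. parse_track is a hypothetical name, and the helper assumes the text of a trackFormat span always has the shape "Artist - Title", possibly with stray whitespace around it:

```python
def parse_track(raw_text):
    """Turn the text of a trackFormat span into (artist, title).

    Returns None when the text has no " - " separator, or when
    1001tracklists marks the artist, the title, or a remix as "ID".
    """
    # Split only on the first " - " in case the title itself contains one
    parts = raw_text.split(" - ", 1)
    if len(parts) != 2:
        return None
    artist, title = parts[0].strip(), parts[1].strip()
    if artist == "ID" or title == "ID" or "ID Remix" in title:
        return None
    return artist, title


if __name__ == "__main__":
    print(parse_track(" Arkist - Rendezvous "))  # ('Arkist', 'Rendezvous')
    print(parse_track(" ID - ID "))              # None
```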
After running it, the history in your Last.fm account will look something like this:
The actual HTML code of an identified track on 1001tracklists looks something like this once you prettify it:
<span class="trackFormat">
  <span class="notranslate blueTxt" translate="no">Arkist
    <span title="open artist page" class="tgHid spL">
      <a href="/artist/4yk4x5w/arkist/index.html" class="">
        <i class="fa fa-external-link fa-lg linkIcon"></i>
      </a>
    </span>
  </span>
  <span class="" translate="no" class="notranslate"> - </span>
  <span class="blueTxt">
    <span translate="no" class="notranslate">Rendezvous</span>
    <span title="open track page" class="tgHid spL">
      <a href="/track/ls0v7w5/arkist-rendezvous/index.html" title="open track page" translate="no" class="notranslate ">
        <i class="fa fa-external-link fa-lg linkIcon"></i>
      </a>
    </span>
  </span>
</span>
At first I spent about half an hour trying to select the artist's name and the track's name individually in all of that mess. Then I realized I was overthinking it: the text inside the trackFormat span already contains both the artist's and the track's name.
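The following sketch shows why grabbing the whole span's text is enough. It does the same extraction with Python's built-in html.parser instead of BeautifulSoup. That is a swapped-in technique, used here only so the example has no dependencies, and TrackFormatText and track_texts are hypothetical names. The sketch collects the text of every span with the trackFormat class, including the text of the spans nested inside it:

```python
from html.parser import HTMLParser


class TrackFormatText(HTMLParser):
    """Collect the concatenated text of each <span class="trackFormat">."""

    def __init__(self):
        super().__init__()
        self.tracks = []  # one text string per trackFormat span
        self._depth = 0   # span nesting depth inside the current trackFormat span

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self._depth:
            self._depth += 1  # a span nested inside a trackFormat span
            return
        classes = (dict(attrs).get("class") or "").split()
        if "trackFormat" in classes:
            self._depth = 1
            self.tracks.append("")

    def handle_endtag(self, tag):
        if tag == "span" and self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self.tracks[-1] += data


def track_texts(html):
    parser = TrackFormatText()
    parser.feed(html)
    parser.close()
    # Collapse the runs of whitespace left over from the markup
    return [" ".join(text.split()) for text in parser.tracks]
```

Feeding the snippet above through track_texts yields a single string, "Arkist - Rendezvous". That is exactly the "Artist - Title" shape the script splits on.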
I hope this article serves as inspiration for connecting multiple services without relying on a service like IFTTT, Zapier or Integromat. I also hope it helps when the website you want to connect isn't available in any of them.