Spotify coverage (of my music collection)
I cobbled together a few scripts last night which:
- Create a list of all release MBIDs in my local music collection.
- For each MBID:
  - Get the tracklist, album title and album artist from the MusicBrainz API.
  - Perform an album lookup using the Spotify API, with a query consisting of album artist name + album title.
  - Get the tracklist from the Spotify API for each matching album.
  - Compare the MusicBrainz tracklist with each of the Spotify results and assign a confidence rating.
- Import the data into a SQL database.
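The comparison step could be sketched roughly as below. This is not the scoring my scripts necessarily use; it is a minimal illustration that assumes tracks are compared position by position with Python's difflib and that the rating is a 0 to 100 integer. The names normalise and tracklist_confidence are made up for this example.

```python
from difflib import SequenceMatcher


def normalise(title: str) -> str:
    """Lower-case and collapse whitespace so trivial differences don't count."""
    return " ".join(title.lower().split())


def tracklist_confidence(mb_tracks: list[str], sp_tracks: list[str]) -> int:
    """Return a 0-100 score for how well two tracklists agree.

    Assumption: tracks are compared by position, and any track that has
    no counterpart in the other list contributes 0 to the average.
    """
    if not mb_tracks or not sp_tracks:
        return 0
    longest = max(len(mb_tracks), len(sp_tracks))
    total = 0.0
    for mb, sp in zip(mb_tracks, sp_tracks):
        total += SequenceMatcher(None, normalise(mb), normalise(sp)).ratio()
    return round(100 * total / longest)
```

With this scheme, two tracklists that differ only in capitalisation score 100, and an album that is missing half of its tracks on one side cannot score above 50.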
One obvious problem with this approach is that a text search will not always find a match, even when we know one exists. For example, Spotify does not have El dueño del sistema. It does have an album by that name (El Dueño Del Sistema), but that release corresponds to the one in MusicBrainz titled El dueño del sistema: Special Edition. So this should have counted as one match.
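One way to reduce misses like this would be to compare titles on a normalised key rather than the raw text. The sketch below is only an idea, not something the scripts currently do: it strips accents, ignores case, and drops a trailing edition marker. The suffix list in EDITION_SUFFIX and the function name search_key are assumptions chosen for this example.

```python
import re
import unicodedata

# Hypothetical list of edition markers to ignore; extend as needed.
EDITION_SUFFIX = re.compile(
    r"\s*[:\-(\[]\s*(special|deluxe|limited|expanded)\s+edition\s*[)\]]?\s*$",
    re.IGNORECASE,
)


def search_key(title: str) -> str:
    """Reduce an album title to a form suitable for loose comparison."""
    # Drop a trailing ": Special Edition", "(Deluxe Edition)" and similar.
    title = EDITION_SUFFIX.sub("", title)
    # Decompose accented characters and discard the combining marks.
    decomposed = unicodedata.normalize("NFKD", title)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Case-fold and collapse runs of whitespace.
    return " ".join(stripped.casefold().split())


print(search_key("El dueño del sistema: Special Edition"))
print(search_key("El Dueño Del Sistema"))
```

Both calls print the same key, so the two titles would be treated as the same album. The trade-off is that genuinely different editions collapse together, which is why the tracklist comparison is still needed afterwards.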
Data
mysql> SELECT count(mbid),confidence FROM results
    -> GROUP BY CASE
    -> WHEN confidence IS NULL then 0
    -> WHEN confidence BETWEEN -1 AND -1 THEN 1
    -> WHEN confidence BETWEEN 0 AND 50 THEN 2
    -> WHEN confidence BETWEEN 50 AND 100 THEN 3
    -> END;
+-------------+------------+
| count(mbid) | confidence |
+-------------+------------+
|          16 | NULL       |
|        1486 | -1         |
|         173 | 0          |
|         699 | 92         |
+-------------+------------+
4 rows in set (0.01 sec)

mysql> SELECT count(mbid) FROM results;
+-------------+
| count(mbid) |
+-------------+
|        2374 |
+-------------+
1 row in set (0.01 sec)
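Note that the confidence column in the grouped output appears to show just one value picked from each group (0 for the 0 to 50 bucket, 92 for the 50 to 100 bucket), because the query groups by a CASE expression but selects the raw column. A variant that selects the bucket label instead is sketched below, using Python's sqlite3 module and an in-memory table so it can be run anywhere. The six rows are made-up sample data, not my results, and the sketch assumes confidence is stored as an integer.

```python
import sqlite3

# Select the bucket label itself so each row is unambiguous.
BUCKET_QUERY = """
SELECT CASE
         WHEN confidence IS NULL THEN 'NULL'
         WHEN confidence = -1 THEN '-1'
         WHEN confidence BETWEEN 0 AND 50 THEN '0-50'
         ELSE '51-100'
       END AS bucket,
       COUNT(mbid) AS releases
FROM results
GROUP BY bucket
ORDER BY bucket
"""


def bucket_counts(conn: sqlite3.Connection) -> dict[str, int]:
    """Return a mapping of bucket label to number of releases."""
    return dict(conn.execute(BUCKET_QUERY).fetchall())


# Made-up sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (mbid TEXT PRIMARY KEY, confidence INTEGER)")
conn.executemany(
    "INSERT INTO results VALUES (?, ?)",
    [("a", None), ("b", -1), ("c", -1), ("d", 0), ("e", 50), ("f", 92)],
)
print(bucket_counts(conn))
```

The same CASE expression should work unchanged in MySQL, since it only uses standard SQL.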