# Make SwagLyrics Great Again

## CCExtractor Development

Email: [email protected]

University: Vanderbilt University (probably), Nashville, TN

Potential Mentor(s): Willem Van Iseghem, ...

## Introduction

### Abstract

The SwagLyrics suite of repositories is an ecosystem of backend applications, a couple of Python libraries, a Chrome web extension (soon!) and even a Discord bot. Over the course of this summer, we aim to work on these components individually as well as on how they interact. The ultimate idea is to create a NoOps infrastructure, that is, a system that maintains itself and does not require any essential input from an actual human being, thus ensuring that such a proposal never needs to be created again (lol). Overall, the goal is to lend a degree of finality to the project by working on the essential non-essential areas that cannot simply be ignored, as they have been until now. In essence, this proposal can be considered similar to the CCExtractor Sample Platform proposal last year.

To give better insight: there is currently no formal logging setup and no complete test coverage for the backend, resolving issues isn't as automated as I'd like it to be, and diagnosing every issue by opening up raw logs is not very productive. The current process of supporting more songs also leaves something to be desired, even though we have a database configured for this purpose.

Other than these, there are also a few features to be worked upon that require time, which will be elaborated upon in the later stages of this proposal.

### Context

SwagLyrics for Spotify is the primary project ("swaglyrics"). It contains both an interface for fetching lyrics given a song and artist and the application that continually fetches lyrics for the currently playing song on Spotify.

SwSpotify is the cross-platform Python library that fetches the current track info from Spotify. It is used in swaglyrics but also by other developers in their applications. The new Chrome Extension aims to add support for the Spotify Web Player, thus extending SwSpotify to work anywhere Spotify works.

swaglyrics-backend is the backend Flask application hosted on https://api.swaglyrics.dev. It provides endpoints that the main application communicates with, both to better support lyrics for tracks that cannot be directly resolved by the main application and to create and manage issues on the GitHub issue tracker to identify potential improvements. The backend also takes care of its own deployments from GitHub and sends relevant notifications to the Discord server via webhooks.

"stripper" refers to the string resolved when given a song, artist pair, generally after stripping away punctuation and other modifications. For example, the stripper for River (feat. Ed Sheeran) by Eminem would be eminem-river which when appended to the genius.com url would give us the lyrics page for that song.

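To make the idea of a stripper concrete, here is a minimal sketch of how such a string could be derived. The function name `stripper_sketch` and the exact rules (dropping bracketed qualifiers, removing punctuation, hyphenating) are assumptions for illustration and not the actual swaglyrics implementation, which handles more edge cases.

```python
import re


def stripper_sketch(song: str, artist: str) -> str:
    """Illustrative only: approximate a stripper for a (song, artist) pair."""
    # Drop bracketed qualifiers such as "(feat. Ed Sheeran)" or "[Remastered]".
    song = re.sub(r"\s*[\(\[].*?[\)\]]", "", song)
    # Join artist and title, then keep only letters, digits and whitespace.
    combined = re.sub(r"[^A-Za-z0-9\s]", "", f"{artist} {song}")
    # Lowercase and collapse runs of whitespace into single hyphens.
    return "-".join(combined.lower().split())


print(stripper_sketch("River (feat. Ed Sheeran)", "Eminem"))  # eminem-river
```
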
## Overview

### Major Tracks

#### Logging x Notification

• This is two-fold: first, we plan to use Python's built-in logging module to configure a proper logging implementation on the backend.
• Then we use the Discord API to send relevant info to the Discord server for better readability and faster response. This requires the creation of initial wrapper functions and separate webhooks per channel.
• Things such as logging whenever the backend is used to resolve tracks come to mind, since that would enable us to easily identify errors and false positives at a glance.
• Sending notifications when a song is being checked for legitimacy is also useful, since in most cases it is trivial to eyeball from the title whether a song is instrumental or not.
• Further, implementing logging early on would be useful throughout the later stages of the program, especially when rolling out changes.
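
As a rough sketch of how the two halves could fit together, the standard `logging` module can carry a custom handler that forwards records to a Discord webhook. The names `DiscordWebhookHandler`, `post_to_webhook` and `build_logger` are hypothetical, and the webhook URL is a placeholder; the only Discord behaviour assumed is that a webhook accepts a JSON body with a `content` field of at most 2000 characters.

```python
import json
import logging
import urllib.request


def post_to_webhook(url: str, content: str, timeout: float = 5.0) -> None:
    """Send a plain-text message to a Discord webhook URL."""
    payload = json.dumps({"content": content[:2000]}).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "User-Agent": "swaglyrics-backend-sketch",  # placeholder value
        },
    )
    urllib.request.urlopen(request, timeout=timeout).close()


class DiscordWebhookHandler(logging.Handler):
    """Hypothetical handler: one instance per Discord channel webhook."""

    def __init__(self, webhook_url, sender=post_to_webhook, level=logging.INFO):
        super().__init__(level=level)
        self.webhook_url = webhook_url
        self.sender = sender  # injectable so tests never touch the network

    def emit(self, record):
        try:
            self.sender(self.webhook_url, self.format(record))
        except Exception:
            # Never let a failed notification crash the request being logged.
            self.handleError(record)


def build_logger(name, webhook_url, sender=post_to_webhook):
    """Log everything to the console and INFO or above to Discord."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    formatter = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    console = logging.StreamHandler()
    console.setFormatter(formatter)
    logger.addHandler(console)
    discord = DiscordWebhookHandler(webhook_url, sender=sender)
    discord.setFormatter(formatter)
    logger.addHandler(discord)
    return logger
```

Keeping the sender injectable means the same handler can be pointed at a different webhook per channel and exercised in unit tests without network access.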

#### Testing

• The main repo has 100% code coverage. A test suite was configured for the backend too, but its coverage currently sits at a mere 50% with most of the endpoints untested, so more tests still need to be written using pytest.
• It seems smart to me to do this in tandem with logging, since the logging is helpful both when writing the tests and when working on the backend in the future.
• Once testing has been improved, I plan to make the backend auto-deploy only when the test suite passes and not just whenever there is a GitHub push event.

#### Issue Handling x Automation

• Currently, the opened issues do not contain any backend logs which would actually aid in triaging the issue.
• Next, since the project is now relatively stable, the backend logs can be used to close an issue if it is not a problem with swaglyrics, by extending how trivial cases are handled.
• If it is something we can fix, then the current way is to send a POST request to the server. This can be abstracted away to the backend, which is hooked up to GitHub. I imagine commenting something like !add stripper abc-xyz, which automatically updates the backend database and closes the issue, or just !close for issues that are not fixable by us, which simply closes the issue.
• This requires the creation of functions not only to parse info from GitHub but then perform the relevant database operation and then report back.
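
A first step for the comment commands could be a small parser like the one below. The `IssueCommand` type, the action names and the allowed stripper characters are assumptions for illustration; the database update and the GitHub calls that would follow are left out.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class IssueCommand:
    action: str                     # "add_stripper" or "close"
    stripper: Optional[str] = None  # only set for "add_stripper"


# Assumes strippers consist of letters, digits and hyphens only.
ADD_PATTERN = re.compile(r"^!add\s+stripper\s+([A-Za-z0-9-]+)$")


def parse_issue_comment(body: str) -> Optional[IssueCommand]:
    """Return the command contained in a comment body, or None if there is none."""
    text = body.strip()
    if text == "!close":
        return IssueCommand("close")
    match = ADD_PATTERN.match(text)
    if match:
        return IssueCommand("add_stripper", match.group(1))
    return None


print(parse_issue_comment("!add stripper abc-xyz"))
```

Anything that does not match exactly is ignored, so ordinary discussion on an issue cannot trigger a database change by accident.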

#### Genius Search Mismatch

• When swaglyrics is unable to fetch lyrics for a song, it sends that info to the backend, which queries the Genius API's search endpoint. The search endpoint is really finicky (which is why we don't use it directly), so after we get the results back, we check for a mismatch.
• The current criterion is simply to check if more than half the words in our title are also in the title we get from Genius. But data shows that this sometimes results in false positives and/or negatives if there are extra words such as remix or featuring artists.

current method

• I want to take a complete week out, use actual data and optimize both the searching parameters as well as this comparison algorithm.
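
The current criterion described above (more than half of our title's words also appearing in the Genius title) can be approximated as below. This is a sketch of the idea, not the backend's actual code; the function name `is_title_mismatch` and the whitespace tokenization are assumptions.

```python
def is_title_mismatch(our_title: str, genius_title: str) -> bool:
    """Flag a mismatch unless more than half of our words appear in the Genius title."""
    ours = our_title.lower().split()
    theirs = set(genius_title.lower().split())
    if not ours:
        return True
    matches = sum(1 for word in ours if word in theirs)
    return matches <= len(ours) / 2


print(is_title_mismatch("Hello", "Hello"))
print(is_title_mismatch("Closer - Remix", "Closer"))
```

Under this sketch, "Closer - Remix" compared against "Closer" is flagged as a mismatch because only one of its three whitespace-separated tokens matches, which is the kind of extra-word sensitivity the optimization week is meant to address.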

#### API Docs

• Since we have a proper API, I feel it prudent to set up https://docs.swaglyrics.dev or something similar using Sphinx to formalize documentation.
• Extend this to SwSpotify, since it is frequently used by other developers.

#### Discord Bot

• Currently the Discord bot can do two things: if you call it by $swaglyrics or $sl then it will fetch lyrics for the currently playing track, or you can specify song and artist like $sl Hello Adele.
• But this can get tedious if you're continually listening to music, so the idea is to introduce a mode that keeps sending new lyrics whenever the next song plays.
• In order to do this, I'll get familiar with discord.py, which is used to communicate with Discord. Also, a way to stop the mode will be figured out so we don't have to depend on the user to end it manually.
• Of course, more tests will accompany this feature.

#### Large Scale Stripper Test Suite

• One of the more ambitious ideas of this proposal is to test stripper resolution for, say, 1000 songs drawn from different genres and charts in order to truly figure out if we break support whenever a change to the stripper algorithm is made.
• Currently, the tests cover 12 edge cases based on empirical data, but now we want to make sure we don't break any case we might have been handling so far and just didn't know it.
• This probably requires manual cataloguing initially as well as a mechanism for updating it as need be. A format similar to csv can be used, except with a separator that would not appear in song titles, such as tab.

#### Fix Possible DDoS flaw

• When receiving a request from swaglyrics, the backend checks if the info is legit by querying the Spotify API and checking if the song and artist actually exist.
• However, this means you can download the names of collections of songs on Spotify and use those to try a DDoS attack or something. While there is rate limiting employed currently, this is not a risk I'd like to continue taking.
• The solution is to request the potential lyrics URL once, and if it returns a 200 then we take no further action, since this is supposed to be the /unsupported endpoint and a working lyrics page means the song is not actually unsupported.
• Unfortunately, PythonAnywhere limits internet access unless you're querying API endpoints, so this part also involves writing them a polite email requesting access.

### Minor Tracks

These should also serve as buffer in case there is extra time in any of the phases.

1. Adding type hints
2. Adding support for Bollywood songs
3. Using instrumentalness while deciding whether to make issues
4. Adding request timeouts to all requests
5. Moving away from global variables
6. Figuring out more trivial cases for which issues should not be created
   • Add text to be shown to the user that the song probably does not exist on Genius.
7. Adding a proper update text sort of thing whenever there is a new release that tells why you should update
   • Might be smart to show this just once or maybe only if minor version change or bigger
8. Checking for updates just once a day instead of every time the application is run
   • Also a parameter to force update if needed
9. Working on the flow and content of the Discord bot commands
   • The current prefix $ has a high possibility of being used by other bots too
10. Addressing a recent bug that has started appearing where lyrics aren't fetched the first time around but it works when you do it again.
11. (Optional) Renaming the stripper function to something more family friendly
12. Other stuff that will eventually pop up over the course of 3 months when delving into the code
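
For the once-a-day update check in the list above, one possible approach is to persist the time of the last check and skip the check if less than a day has passed. The function name `should_check_for_update`, the JSON state file and its `last_check` key are all assumptions for illustration; the `force` parameter corresponds to the proposed option to force an update.

```python
import json
import time
from pathlib import Path

ONE_DAY = 24 * 60 * 60  # seconds


def should_check_for_update(state_file: Path, force: bool = False, now=None) -> bool:
    """Return True at most once per day, or whenever force is set."""
    now = time.time() if now is None else now
    if not force:
        try:
            last = json.loads(state_file.read_text())["last_check"]
        except (OSError, ValueError, KeyError, TypeError):
            last = 0  # missing or unreadable state counts as "never checked"
        if now - last < ONE_DAY:
            return False
    # Record this check so later runs on the same day are skipped.
    state_file.write_text(json.dumps({"last_check": now}))
    return True
```

The `now` parameter exists only so the behaviour can be tested without waiting a day.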

## Timeline

### Community Bonding (May 4, 2020 - June 1, 2020)

• Clean up current issues and pull requests on the repositories.
• Publish the Chrome Extension on the Chrome Web Store
• Make new SwSpotify and swaglyrics releases
• Do community bonding stuff (??)

### Coding I (June 1, 2020 - June 29, 2020)

• Major Tracks: Logging x Notification, Testing, DDoS Flaw
• Minor Tracks: Type hints, Bollywood, Investigate the weird bug
• Milestones: Completed logger and discord interface with as close to 100% test coverage as possible. The DDoS flaw should be fixed at this stage as well.

I expect that adding the logger, configuring notifications and writing unit tests can easily take more time than planned, especially with edge cases. Hence, I haven't added another Major Track for this phase other than the DDoS flaw, since that is a priority.

### Coding II (July 3, 2020 - July 27, 2020)

• Major Tracks: Issue Handling x Automation, Genius Search Mismatch, Discord Bot
• Minor Tracks: Trivial Case Expansion, Request timeouts, Global variables, Discord Flow
• Milestones: Fully (or near fully) automated issue handling along with a better comparison algorithm and the new Discord Bot command.

This is pretty solid, since the current issue handling infrastructure is low-key convoluted and we'll be using the new logs from the previous phase in issue handling here. Also, the first minor track here is less major than major tracks but more major than other minor tracks.

### Coding III (August 1, 2020 - August 24, 2020)

• Major Tracks: API Docs, Large Scale Stripper Test Suite, Wrapping Up
• Minor Tracks: Update check, Instrumentalness, Update release notes functionality
• Milestones: https://docs.swaglyrics.dev and LSSTS

This phase deals with the creation of new stuff, hence I expect the major tracks to be slightly more time consuming as I would be starting from scratch. A buffer period here is equally important in case more time is needed to add finishing touches.

## Other Important Info

• A blog post at the end of each phase sounds reasonable to me.
• I don't have a fixed 9-5 work schedule but can definitely put in equivalent hours each day.
• Communication wouldn't be a problem even with the different time zones.
• I don't see any planned absences other than possibly some visa stuff for college.

## Why This Project

Yes, true, it's not one of those projects that's going to do something novel and play with some cool new technology, but I feel it's some legit work that will stick around for a long time. Fixing a thousand small problems over the next 4-5 months in something that's being used by a lot of people every day is more valuable to me than spending time on a project that wouldn't see any sunshine once the summer is over.

Since I personally use the project, I can appreciate firsthand the value that will be added, not only as a maintainer but also as a user. Which is why I humbly hope this proposal gets a slot :)