Hi There! I’m Alina!

Spotify Song Recommendation Algorithms (Using kNN and k-Means Predictive Models)

For this project, I took on the role of team leader, with my main responsibilities centered on model configuration and the final code synthesis. That said, I contributed to many other parts of the project, from planning and coordination to evaluation and repository formatting. This memo outlines my specific contributions and some of my reflections on the project process as a whole.

The project began with me initiating contact with the group and setting up our Teams group chat. I also got the ball rolling on our project topic discussion, where we ultimately decided to take a shot at an idea I proposed: creating a music recommendation algorithm. My original idea was actually a bit closer to a genre classification model using the GTZAN dataset and more scientific audio metrics, but we changed course once Daniel learned that we could use the Spotify API through R. From there, Daniel and I began searching for datasets (a task that took significantly more time than I care to admit, as we jumped between multiple datasets in rapid succession). Once we decided to use Spotify Echo Nest variables as our project foundation, I helped shape the project proposal and outlined a general program flow to guide our development efforts.

The beginning of our project was a little slow going. Daniel wrote our first kNN model, and with that base model developed, I took over model configuration: I wrote Models 2 and 3 as well as all the calculation scripts for the optimal coefficients for the k-means and PCA. (Yay calculus! I was so excited to realize I needed to calculate a second derivative to prove our optimal k for the k-means. It made the Calc 1 class I took this semester feel worth it, haha.)
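For anyone curious about how a second derivative can point to an optimal k, here is a minimal sketch of the idea in Python. It is not our project code (that was written in R). The function name `elbow_k` and the within-cluster sum of squares (WSS) values are made up for illustration, and it assumes the candidate k values are evenly spaced. The idea is that the "elbow" of the WSS curve sits where the curve bends most sharply, which is where the discrete second derivative is largest.

```python
def elbow_k(ks, wss):
    """Pick the k where the WSS curve bends most sharply.

    ks  : evenly spaced candidate cluster counts, e.g. [1, 2, 3, ...]
    wss : within-cluster sum of squares for each k (same length as ks)

    Uses the central second difference wss[i-1] - 2*wss[i] + wss[i+1]
    as a stand-in for the second derivative (assumes unit spacing).
    """
    if len(ks) != len(wss) or len(ks) < 3:
        raise ValueError("need at least three (k, WSS) pairs of equal length")

    best_k, best_curvature = None, float("-inf")
    # The endpoints have no neighbor on one side, so skip them.
    for i in range(1, len(ks) - 1):
        curvature = wss[i - 1] - 2 * wss[i] + wss[i + 1]
        if curvature > best_curvature:
            best_k, best_curvature = ks[i], curvature
    return best_k


# Hypothetical WSS values, for illustration only (not our results).
ks = [1, 2, 3, 4, 5, 6]
wss = [1000.0, 600.0, 300.0, 250.0, 220.0, 200.0]
print(elbow_k(ks, wss))  # prints 3
```

With these made-up numbers, the drop in WSS flattens out after k = 3, so that is where the second difference peaks.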

Then we hit our first roadblock: evaluation. We quickly realized we had no way to verify our models, and after our meeting with Dr. Dinh, I had the idea to do a manual playlist annotation evaluation (you can ask Daniel, I called him in a frenzy while driving down the highway). I based our evaluation on the Mangione annotations Daniel and I were already doing with Dr. Hagen. Once we had our three models working, I wrote some batch query functions to speed up the playlist generation, asked everyone to send me their top 10 songs in our dataset, and put together the evaluation files for everyone.
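To give a rough picture of what a batch query looks like, here is a small Python sketch. It is not our R code. The names `nearest_songs` and `batch_playlists`, the song titles, and the three feature values per song are all hypothetical. It assumes each song is a tuple of numeric audio features on a comparable scale and uses plain Euclidean distance, which is one common choice for kNN.

```python
import math


def nearest_songs(catalog, seed, k=3):
    """Return the k songs closest to `seed` by Euclidean distance."""
    seed_features = catalog[seed]
    scored = [
        (math.dist(seed_features, features), title)
        for title, features in catalog.items()
        if title != seed  # never recommend the seed song itself
    ]
    return [title for _, title in sorted(scored)[:k]]


def batch_playlists(catalog, seeds, k=3):
    """Build one playlist per seed song in a single call."""
    return {seed: nearest_songs(catalog, seed, k) for seed in seeds}


# Hypothetical catalog: title -> (danceability, energy, valence).
catalog = {
    "song_a": (0.80, 0.70, 0.60),
    "song_b": (0.78, 0.72, 0.58),
    "song_c": (0.20, 0.30, 0.10),
    "song_d": (0.25, 0.28, 0.15),
    "song_e": (0.75, 0.65, 0.62),
}

for seed, playlist in batch_playlists(catalog, ["song_a", "song_c"], k=2).items():
    print(seed, "->", playlist)
```

Running every seed through one function call is what saves time: each person's top 10 songs can go in as the `seeds` list, and a playlist comes back for each one.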

Evaluation was fun, but tedious, because unfortunately you can’t skim music, and we did need to listen to all of it. I know that personally, I found some new music in our model outputs, so I really do consider this project a win. Once evaluations were done, Model 2 was confirmed. I sent our summary statistics off to Alex to make some visualizations for our annotation report, and began compiling all of our scripts for our GitHub repository. (I recently learned in Dr. Friedman’s class how to publish R directories to GitHub and wanted to give all of our group members a way to report this on their resumes. I also quickly realized how many scripts we had collected throughout this project and wanted a more organized way to present them. I know we had to submit a full HTML/PDF, but our master document felt very messy, with a lot of code to parse through, so: GitHub.)

Finally, to wrap this project up, I thought it would be nice to have a fun presentation, so I found a PowerPoint template with which to present our findings. It was fun trying to adapt the Spotify interface to present research findings, and I think we made some clever decisions. (My favorite by far was the variable report that looked like a playlist. I was particularly proud of that one.)

All in all, I am very proud of this project and grateful to the team for humoring me and letting us explore a personal passion project of mine for our final! I hope they all enjoyed it as much as I genuinely did 🙂

Our GitHub Repository with all of our project info, Model Reports, and Annotation Evaluations can be found: Here

Our Shiny App where you can interact with our Final Recommendation Algorithm can be found: Here

About the author

Alina Hagen is an aspiring data scientist and digital artist located in Tampa, FL, with a passion for new and emerging technologies. Her background consists of a unique blend of analytical and creative skills that inform and fuel her love for data coding, analysis, and visualization. While her academic track has been anything but linear, it has instilled in her a deep-seated curiosity for how people interact with information, whether through labels in an art museum, dashboards in a business meeting, or creative projects that inspire people for years to come.