Every week, Billboard releases its Top 100 chart, one of the most definitive measures of a song's success. We used machine learning classification algorithms to label songs as hits or non-hits, using audio feature data obtained from the Spotify Web API. The dataset contains 73,000 songs, of which 8,000 are positive samples. To prevent class skew, every iteration of an algorithm randomly sampled as many negative samples as there were positive ones. We also used feature importance and feature selection to give producers and artists insight into the features that matter most for appearing on the chart. Producers and labels can then concentrate their capital on the marketing and publicity of the songs predicted to be hits.
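The balancing step described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' implementation: the function name `balanced_sample`, the `is_hit` and `track_id` fields, and the toy data are all assumptions made for the example.

```python
import random


def balanced_sample(songs, label_key="is_hit", seed=None):
    """Return every positive sample plus an equal-sized random draw of negatives.

    `songs` is a list of dicts; `label_key` (hypothetical field name) holds 1 for
    a hit and 0 otherwise.
    """
    rng = random.Random(seed)
    positives = [s for s in songs if s[label_key] == 1]
    negatives = [s for s in songs if s[label_key] == 0]
    # Draw without replacement so no negative appears twice in one iteration.
    drawn = rng.sample(negatives, k=min(len(positives), len(negatives)))
    balanced = positives + drawn
    rng.shuffle(balanced)  # avoid an ordered block of positives followed by negatives
    return balanced


# Toy data with roughly the imbalance described in the text (about 1 hit in 9).
songs = [{"track_id": i, "is_hit": 1 if i % 9 == 0 else 0} for i in range(900)]

# A different seed per iteration gives a different set of negatives each time.
subset = balanced_sample(songs, seed=0)
print(len(subset), sum(s["is_hit"] for s in subset))
```

Calling the function with a new seed on each iteration keeps all positive samples while exposing the classifier to a different slice of the negatives every time.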
The Hit Song Science (HSS) problem aims to use machine learning and predictive analysis to help artists and producers predict a song’s success. We approached this problem using the Spotify Web API and the Billboard 100 charts’ historical data to predict whether a song is a hit. A dataset of 73,000 songs (since 2010 only) was constructed from these sources. Five classification algorithms were tested with subtle modifications; the best-performing algorithms were SVM-rbf (77.69%) and Random Forest (86.68%).
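As a rough sketch of how two of the classifiers named above might be compared, the example below trains an RBF-kernel SVM and a Random Forest with scikit-learn and ranks features by the forest's importance scores. Everything here is an assumption for illustration: the data is synthetic (generated with `make_classification`), the feature names are borrowed from Spotify's audio features without knowing which ones the study used, and the hyperparameters are arbitrary. The scores it prints are not comparable to the percentages reported above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumed feature set; the source does not list the features it used.
feature_names = [
    "danceability", "energy", "loudness", "speechiness",
    "acousticness", "valence", "tempo",
]

# Synthetic stand-in for a balanced hit / non-hit feature matrix.
X, y = make_classification(
    n_samples=600,
    n_features=len(feature_names),
    n_informative=4,
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    # SVMs are sensitive to feature scale, so standardize before fitting.
    "SVM-rbf": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

accuracy = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracy[name] = model.score(X_test, y_test)  # mean accuracy on held-out data

# Impurity-based importances from the fitted forest, highest first.
forest = models["Random Forest"]
ranked = sorted(
    zip(feature_names, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)

for name, score in accuracy.items():
    print(f"{name}: {score:.3f}")
for feature, importance in ranked:
    print(f"{feature}: {importance:.3f}")
```

Impurity-based importances are only one way to rank features; they are used here because they come for free with a fitted Random Forest.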