Most conferences are going virtual, and we are slowly adapting to the new way of doing things. Among the conferences I have attended so far, ISMIR 2020 (International Society for Music Information Retrieval) was one of the best organized and felt quite close to the real-life experience. I made a conscious decision to dedicate my time only to the conference during its week, as I would have if I had traveled to Canada, and it exhausted me as much as the real deal would have. I cannot imagine what would have happened if I had also been trying to work during those days… Anyway, here are some papers that caught my attention and my thoughts on them. Enjoy!

Session 1

  • Explaining Perceived Emotion Predictions in Music: an Attentive Approach
    Sanga Chaki, Pranjal Doshi, Sourangshu Bhattacharya, Priyadarshi Patnaik

    The authors evaluate attention models on the 1000 Songs for Emotional Analysis of Music dataset in terms of arousal and valence. They look at different features, and, no surprise, spectral features work best. And, obviously, attention-based models outperform LSTMs. One cool thing is that they take a deeper look at the attended time points on the arousal-valence graphs.

  • Measuring Disruption in Song Similarity Networks
    Felipe V Falcão, Nazareno Andrade, Flavio Figueiredo, Diego Furtado Silva, Fabio Morais

    The authors take the idea of a disruption index for graph nodes and apply it to music similarity graphs. The paper looks at the history of the Brazilian music tradition of Forró and tries to identify disruptive tracks. From my research perspective, I wonder if it could be used to identify disruptive recommendations, so-called taste-breakers, even if that is not quite the same notion (a sketch of how the index works follows this session's list).

  • Pandemics, Music, and Collective Sentiment: Evidence from the Outbreak of COVID-19
    Meijun Liu, Eva Zangerle, Xiao Hu, Alessandro Melchiorre, Markus Schedl

    An interesting paper that looks at the music listening habits of different countries before and after the COVID-19 outbreak. The results show that everyone's listening mood is affected, although women less so than men.
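
As a side note on the disruption paper: here is a minimal sketch of how a Funk/Owen-Smith-style disruption index can be computed for a node in a directed influence graph. The graph construction and the exact index variant are my assumptions, not necessarily what the paper uses.

```python
import networkx as nx

def disruption_index(g: nx.DiGraph, focal) -> float:
    """CD-style disruption index; an edge u -> v means 'u builds on v'."""
    predecessors = set(g.successors(focal))  # nodes the focal node builds on
    citers = set(g.predecessors(focal))      # nodes building on the focal node

    n_i = n_j = n_k = 0
    for node in g.nodes:
        if node == focal or node in predecessors:
            continue
        cites_focal = node in citers
        cites_preds = any(p in predecessors for p in g.successors(node))
        if cites_focal and not cites_preds:
            n_i += 1  # builds only on the focal node: disruptive
        elif cites_focal and cites_preds:
            n_j += 1  # builds on both: consolidating
        elif cites_preds:
            n_k += 1  # bypasses the focal node entirely
    total = n_i + n_j + n_k
    return (n_i - n_j) / total if total else 0.0

# Example: "later1" builds only on "new", while "later2" bypasses it
g = nx.DiGraph([("new", "old"), ("later1", "new"), ("later2", "old")])
print(disruption_index(g, "new"))  # 0.5
```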

Session 2

  • Can’t Trust the Feeling? How Open Data Reveals Unexpected Behavior of High-level Music Descriptors
    Cynthia C. S. Liem, Chris Mostert

    The authors take a look at the high-level descriptors in AcousticBrainz and find that even simple trained models run into issues once you take them outside their test set. The factors that influence predictions go as far as the version of the ffmpeg library used for MP3 decoding. Among the unreliable genre classifiers, genre_tzanetakis predicts jazz a lot and genre_dortmund mostly predicts electronic; better-behaved models (from a distribution point of view) include mood_sad and genre_rosamerica.

  • Female Artist Representation in Music Streaming
    Avriel C. Epps-Darling, Henriette Cramer, Romain Takeo Bouyer

    A Spotify paper presenting data that adds further evidence of the underrepresentation of female artists: only 1 in 5 streams go to female artists.

  • The Freesound Loop Dataset and Annotation Tool
    Antonio Ramires, Frederic Font, Dmitry Bogdanov, Jordan B. L. Smith, Yi-Hsuan Yang, Joann Ching, Bo-Yu Chen, Yueh-Kao Wu, Hsu Wei-Han, Xavier Serra

    A new and really cool dataset of loops from Freesound by my colleagues! Annotation is still ongoing, and you can contribute!

  • Should We Consider the Users in Contextual Music Auto-tagging Models?
    Karim M. Ibrahim, Elena V. Epure, Geoffroy Peeters, Gael Richard

    A nice paper from my colleague from MIP-Frontiers. When you do auto-tagging, personalizing predictions on a per-user basis improves accuracy. The paper's implementation brings user embeddings into the auto-tagging pipeline (a minimal sketch of the idea is below).
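
To make the last idea concrete, here is a minimal PyTorch sketch of one way to condition a tagger on user embeddings. The fusion by concatenation, the dimensions, and the layer choices are my assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class UserAwareTagger(nn.Module):
    """Auto-tagger conditioned on a per-user embedding (hypothetical fusion)."""
    def __init__(self, n_users, n_tags, audio_dim=256, user_dim=64):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, user_dim)
        self.head = nn.Sequential(
            nn.Linear(audio_dim + user_dim, 256),
            nn.ReLU(),
            nn.Linear(256, n_tags),
        )

    def forward(self, audio_features, user_ids):
        # audio_features: (batch, audio_dim) from any pretrained audio encoder
        u = self.user_emb(user_ids)             # (batch, user_dim)
        x = torch.cat([audio_features, u], dim=-1)
        return torch.sigmoid(self.head(x))      # per-tag probabilities
```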

Session 3

  • Tag2Risk: Harnessing Social Music Tags for Characterizing Depression Risk
    Aayush Surana, Yash Goyal, Manish Shrivastava, Suvi H Saarikallio, Vinoo Alluri

    The authors run surveys to identify groups at risk of depression. They then compare the arousal-valence values of music tags, mapped to typical categories, between the groups, and find a strong association between the at-risk group and sad music.

  • Metric Learning vs Classification for Disentangled Music Representation Learning
    Jongpil Lee, Nicholas J. Bryan, Justin Salamon, Zeyu Jin, Juhan Nam

    The authors compare metric learning and classification across various MIR tasks. They find that classification-based models are generally advantageous in terms of training time, similarity retrieval, and auto-tagging, while deep metric learning performs better on the triplet-prediction task (both objectives are sketched after this session's list).

  • User Insights on Diversity in Music Recommendation Lists
    Kyle Robinson, Dan Brown, Markus Schedl

    An exploratory study on music diversity. One interesting insight is the difference between the notion of diversity within genres and overall diversity (“give me something new”). Other things that surfaced in the interviews: mood is used as a seed much more often than context, and cultural background makes a difference.
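
On the metric learning vs classification paper: here are the two training objectives side by side in a minimal PyTorch sketch. A shared encoder producing the embeddings is assumed, and everything else is simplified.

```python
import torch.nn.functional as F

# Classification objective: a linear head over the embedding,
# trained with cross-entropy against class labels.
def classification_loss(embeddings, labels, head):
    return F.cross_entropy(head(embeddings), labels)

# Metric-learning objective: pull anchor and positive together, push
# the negative at least `margin` away in the embedding space.
def metric_loss(anchor, positive, negative, margin=0.2):
    return F.triplet_margin_loss(anchor, positive, negative, margin=margin)
```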

Session 4

  • Semantically Meaningful Attributes from Co-listen Embeddings for Playlist Exploration and Expansion
    Ayush Patwari, Nicholas Kong, Jun Wang, Ullas Gargi, Michele Covell, Aren Jansen

    A paper from Google Music and Google Research. They train a shallow network on top of a ResNet-18-like architecture to predict genres and regress valence, energy, etc. The approach seems to work well for playlist extension, better for playlists like “Classical for sleeping” than “Feeling good in the 80s” (a sketch of such a multi-head setup follows this session's list).

  • User Perceptions Underlying Social Music Behavior
    Louis Spinelli, Josephine Lau, Jin Ha Lee

    The authors explore social music listening (quite original). One insight is that two types of people emerged from their interviews: users with impression-management and security concerns who are nevertheless confident music selectors, and very considerate CMS users with almost no impression-management or security concerns.

  • Exploring Acoustic Similarity for Novel Music Recommendation
    Derek S Cheng, Thorsten Joachims, Douglas Turnbull

    The authors find that acoustic similarity is indeed highly correlated with playlist selection and recommendation, but not with personal preference. Cool work, done in the context of Localify.org.

  • Mood Classification Using Listening Data
    Filip Korzeniowski, Oriol Nieto, Matthew McCallum, Minz Won, Sergio Oramas, Erik Schmidt

    A Pandora paper comparing MusiCNN embeddings with matrix factorization embeddings. Unsurprisingly, the in-house collaborative filtering embeddings still outperform audio features. However, once the artist/album effect is taken care of, the audio embeddings come closer to collaborative filtering. It is also cool to see that the musical models correlate more strongly with each other than the implicit and explicit feedback models do.
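
For the co-listen embeddings paper above, the shallow multi-head idea (classify genres, regress valence and energy on top of frozen embeddings) might look roughly like this. The dimensions and the set of heads are illustrative guesses, not the paper's specification.

```python
import torch.nn as nn

class ShallowAttributeHeads(nn.Module):
    """Shallow heads on top of frozen co-listen embeddings (illustrative)."""
    def __init__(self, emb_dim=128, n_genres=50):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(emb_dim, 128), nn.ReLU())
        self.genre_head = nn.Linear(128, n_genres)  # classification
        self.valence_head = nn.Linear(128, 1)       # regression
        self.energy_head = nn.Linear(128, 1)        # regression

    def forward(self, emb):
        h = self.trunk(emb)
        return (self.genre_head(h),
                self.valence_head(h).squeeze(-1),
                self.energy_head(h).squeeze(-1))
```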

Late-Breaking Demos

  • Deep Embeddings with Essentia Models
    Pablo Alonso-Jiménez, Dmitry Bogdanov, Xavier Serra

    Work done by my colleagues, an extension of their ICASSP 2020 paper. I am already using it for my research; it is quite easy to get embeddings from audio with Essentia and MusiCNN (a minimal example below). Check it out!
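
Here is a minimal example of extracting MusiCNN embeddings with Essentia. The model filename and the output node name follow the Essentia models documentation at the time of writing, so double-check them against the current release.

```python
from essentia.standard import MonoLoader, TensorflowPredictMusiCNN

# MusiCNN models expect 16 kHz mono audio
audio = MonoLoader(filename='track.mp3', sampleRate=16000)()

# Take the activations of an inner layer as the embedding;
# the graph file comes from the Essentia models page.
model = TensorflowPredictMusiCNN(graphFilename='msd-musicnn-1.pb',
                                 output='model/dense/BiasAdd')
embeddings = model(audio)  # (n_patches, embedding_dim)
```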