Interesting Papers from ISMIR 2020

Most conferences are going virtual and we are slowly adapting to the new way of doing things. Among the conferences that I attended so far, ISMIR 2020 (International Society for Music Information Retrieval) had one of the best organization and felt quite close to the real-life experience. I made a conscious decision to only dedicate my time to the conference during its week, as I would if I traveled to Canada, and it had exhausted me as much as the real deal would. I cannot imagine what would happen if I was also trying to work during those days… Anyway, here are some papers that caught my attention and my thoughts on those. Enjoy!

Session 1

Explaining Perceived Emotion Predictions in Music: an Attentive Approach
Sanga Chaki, Pranjal Doshi, Sourangshu Bhattacharya, Prof. Priyadarshi Patnaik

The authors evaluate attention models on the 1000 songs for Emotional Analysis of Music dataset in terms of arousal-valence. They take a look at different features, and no surprise, spectral features work the best. And obviously, attention-based models perform better than LSTMs. One cool thing is that they took a deeper look into the attended points on arousal-valence graphs.
Measuring Disruption in Song Similarity Networks
Felipe V Falcão, Nazareno Andrade, Flavio Figueiredo, Diego Furtado Silva, Fabio Morais

The authors take the idea of a disruption index for graph nodes and apply it to the music similarity graphs. This paper takes a look at the history of Brazilian music tradition Forró and tries to identify the disruptive tracks. From my research perspective, I wonder if it can be used to identify disruptive recommendations, so-called taste-breakers, however, if it not completely the same notion.
Pandemics, Music, and Collective Sentiment: Evidence from the Outbreak of COVID-19
Meijun Liu, Eva Zangerle, Xiao Hu, Alessandro Melchiorre, Markus Schedl

An interesting paper that takes a look at the music listening habits of different countries before and after the COVID-19 outbreak. The results show that while women are less affected than men in terms of mood, everybody is.

Session 2

Can’t Trust the Feeling? How Open Data Reveals Unexpected Behavior of High-level Music Descriptors
Cynthia C. S. Liem, Chris Mostert

Authors take a look at the high-level descriptors in AcousticBrainz and find out that there are some issues once you take the trained models (even simple) outside of the test set. Factors that influence prediction even go to the version of ffmpeg library i.e. MP3 decoding. Looking at non-reliable genre classifiers: genre_tzanetakis predicts jazz a lot, and genre_dortmund predicts electronic. Better models (from distribution point of view) include mood_sad, genre_rosamerica.
Female Artist Representation in Music Streaming
Avriel C. Epps-Darling, Henriette Cramer, Romain Takeo Bouyer

A Spotify paper taking a look and showing some data that providing more evidence to the underrepresentation of female artists. Only 1 in 5 streams go to female artists.
The Freesound Loop Dataset and Annotation Tool
Antonio Ramires, Frederic Font, Dmitry Bogdanov, Jordan B. L. Smith, Yi-Hsuan Yang, Joann Ching, Bo-Yu Chen, Yueh-Kao Wu, Hsu Wei-Han, Xavier Serra

New and really cool dataset of loops from Freesound by my colleagues! Annotation still ongoing, you can contribute!
Should We Consider the Users in Contextual Music Auto-tagging Models?
Karim M. Ibrahim, Elena V. Epure, Geoffroy Peeters, Gael Richard

Nice paper from my colleague from MIP-Frontiers. When you do auto-tagging, personalizing prediction on a per-user basis improves the accuracy. This paper implementation takes user embeddings into the auto-tagging pipeline.

Session 3

Tag2Risk: Harnessing Social Music Tags for Characterizing Depression Risk
Aayush Surana, Yash Goyal, Manish Shrivastava, Suvi H Saarikallio, Vinoo Alluri

The authors perform surveys to identify in-risk groups for depression. Then they compare the arousal-valence values of music tags mapped to typical categories in-between the groups. There is a strong correlation between sad music there.
Metric Learning vs Classification for Disentangled Music Representation Learning
Jongpil Lee, Nicholas J. Bryan, Justin Salamon, Zeyu Jin, Juhan Nam

The authors compare metric learning vs classification on various MIR tasks. As a result, they find that classification-based models are generally advantageous for training time, similarity retrieval, and auto-tagging, while deep metric learning exhibits better performance for triplet-prediction task.
User Insights on Diversity in Music Recommendation Lists
Kyle Robinson, Dan Brown, Markus Schedl

An exploratory study on music diversity. One interesting insight is the difference between the notion of diversity within genres vs overall diversity (give me something new). Things that had surfaced in the interviews: the mood is much more used as a seed comparing to context, and that cultural background makes a difference.

Session 4

Semantically Meaningful Attributes from Co-listen Embeddings for Playlist Exploration and Expansion
Ayush Patwari, Nicholas Kong, Jun Wang, Ullas Gargi, Michele Covell, Aren Jansen

Paper from Google Music and Google Research. They did a shallow network on top of the ResNet-18-like architecture to predict genres, regress valence, energy, etc. The approach seems to work well for playlist extension, better for playlists like “Classical for sleeping” than “Feeling good in the 80s”.
User Perceptions Underlying Social Music Behavior
Louis Spinelli, Josephine Lau, Jin Ha Lee

Authors explore social music listening (quite original). Some of the insights include that there had been 2 types of people emerging from their interviews: users with impression management and security concerns, but also confident music selectors; and very considerate CMS users with almost no impression management or security concerns.
Exploring Acoustic Similarity for Novel Music Recommendation
Derek S Cheng, Thorsten Joachims, Douglas Turnbull

The authors find out that indeed acoustic similarity seems to be highly correlated with playlist selection and recommendation, but not personal preference. Cool work, and it is done in the context of Localify.org.
Mood Classification Using Listening Data
Filip Korzeniowski, Oriol Nieto, Matthew McCallum, Minz Won, Sergio Oramas, Erik Schmidt

Pandora paper that compared MusiCNN embeddings vs the matrix factorization embeddings. In-house embeddings obviously are is still overpowering audio features. However, if the artist/album effect is taken care of, audio embeddings come closer to collaborative filtering. Also cool to see that musical models correlate stronger together than implicit and explicit feedback models.

Session 5

A Combination of Local Approaches for Hierarchical Music Genre Classification
Antonio R. Parmezan, Diego Furtado Silva, Gustavo Batista

Cool approach representing genre taxonomies as trees and using classifiers on a per-node basis.
Practical Evaluation of Repeated Recommendations in Personalized Music Discovery
Brian Manolovitz, Mitsunori Ogihara

Authors use the Spotify Discover Weekly playlist to see the effect of repeating the new recommendations. Results include that the people could stay quite indifferent after one playcount, but after two/three playcounts they make up their mind about the song, more liked than disliked songs.
“Play Music”: User Motivations and Expectations for Non-specific Voice Queries
Jennifer Thom, Angela Nazarian, Ruth Brillman, Henriette Cramer, Sarah Mennicken

A very cool explorative study from Spotify asking people what do they mean when they say “play music” to Google Home or Amazon Alexa, etc. Answers range from people being busy with chores to just play something familiar. Cool insight is that most don’t look for new music, they just want something familiar.
Detecting Collaboration Profiles in Success-based Music Genre Networks
Gabriel Oliveira, Mariana Santos, Danilo B Seufitelli, Anisio Lacerda, Mirella M Moro

This paper looks into artist collaborations in different genres of music and categorizing them by type, impact, etc.
Analysis of Song/Artist Latent Features and Its Application for Song Search
Kosetsu Tsukuda, Masataka Goto

The authors use latent space learned from matrix factorization to explore meanings of artist and song vectors being close or pointing in the same direction.

Session 6

Joyful for You and Tender for Us: the Influence of Individual Characteristics and Language on Emotion Labeling and Classification
Juan S. Gómez-Cañón, Estefania Cano, Perfecto Herrera, Emilia Gomez

Amazing work by my colleague in music emotion recognition (MER). They are building a dataset, and the thing is that there is almost no agreement in emotion annotation! And it is quite difficult even without trying to do it across different languages.
“Butter Lyrics Over Hominy Grit”: Comparing Audio and Psychology-based Text Features in MIR Tasks
Jaehun Kim, Andrew M. Demetriou, Sandy Manolios, M. Stella Tavella, Cynthia C. S. Liem

This paper uses psychology features from NLP to do some typical MIR tasks like autotagging or RecSys. The features work to a certain degree.
Multilingual Music Genre Embeddings for Effective Cross-lingual Music Item Annotation
Elena V. Epure, Guillaume Salha, Romain Hennequin

Deezer paper, that uses NLP and DBpedia to match genres between different languages. Very cool work
Less Is More: Faster and Better Music Version Identification with Embedding Distillation
Furkan Yesiler, Joan Serra, Emilia Gomez

Another paper by my colleagues - they did very good advance in the state-of-the-art of song version/cover identification with drastically reduced size of embeddings and even better performance!

Late-Breaking Demos

Deep Embeddings with Essentia Models
Pablo Alonso-Jiménez, Dmitry Bogdanov, Xavier Serra

Work done by my colleagues, extension of their ICASSP 2020 paper. I am already using it for my research, quite easy to get embeddings from audio with Essentia and MusiCNN - check it out.