13 March 2024

Language & Access: Machine Learning for Digital Collections

Discovery Project Webinar, March 13

This Towards a National Collection (TaNC) webinar will explore how machine learning technologies may enhance the discoverability of digital collection content and contribute to a better understanding of biased language. It brings together researchers from two of our Discovery Projects, ‘Our Heritage, Our Stories’ and ‘Transforming Collections’.

Tehmina Goskar, Decolonising Arts Institute, UAL: Learning the biased languages of benevolence, equity and the machine
Youcef Benkhedda, University of Manchester: Unlocking community-generated digital content: enhancing discoverability with advanced NLP and knowledge graphs

The webinar will begin with Tehmina Goskar, who will discuss her collaboration with the Creative Computing Institute, which experimented with machine learning to better understand the biased languages of benevolence and equity in the organisational texts of public art collections. She will consider how this work might inform long-standing, systemic concerns about bias and discrimination. Youcef Benkhedda will then talk about cutting-edge natural language processing (NLP) techniques and how they may be used to enhance the discoverability of community-generated digital content (CGDC) in digital collections. An open Q&A session will follow.

Full abstracts

Tehmina Goskar: Learning the biased languages of benevolence, equity and the machine

In this webinar I will discuss ‘Patterns of patronage’, a two-year research strand of Transforming Collections that interrogates the mechanisms, relationships and languages that shape public art collections. Drawing on a partly autoethnographic, partly action-research approach, I will share my experience of machine learning (curating a dataset from organisational texts such as policies and annual reports, then training and testing a model on it) and how this method might shed light on the biased languages of benevolence and equity that museums and their funders have grown accustomed to using.

Youcef Benkhedda: Unlocking community-generated digital content: enhancing discoverability with advanced NLP and knowledge graphs

Community-generated digital content (CGDC) is frequently under-represented and difficult to access. Using advanced AI and NLP methods, we enrich CGDC metadata to raise its prominence within the UK national collection. We apply named entity recognition (NER) and link the recognised entities to Wikidata, assigning each one an authoritative identifier. We then use relation extraction to generate subject-predicate-object triples that capture the relationships expressed in the text. Ultimately, our goal is to populate a searchable knowledge graph that makes CGDC efficient to search and integrates it seamlessly into the national collection.
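The enrichment steps in this abstract (NER, entity linking to Wikidata, relation extraction, and a searchable knowledge graph) can be illustrated with a toy pipeline. This is a minimal sketch under stated assumptions, not the project's implementation: a hand-written gazetteer stands in for a trained NER and entity-linking model, a small list of hypothetical cue phrases stands in for a relation-extraction model, and the archive named in the sample text is invented. Q84 (London) and Q145 (United Kingdom) are real Wikidata identifiers; entities with no Wikidata match receive a hypothetical `local:` identifier.

```python
import re
from collections import defaultdict

WIKIDATA = "http://www.wikidata.org/entity/"

# Hypothetical gazetteer standing in for a trained NER + entity-linking model.
# Values are Wikidata QIDs; None marks an entity with no Wikidata match (NIL).
ENTITY_LINKS = {
    "Hilltop Community Archive": None,  # invented example entity
    "London": "Q84",
    "United Kingdom": "Q145",
}

# Hypothetical cue phrases standing in for a relation-extraction model.
RELATION_CUES = {
    "is located in": "located_in",
    "is the capital of": "capital_of",
}


def recognise_entities(sentence):
    """Toy NER: return sorted (start, end, surface) spans of gazetteer entries."""
    spans = []
    for surface in ENTITY_LINKS:
        for match in re.finditer(re.escape(surface), sentence):
            spans.append((match.start(), match.end(), surface))
    return sorted(spans)


def link_entity(surface):
    """Toy entity linking: map a surface form to a Wikidata URI or a local ID."""
    qid = ENTITY_LINKS.get(surface)
    if qid:
        return WIKIDATA + qid
    return "local:" + surface.replace(" ", "_")


def extract_triples(text):
    """Toy relation extraction over adjacent entity pairs in each sentence."""
    triples = []
    for sentence in re.split(r"(?<=\.)\s+", text.strip()):
        spans = recognise_entities(sentence)
        for (_, subj_end, subj), (obj_start, _, obj) in zip(spans, spans[1:]):
            # Text between the two entities, lower-cased, trailing "the" dropped.
            between = sentence[subj_end:obj_start].strip().lower()
            between = re.sub(r"\bthe$", "", between).strip()
            predicate = RELATION_CUES.get(between)
            if predicate:
                triples.append((link_entity(subj), predicate, link_entity(obj)))
    return triples


class KnowledgeGraph:
    """Set of (subject, predicate, object) triples indexed by node for search."""

    def __init__(self):
        self.triples = set()
        self.index = defaultdict(set)

    def add(self, triple):
        subject, _, obj = triple
        self.triples.add(triple)
        self.index[subject].add(triple)
        self.index[obj].add(triple)

    def about(self, node):
        """Return every triple in which `node` is the subject or object."""
        return sorted(self.index.get(node, set()))


sample = ("The Hilltop Community Archive is located in London. "
          "London is the capital of the United Kingdom.")
triples = extract_triples(sample)
kg = KnowledgeGraph()
for triple in triples:
    kg.add(triple)

# Search the graph for everything connected to London (Q84).
for triple in kg.about(WIKIDATA + "Q84"):
    print(triple)
```

Indexing each triple under both its subject and its object means a search for any linked entity, such as London, returns every statement that mentions it, including statements about community content that has no Wikidata entry of its own.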
