Cooking, maps and Swiss army knives: DiSSCo at TDWG2023

26 October 2023

On October 9th, some of the DiSSCo folk headed south of the equator to Hobart, Tasmania, for the TDWG 2023 Conference. Once again, TDWG's annual rendezvous provided the perfect stage for organisations in the field of biodiversity informatics to showcase their state-of-the-art developments, and, needless to say, DiSSCo was there. Our colleagues presented their latest work on a variety of topics, including the Digital Extended Specimen architecture, data mapping, the MIDS standard, DOIs and more.

Scroll down for an overview of the sessions that DiSSCo hosted during TDWG 2023 and check out the abstracts. Links to the recordings of each session will follow soon.

By DiSSCo CSO

 

About TDWG: Historically known as the Taxonomic Databases Working Group, today's Biodiversity Information Standards (TDWG) is a not-for-profit, scientific and educational association formed to establish international collaboration among the creators, managers and users of biodiversity information, and to promote the wider and more effective dissemination and sharing of knowledge about the world's heritage of biological organisms. Find out more at TDWG's website.

A Simple Recipe for Cooking your AI-assisted Dish to Serve it in the International Digital Specimen Architecture

By Wouter Addink, Sam Leeflang, Sharif Islam

With the rise of Artificial Intelligence (AI), a large set of new tools and services is emerging that supports specimen data mapping, standards alignment, quality enhancement and enrichment of the data. These tools currently operate in isolation, targeted at individual collections, collection management systems and institutional datasets. To address this challenge, DiSSCo, the Distributed System of Scientific Collections, is developing a new infrastructure for digital specimens, transforming them into actionable information objects. This infrastructure incorporates a framework for annotation and curation that allows the objects to be enriched or enhanced by both experts and machines. This creates the unique possibility to plug in AI-assisted services that can then leverage digital specimens through this infrastructure, which serves as a harmonised Findable, Accessible, Interoperable and Reusable (FAIR) abstraction layer on top of individual institutional systems or datasets. Early examples of such services are those developed in the Specimen Data Refinery workflow (Hardisty et al. 2022).

Access the full abstract here.

Photo: Wouter Addink at TDWG 2023

Harmonised Data is Actionable Data: DiSSCo’s solution to data mapping

By Sam Leeflang, Wouter Addink

Predictability is one of the core requirements for creating machine-actionable data. The more predictable the data, the more generic the service acting on the data can be. The more generic the service, the easier we can exchange ideas, collaborate on initiatives and leverage machines to do the work. Predictability is essential for implementing the FAIR principles (Findable, Accessible, Interoperable, Reusable), as it provides the "I" of Interoperability (Jacobsen et al. 2020). The FAIR principles emphasise machine actionability because the amount of data generated is far too large for humans to handle.

While Biodiversity Information Standards (TDWG) standards have massively improved the standardisation of biodiversity data, there is still room for improvement. Within the Distributed System of Scientific Collections (DiSSCo), we aim to harmonise all scientific data derived from European specimen collections, including geological specimens, into a single data specification. We call this data specification the open Digital Specimen (openDS). It is being built on top of existing and emerging biodiversity information standards such as Darwin Core (DwC), Minimum Information about a Digital Specimen (MIDS), Latimer Core, the Access to Biological Collection Data (ABCD) Schema, its Extension for Geosciences (EFG), and the new Global Biodiversity Information Facility (GBIF) Unified Model. In openDS we leverage the existing standards within the TDWG community but combine them with stricter constraints and controlled vocabularies, with the aim of improving the FAIRness of the data. This will not only make the data easier to use, but will also increase its quality and machine actionability.
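To make the idea of "stricter constraints and controlled vocabularies" concrete, here is a minimal sketch of what such validation could look like. The field names, required-field list and vocabulary values are illustrative placeholders, not the actual openDS specification.

```python
# Hypothetical sketch: validating a specimen record against stricter
# constraints (required fields plus a controlled vocabulary), in the
# spirit of openDS. All names and values below are illustrative only.

REQUIRED_FIELDS = {"scientificName", "basisOfRecord", "countryCode"}

# A controlled vocabulary replaces free-text values with a fixed set,
# making the data predictable for generic services.
BASIS_OF_RECORD = {"PreservedSpecimen", "FossilSpecimen", "LivingSpecimen"}

def validate(record: dict) -> list:
    """Return a list of constraint violations (an empty list means valid)."""
    errors = []
    for field in sorted(REQUIRED_FIELDS - record.keys()):
        errors.append(f"missing required field: {field}")
    basis = record.get("basisOfRecord")
    if basis is not None and basis not in BASIS_OF_RECORD:
        errors.append(f"basisOfRecord not in controlled vocabulary: {basis}")
    return errors

# A free-text value that a human understands but a machine cannot act on:
print(validate({"scientificName": "Quercus robur",
                "basisOfRecord": "preserved specimen"}))
```

The point of the stricter constraints is visible in the example: "preserved specimen" is perfectly readable for a human, but only the controlled value "PreservedSpecimen" is predictable enough for a generic service to act on.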

Access the full abstract here.

Mapping across Standards to Calculate the MIDS Level of Digitisation of Natural Science Collections

By Elspeth M. Haston, Mathias Dillen, Sam Leeflang, Wouter Addink, Claus Weiland, Dagmar Triebel, Eirik Rindal, Anke Penzlin, Rachel Walcott, Josh Humphries, Caitlin Chapman

The Minimum Information about a Digital Specimen (MIDS) standard is being developed within Biodiversity Information Standards (TDWG) to provide a framework for organisations, communities and infrastructures to define, measure, monitor and prioritise the digitisation of specimen data to achieve increased accessibility and scientific use. MIDS levels indicate different degrees of completeness in digitisation, ranging from Level 0 (not yet meeting the minimal information requirements for scientific use) to Level 3 (fulfilling the requirements for Digital Extended Specimens (Hardisty et al. 2022) by inclusion of persistent identifiers (PIDs) that connect the specimen with derived and related data). MIDS Levels 0–2 are generic for all specimens. From MIDS Level 2 onwards we make a distinction between biological, geological and palaeontological specimens. While MIDS represents a minimum specification, defining and publishing more extensive sets of information elements (extensions) is readily feasible and explicitly recommended.
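As a rough illustration of how a MIDS level could be calculated from the information elements present in a record, consider the sketch below. The element sets per level are simplified stand-ins, not the normative MIDS term lists (which, as noted above, differ per discipline from Level 2 onwards).

```python
# Hypothetical sketch: deriving a MIDS digitisation level from which
# information elements a specimen record contains. The element sets
# are simplified placeholders, NOT the normative MIDS specification.

MIDS_ELEMENTS = [
    (1, {"physicalSpecimenId", "name"}),            # bare minimum for scientific use
    (2, {"collectingLocation", "collectingDate"}),  # generic digitisation elements
    (3, {"persistentIdentifier"}),                  # PIDs linking derived/related data
]

def mids_level(record: dict) -> int:
    """Return the highest MIDS level for which the record provides all
    required, non-empty information elements. Levels are cumulative."""
    present = {key for key, value in record.items() if value not in (None, "")}
    achieved = 0  # Level 0: not yet meeting minimal requirements
    for level, required in MIDS_ELEMENTS:
        if required <= present:  # set inclusion: every element is present
            achieved = level
        else:
            break  # stop at the first gap, since levels build on each other
    return achieved

print(mids_level({
    "physicalSpecimenId": "RMNH.INS.12345",
    "name": "Quercus robur",
    "collectingLocation": "Hobart, Tasmania",
    "collectingDate": "2023-10-09",
}))  # → 2 (no persistent identifier yet, so Level 3 is not reached)
```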

Access the full abstract here.

Photo: Summary of DiSSCo-related talks at TDWG 2023

A Novel Part in the Swiss Army Knife for Linking Biodiversity Data: The digital specimen identifier service

By Wouter Addink, Soulaine Theocharides, Sharif Islam

Digital specimens are new information objects on the internet, which act as digital surrogates of the physical objects they represent. They are designed to be extended with data derived from the specimen, such as genetic, morphological and chemical data, and with data that puts the specimen in the context of its gathering event and the environment it came from. This requires linking the digital specimens and their related entities to information about agents, locations, publications, taxa and environmental conditions. To establish reliable links and (re-)connect data to specimens, a new framework is needed that creates persistent identifiers (PIDs) for the digital specimen and its related entities. These PIDs should be actionable by machines but also usable by humans for data citation and communication purposes.
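What a machine-actionable PID record for a digital specimen might contain can be sketched as a set of typed key/value attributes, in the style of a Handle or DOI kernel record. The attribute names and values below are illustrative placeholders, not the actual DiSSCo identifier service specification.

```python
# Hypothetical sketch: a minimal PID record for a digital specimen,
# modelled as typed key/value attributes. Attribute names, the example
# identifier and the URLs are illustrative placeholders only.

import json

def make_pid_record(pid: str, object_url: str, physical_specimen_id: str) -> dict:
    return {
        "pid": pid,
        "values": [
            # what kind of object the PID points to, so machines can act on it
            {"type": "digitalObjectType", "data": "DigitalSpecimen"},
            # where the digital specimen itself resolves
            {"type": "primaryLocation", "data": object_url},
            # the link back to the physical object in the collection
            {"type": "physicalSpecimenId", "data": physical_specimen_id},
        ],
    }

record = make_pid_record(
    "20.5000.1025/ABC-123",            # example Handle-style identifier
    "https://example.org/ds/ABC-123",  # placeholder resolution URL
    "RMNH.INS.12345",                  # placeholder collection number
)
print(json.dumps(record, indent=2))
```

The typed attributes are what make such a record useful to machines: a resolver can decide how to act on the object from the record alone, without fetching the object itself, while the same identifier doubles as a stable citation string for humans.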

Access the full abstract here.

Data Standards and Interoperability Challenges for Biodiversity Digital Twin: A novel and transformative approach to biodiversity research and application

By Sharif Islam, Hanna Koivula, Dag Endresen, Erik Kusch, Dmitry Schigel, Wouter Addink

The Biodiversity Digital Twin (BioDT) project (2022-2025) aims to create prototypes that integrate various data sets, models, and expert domain knowledge enabling prediction capabilities and decision-making support for critical issues in biodiversity dynamics. While digital twin concepts have been applied in industries for continuous monitoring of physical phenomena, their application in biodiversity and environmental sciences presents novel challenges (Bauer et al. 2021, de Koning et al. 2023). In addition, successfully developing digital twins for biodiversity requires addressing interoperability challenges in data standards.

BioDT is developing prototype digital twins based on use cases that span various data complexities, from point occurrence data to bioacoustics, and cover scales from nationwide forest states to specific communities and individual species. The project relies on FAIR principles (Findable, Accessible, Interoperable, and Reusable) and FAIR-enabling resources like standards and vocabularies (Schultes et al. 2020) to enable the exchange, sharing, and reuse of biodiversity information, fostering collaboration among participating Research Infrastructures (DiSSCo, eLTER, GBIF, and LifeWatch) and data providers. It also involves creating a harmonised abstraction layer using Persistent Identifiers (PID) and FAIR Digital Object (FDO) records, alongside semantic mapping and crosswalk techniques to provide machine-actionable metadata (Schultes and Wittenburg 2019, Schwardmann 2020). Governance and engagement with Research Infrastructure stakeholders play crucial roles in this regard, with a focus on aligning technical and data standards discussions.
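The semantic mapping and crosswalk techniques mentioned above can be sketched, at their simplest, as a field-name mapping between two standards. The mapping entries below are illustrative Darwin Core-style and ABCD-style names, not a complete or authoritative crosswalk.

```python
# Hypothetical sketch: a crosswalk that renames the fields of a record
# from one standard's terms to another's (here, Darwin Core-style terms
# to simplified ABCD-style paths). Entries are illustrative only.

DWC_TO_ABCD = {
    "dwc:scientificName": "abcd:TaxonIdentified/ScientificName/FullScientificNameString",
    "dwc:country": "abcd:Gathering/Country/Name",
    "dwc:eventDate": "abcd:Gathering/DateTime/ISODateTimeBegin",
}

def crosswalk(record: dict, mapping: dict) -> dict:
    """Rename the fields of a record according to a crosswalk mapping;
    fields without a mapping entry keep their original name."""
    return {mapping.get(key, key): value for key, value in record.items()}

print(crosswalk({"dwc:country": "Australia",
                 "dwc:eventDate": "2023-10-09"}, DWC_TO_ABCD))
```

Real crosswalks also have to reconcile differences in structure, units and vocabularies, not just names, which is why the abstract pairs them with PIDs and FDO records to keep the result machine-actionable.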

Access the full abstract here.

Do you want to know more about the technical side of DiSSCo? DiSSCo offers several technical knowledge platforms to the scientific community:

DiSSCoTech: Get the latest technical posts about the design of DiSSCo’s Infrastructure

DiSSCo Labs: A preview of experimental services and demonstrators by the DiSSCo community

DiSSCo GitHub: Code hosting for DiSSCo software, version control and collaboration
